WO2024193704A1 - Guide nucleic acids targeting dmd and uses thereof

Info

Publication number: WO2024193704A1
Application number: PCT/CN2024/083354
Authority: WO
Inventors: Guoling LI
Original assignee: Huidagene Therapeutics Co., Ltd.; Huidagene Therapeutics (Singapore) Pte. Ltd.
Priority date: 2023-03-22
Filing date: 2024-03-22
Publication date: 2024-09-26

Abstract

Provided herein are guide nucleic acids targeting DMD, systems comprising the same, and methods using the same for treating DMD associated diseases.

Description

GUIDE NUCLEIC ACIDS TARGETING DMD AND USES THEREOF

REFERENCE TO RELATED APPLICATIONS

The instant application claims the priority to and the benefit of the filing dates of PCT/CN2023/083192, filed on March 22, 2023, and PCT/CN2023/091710, filed on April 28, 2023, the entire contents of which, including any drawings and sequence listing, are incorporated herein by reference.

REFERENCE TO AN ELECTRONIC SEQUENCE LISTING

The disclosure contains a Sequence Listing XML file which has been submitted electronically in XML format and is hereby incorporated by reference in its entirety. Said XML copy, created on March 21, 2024, by software “WIPO Sequence” according to WIPO Standard ST.26, is named HGP029PCT.xml, and is 122,529 bytes in size.

According to WIPO Standard ST.26, symbol “t” is used to denote both T in DNA and U in RNA. Thus, in the instant sequence listing prepared according to ST.26, wherever a sequence is an RNA, the T in the sequence shall be deemed as U.

BACKGROUND

The DMD gene encodes the dystrophin protein, which plays a critical role in maintaining the structure of muscle fibers. Mutations in this gene can lead to Duchenne muscular dystrophy (DMD), a devastating disorder characterized by progressive muscle weakness and wasting. DMD is an X-linked recessive disorder that primarily affects males, with symptoms typically appearing in early childhood. It would be desirable to develop therapies to treat or even cure DMD.

Citation or identification of any document in the disclosure is not an admission that such a document is available as prior art to the disclosure. Each of the references mentioned or cited in the disclosure is incorporated by reference in its entirety.

SUMMARY

The disclosure provides guide nucleic acids targeting DMD, and systems and methods using the same, which efficiently guide a gene editing protein to the DMD gene, achieving efficient gene editing that leads to restoration of dystrophin expression and muscle performance.

In an aspect, the disclosure provides a guide nucleic acid comprising a guide sequence capable of hybridizing to a target sequence on a target strand of a DMD gene; wherein the protospacer sequence on the nontarget strand of the DMD gene corresponding to the target sequence is located at or within an exon of the DMD gene, or at or within a splice donor or a splice acceptor of the exon; wherein the exon is selected from the group consisting of Exon 43, Exon 44, Exon 45, Exon 46, Exon 51, Exon 53, and Exon 55.

In some embodiments, the protospacer sequence is immediately 3’ to a protospacer adjacent motif (PAM) of 5’-NTN-3’, wherein N is A, T, G, or C.

In some embodiments, the guide sequence comprises (1) a sequence of any one of SEQ ID NOs: 36, 1-35, and 37-51 or a 5’ or 3’ end truncation thereof with 1, 2, 3, 4, 5, or 6 nucleotides truncated at the 5’ or 3’ end; or (2) a sequence having a sequence identity of at least about 70%, 75%, 80%, 85%, 90%, 95%, or 100% to any one of SEQ ID NOs: 36, 1-35, and 37-51 or a 5’ or 3’ end truncation thereof with 1, 2, 3, 4, 5, or 6 nucleotides truncated at the 5’ or 3’ end; or (3) a sequence having at most 1, 2, 3, 4, 5, or 6 nucleotide differences, whether consecutive or not, compared to any one of SEQ ID NOs: 36, 1-35, and 37-51.
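The end-truncation and nucleotide-difference variants described above can be illustrated with a minimal Python sketch. The function names and the 20-nt reference string are hypothetical placeholders (the string is not any SEQ ID NO of the disclosure), and the difference count assumes position-wise substitutions between equal-length sequences only, which is narrower than the wording above.

```python
def end_truncations(seq, max_trunc=6):
    """List (end, n, truncated) for 1..max_trunc nt removed from the 5' or 3' end.

    `seq` is written 5'->3'. At least one nucleotide is always kept.
    """
    variants = []
    for n in range(1, min(max_trunc, len(seq) - 1) + 1):
        variants.append(("5'", n, seq[n:]))    # n nt removed from the 5' end
        variants.append(("3'", n, seq[:-n]))   # n nt removed from the 3' end
    return variants


def count_differences(a, b):
    """Count position-wise substitutions between two equal-length sequences.

    Assumption: insertions and deletions are not handled by this sketch.
    """
    if len(a) != len(b):
        raise ValueError("this simple comparison needs equal-length sequences")
    return sum(1 for x, y in zip(a, b) if x != y)


# Hypothetical 20-nt placeholder, NOT a sequence from the sequence listing.
reference = "ACGTACGTACGTACGTACGT"
candidate = "ACGTACGTACCTACGTACGA"   # differs at two non-consecutive positions

if __name__ == "__main__":
    for end, n, truncated in end_truncations(reference):
        print(end, n, truncated)
    print(count_differences(reference, candidate) <= 6)
```

Each truncation removes nucleotides from one end only, matching the "5’ or 3’ end" wording; combined truncations at both ends are not generated.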

In some embodiments, the guide nucleic acid comprises a scaffold sequence capable of forming a complex with a nucleic acid programmable binding protein (napBP), and wherein the hybridization of the guide sequence to the target sequence guides the complex to the DMD gene.

In some embodiments, the scaffold sequence has substantially the same secondary structure as the secondary structure of the sequence of SEQ ID NO: 52 or 53; or wherein the scaffold sequence comprises (1) a sequence of SEQ ID NO: 52 or 53 or a 5’ or 3’ end truncation thereof with 1, 2, 3, 4, 5, or 6 nucleotides truncated at the 5’ or 3’ end; or (2) a sequence having a sequence identity of at least about 70%, 75%, 80%, 85%, 90%, 95%, or 100% to SEQ ID NO: 52 or 53 or a 5’ or 3’ end truncation thereof with 1, 2, 3, 4, 5, or 6 nucleotides truncated at the 5’ or 3’ end; or (3) a sequence having at most 1, 2, 3, 4, 5, or 6 nucleotide differences, whether consecutive or not, compared to SEQ ID NO: 52 or 53.

In some embodiments, the guide nucleic acid comprises a sequence having a sequence identity of at least about 80% (e.g., at least about 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%) to the sequence of any one of SEQ ID NOs: 71-125; or a sequence having at most 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 nucleotide differences, whether consecutive or not, compared to the sequence of any one of SEQ ID NOs: 71-125.

In another aspect, the disclosure provides a polynucleotide comprising or encoding one or more (e.g., two, three) copies of the guide nucleic acid of the disclosure.

In yet another aspect, the disclosure provides a system comprising:

(1) the guide nucleic acid of the disclosure, or a polynucleotide encoding the guide nucleic acid; and

(2) a nucleic acid programmable binding protein (napBP), or a polynucleotide encoding the napBP.

In some embodiments, the napBP is capable of recognizing a protospacer adjacent motif (PAM) of 5’-NTN-3’ immediately 5’ to a protospacer sequence on the nontarget strand of the DMD gene, wherein N is A, T, G, or C.

In some embodiments, the napBP is a nucleic acid programmable DNA endonuclease (napDNAn).

In some embodiments, the napDNAn is a Cas9 endonuclease, a Cas12 endonuclease, or an IscB endonuclease.

In some embodiments, the napBP comprises a sequence having a sequence identity of at least about 80% (e.g., at least about 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%) to the sequence of SEQ ID NO: 55 or 57 or an N-terminal truncation thereof without the first N-terminal Methionine. In some embodiments, the napBP retains at least 80% (e.g., at least about 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%) of the guide sequence-specific DNA endonuclease activity of the sequence of SEQ ID NO: 55 or 57.

In yet another aspect, the disclosure provides a vector comprising the polynucleotide of the disclosure.

In some embodiments, the vector is a plasmid, an adeno-associated virus (AAV) vector, a retroviral vector, an adenoviral vector, or a lentiviral vector.

In yet another aspect, the disclosure provides a recombinant adeno-associated virus (rAAV) vector genome comprising:

(1) a first polynucleotide sequence comprising a sequence encoding a guide nucleic acid of the disclosure; and

(2) a second polynucleotide sequence comprising a sequence encoding a nucleic acid programmable binding protein (napBP),

wherein the rAAV vector genome is adapted to be encapsulated into a rAAV particle.

In some embodiments, the rAAV vector genome comprises a sequence having a sequence identity of at least about 80% to the sequence of SEQ ID NO: 69; or comprises the sequence of SEQ ID NO: 69.

In some embodiments, the rAAV vector genome comprises, from 5’ to 3’,

(1) the 5’ ITR of SEQ ID NO: 58;

(2) the promoter of SEQ ID NO: 59;

(3) the scaffold sequence of SEQ ID NO: 53;

(4) the guide sequence of SEQ ID NO: 36;

(5) the scaffold sequence of SEQ ID NO: 53;

(6) the guide sequence of SEQ ID NO: 36;

(7) the scaffold sequence of SEQ ID NO: 53;

(8) the guide sequence of SEQ ID NO: 36;

(9) the promoter of SEQ ID NO: 60;

(10) the Kozak sequence of gccacc;

(11) the in-frame start codon ATG;

(12) the first sequence encoding the first NLS of SEQ ID NO: 62;

(13) the sequence encoding the napBP of SEQ ID NO: 57;

(14) the second sequence encoding the second NLS of SEQ ID NO: 65;

(15) an in-frame stop codon;

(16) the WPRE sequence of SEQ ID NO: 66;

(17) the sequence encoding a polyA signal of SEQ ID NO: 67; and

(18) the 3’ ITR;

wherein the rAAV vector genome comprises a sequence having a sequence identity of at least about 80% to the sequence of SEQ ID NO: 69;

wherein the rAAV vector genome does not contain any nucleotide difference from the sequence of SEQ ID NO: 69 in any one of the components (1) to (18);

wherein the rAAV vector genome contains one or more nucleotide differences or does not contain any nucleotide difference from the sequence of SEQ ID NO: 69 between any two adjacent components of the components (1) to (18).
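As an illustration only, the 5’ to 3’ component order enumerated in (1) to (18) can be written down as an ordered data structure and checked for the three scaffold-guide units. This is a sketch under stated assumptions: the variable and function names are hypothetical, the identifiers are the SEQ ID NOs or literal sequences quoted in the list, and `None` marks components for which the list gives no identifier. No actual sequence from the sequence listing is reproduced.

```python
# 5'-to-3' order of the rAAV vector genome components, as enumerated in the text.
# Each entry is (position, component, identifier). The identifier is a SEQ ID NO
# (int) or a literal sequence (str) where the text gives one, and None where the
# text gives neither. All names here are illustrative, not part of the disclosure.
RAAV_LAYOUT = [
    (1, "5' ITR", 58),
    (2, "promoter", 59),
    (3, "scaffold sequence", 53),
    (4, "guide sequence", 36),
    (5, "scaffold sequence", 53),
    (6, "guide sequence", 36),
    (7, "scaffold sequence", 53),
    (8, "guide sequence", 36),
    (9, "promoter", 60),
    (10, "Kozak sequence", "gccacc"),
    (11, "start codon", "ATG"),
    (12, "first NLS coding sequence", 62),
    (13, "napBP coding sequence", 57),
    (14, "second NLS coding sequence", 65),
    (15, "stop codon", None),
    (16, "WPRE", 66),
    (17, "polyA signal", 67),
    (18, "3' ITR", None),
]


def count_guide_cassettes(layout):
    """Count scaffold sequences that are immediately followed by a guide sequence."""
    names = [component for _, component, _ in layout]
    return sum(
        1
        for left, right in zip(names, names[1:])
        if left == "scaffold sequence" and right == "guide sequence"
    )


def positions_are_consecutive(layout):
    """Check that positions run 1, 2, 3, ... without gaps or reordering."""
    return [pos for pos, _, _ in layout] == list(range(1, len(layout) + 1))


if __name__ == "__main__":
    print(count_guide_cassettes(RAAV_LAYOUT))
    print(positions_are_consecutive(RAAV_LAYOUT))
```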

In yet another aspect, the disclosure provides a recombinant AAV (rAAV) particle comprising the rAAV vector genome of the disclosure.

In some embodiments, the rAAV particle comprises a capsid of the wild-type AAV9 serotype.

In yet another aspect, the disclosure provides a method for production of the rAAV particle of the disclosure, comprising culturing in a host cell a transgene plasmid comprising the rAAV vector genome of the disclosure.

In yet another aspect, the disclosure provides a cell comprising a transgene plasmid comprising the rAAV vector genome of the disclosure for the production of a rAAV particle.

In yet another aspect, the disclosure provides a pharmaceutical composition comprising (1) the system of the disclosure or the rAAV particle of the disclosure, and (2) a pharmaceutically acceptable excipient.

In yet another aspect, the disclosure provides a method of modifying expression of a DMD gene, comprising contacting the DMD gene with the system of the disclosure, the rAAV particle of the disclosure, or the pharmaceutical composition of the disclosure.

In yet another aspect, the disclosure provides a cell or a progeny thereof comprising the guide nucleic acid of the disclosure, the system of the disclosure, or the rAAV particle of the disclosure.

In yet another aspect, the disclosure provides a cell or a progeny thereof comprising a DMD gene modified by the system of the disclosure, the rAAV particle of the disclosure, or the method of the disclosure.

In yet another aspect, the disclosure provides a method for preventing, diagnosing, or treating a DMD associated disease in a subject in need thereof, comprising administering to the subject the system of the disclosure, the rAAV particle of the disclosure, or the pharmaceutical composition of the disclosure, wherein the napBP modifies the DMD gene, and wherein the modification of the DMD gene treats the disease.

In some embodiments, the DMD associated disease is Duchenne muscular dystrophy.

The details of one or more embodiments of the disclosure are set forth in the description below. Other features or advantages of the disclosure will be apparent from the following drawings and detailed description of several embodiments, and also from the appended claims. It is understood that any aspect or embodiment of the disclosure can be combined with any other one or more aspects or embodiments of the disclosure, including aspects or embodiments only described in one sub-section, only in the examples, or only in the claims, to constitute another embodiment explicitly or implicitly disclosed herein unless otherwise indicated.

BRIEF DESCRIPTION OF THE DRAWINGS

An understanding of the features and advantages of the disclosure will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the disclosure may be utilized, and the accompanying drawings of which:

FIG. 1 is a schematic showing an exemplary target dsDNA, an exemplary guide RNA, and an exemplary napDNAbp.

FIG. 2 is another schematic showing an exemplary target dsDNA, an exemplary guide RNA, and an exemplary napDNAbp.

FIG. 3 is a schematic showing an exemplary target dsDNA, an exemplary transcript (target RNA) transcribed from the target dsDNA, an exemplary guide RNA, and an exemplary napRNAbp.

FIG. 4 is another schematic showing an exemplary target dsDNA, an exemplary transcript (target RNA) transcribed from the target dsDNA, an exemplary guide RNA, and an exemplary napRNAbp, wherein the guide sequence contains a mismatch with the target sequence.

FIG. 5 shows average (n=3) editing efficiency (indels %) for DMD-targeting guide sequences G1-G30 and G39-G51.

FIG. 6 shows a single-cut gene editing strategy for treating Duchenne muscular dystrophy via a single AAV vector. Schematic of the therapeutic vector and single-cut strategy. The removal of one exon by hfCas12Max system (All-in-one AAV) editing applies to approximately 83% of all DMD patients. One single gRNA is designed to target the vicinity of the intron-exon boundary and splice signal sequences, restoring the correct ORF by either an exon skipping or an exon reframing event.

FIG. 7 shows generation of novel humanized DMD mice (model) with human exon51.

FIG. 8 shows gRNA design and screening in vitro for editing of the DMD gene. FIG. 8A. Flow diagram for detection of genome editing efficiency by transfection with a plasmid encoding hfCas12Max and the designed gRNAs, followed by FACS and NGS analysis. FIG. 8B-8C. Three guide sequences (G31-G33) targeting the splicing acceptor (SA) and five guide sequences (G34-G38) targeting the splicing donor (SD) of DMD Exon 51 were designed. NGS sequencing of the DMD gene Exon 51 shows genomic indel (%) (editing efficiency) after transfection with each of the 7 gRNAs in HEK293T (B) and iPSC (C), respectively. FIG. 8D. According to PEM-seq analysis, the top 12 potential off-target sites have been observed by amplicon sequencing in HEK293T cells to assess off-target editing for G36. FIG. 8E. Indels at the gRNA-dependent off-target sites (OT1 to OT11) were also confirmed in HEK293T cells by in silico Cas-OFFinder to assess off-target editing for G36. Data are represented as mean ± SEM (n=3).

FIG. 9 shows rescue of dystrophin expression by intramuscular (IM) injection in tibialis anterior (TA) muscle, 4 weeks post-injection of the rAAV9 particles with gRNA configurations “Dg”, “DgD”, “DgDgD”, or “DgDgDg” into the ΔmE51E52, hE51KI mouse model at a fixed dose of 2.5E11 vg.

FIG. 10 shows rescue of dystrophin expression by systemic delivery of the rAAV particles with gRNA configuration “DgDgDg”.

FIG. 11 and FIG. 12 show rescue of dystrophin expression by tail vein injection of the rAAV particles with gRNA configuration “DgDgDg”.

The figures herein are for illustrative purposes only and are not necessarily drawn to scale.

Definitions

The disclosure will be described with respect to particular embodiments, but the disclosure is not limited thereto in any respect. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as is commonly understood by one of ordinary skill in the art to which this disclosure belongs. Terms as set forth hereinafter are generally to be understood in their plain and ordinary meaning or common sense unless indicated otherwise.

Overview

A nucleic acid programmable binding protein (napBP), for example, a nucleic acid programmable DNA binding protein (napDNAbp) such as Cas9, Cas12, or IscB, or a nucleic acid programmable RNA binding protein (napRNAbp) such as Cas13, is capable of binding to a target nucleic acid (e.g., dsDNA, mRNA) as guided by a guide nucleic acid (e.g., a guide RNA) comprising a guide sequence targeting the target nucleic acid. In some embodiments, the target nucleic acid is eukaryotic.

Without wishing to be bound by theory, in some embodiments, the guide nucleic acid comprises a scaffold sequence responsible for forming a complex with the napBP, and a guide sequence that is intentionally designed to be responsible for hybridizing to a target sequence of the target nucleic acid, thereby guiding the complex comprising the napBP and the guide nucleic acid to the target nucleic acid.

Referring to FIG. 1, an exemplary target dsDNA (e.g., DMD gene) is depicted to comprise a 5’ to 3’ single DNA strand and a 3’ to 5’ single DNA strand.

An exemplary guide nucleic acid (e.g., a guide RNA) is depicted to comprise a guide sequence and a scaffold sequence. The guide sequence is designed to hybridize to a part of the 3’ to 5’ single DNA strand, and so the guide sequence “targets” that part. And thus, the 3’ to 5’ single DNA strand is referred to as a “target strand (TS)” of the target dsDNA, while the opposite 5’ to 3’ single DNA strand is referred to as a “nontarget strand (NTS)” of the target dsDNA. That part of the target strand based on which the guide sequence is designed and to which the guide sequence may hybridize is referred to as a “target sequence”, while the opposite part on the nontarget strand corresponding to that part is referred to as the “protospacer sequence”, which is 100% (fully) reversely complementary to the target sequence and is said to be “corresponding to” the target sequence in the disclosure.

Referring to FIG. 3, an exemplary target dsDNA (e.g., DMD gene) is depicted to comprise a 5’ to 3’ single DNA strand and a 3’ to 5’ single DNA strand. According to the conventional transcription process, an exemplary target RNA (transcript, e.g., a pre-mRNA) may be transcribed using the 3’ to 5’ single DNA strand as a synthesis template, and thus the 3’ to 5’ single DNA strand is referred to as a “template strand” or an “antisense strand”. The transcript so transcribed has the same primary sequence as the 5’ to 3’ single DNA strand except for the replacement of T with U, and thus the 5’ to 3’ single DNA strand is referred to as a “coding strand” or a “sense strand”.

An exemplary guide nucleic acid (e.g., a guide RNA) is depicted to comprise a guide sequence and a scaffold sequence. The guide sequence is designed to hybridize to a part of the transcript (target RNA), and so the guide sequence “targets” that part. And thus, that part of the target RNA based on which the guide sequence is designed and to which the guide sequence may hybridize is referred to as a “target sequence”. In some embodiments, the guide sequence is 100% (fully) reversely complementary to the target sequence. In some other embodiments, the guide sequence is reversely complementary to the target sequence and contains a mismatch with the target sequence (as exemplified in FIG. 4).

Generally, as is conventional in the art, a nucleic acid sequence (e.g., a DNA sequence, an RNA sequence) is written in 5’ to 3’ direction/orientation unless explicitly indicated otherwise.

For example, for a DNA sequence of ATGC, it is usually understood as 5’-ATGC-3’ unless otherwise indicated. Its reverse sequence is 5’-CGTA-3’. Its fully complementary sequence is 5’-TACG-3’. Its fully reverse complementary sequence is 5’-GCAT-3’. Note that the fully complementary sequence usually does not have the ability to base-pair/hybridize with the original sequence.
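The three operations in the ATGC example can be reproduced with a short Python sketch. The function names are illustrative and not part of the disclosure; every input and output string is read in the 5’ to 3’ direction, following the convention stated above.

```python
# Base-by-base DNA complement table (A<->T, G<->C).
_DNA_COMPLEMENT = str.maketrans("ATGC", "TACG")


def reverse(seq):
    """Same bases, opposite order."""
    return seq[::-1]


def complement(seq):
    """Each base replaced by its Watson-Crick partner, order unchanged."""
    return seq.translate(_DNA_COMPLEMENT)


def reverse_complement(seq):
    """Complement, then reverse: the strand that can anneal antiparallel to seq."""
    return reverse(complement(seq))


if __name__ == "__main__":
    seq = "ATGC"
    print(reverse(seq))             # CGTA
    print(complement(seq))          # TACG
    print(reverse_complement(seq))  # GCAT
```

Only the reverse complement, read 5’ to 3’, pairs with the original strand in the antiparallel orientation required for hybridization, which is why the plain complement is noted above as usually unable to hybridize.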

Generally, the double-strand sequence of a dsDNA may be represented with the sequence of its 5’ to 3’ single DNA strand conventionally written in 5’ to 3’ direction/orientation unless otherwise indicated.

For example, for a dsDNA having a 5’ to 3’ single DNA strand of 5’-ATGC-3’ and a 3’ to 5’ single DNA strand of 3’-TACG-5’, the dsDNA may be simply represented as 5’-ATGC-3’.

5’-----ATGC -----3’

3’-----TACG -----5’

It should be noted that either the 5’ to 3’ single DNA strand or the 3’ to 5’ single DNA strand of a dsDNA can be a nontarget strand from which a protospacer sequence is selected.

Generally, for a gene as a dsDNA, the 5’ to 3’ single DNA strand is the sense strand of the gene, and the 3’ to 5’ single DNA strand is the antisense strand of the gene. It should be noted that either the sense strand or the antisense strand of a gene can be a nontarget strand from which a protospacer sequence is selected.

Normally, the transcript (target RNA) transcribed from the dsDNA then has a (target) sequence of 5’-AUGC-3’.

To hybridize to a target dsDNA, in one embodiment, the guide sequence of a guide nucleic acid is designed to have a sequence of 5’-AUGC-3’ that is fully reversely complementary to the 3’ to 5’ strand of the target dsDNA, which would be set forth as ATGC in the electronic sequence listing but marked as an RNA sequence; and in another embodiment, the guide sequence of a guide nucleic acid is designed to have a sequence of 5’-GCAU-3’ that is fully reversely complementary to the 5’ to 3’ strand of the target dsDNA, which would be set forth as GCAT in the electronic sequence listing but marked as an RNA sequence.

In the case that the guide sequence of a guide nucleic acid is fully reversely complementary to the target sequence and the target sequence is fully reversely complementary to the protospacer sequence, the guide sequence is identical to the protospacer sequence except for the U in the guide sequence due to its RNA nature and correspondingly the T in the protospacer sequence due to its DNA nature. According to WIPO Standard ST.26, symbol “t” is used to denote both T in DNA and U in RNA (see “Table 1: List of nucleotides symbols”, in which the definition of symbol “t” is “thymine in DNA/uracil in RNA (t/u)”). Thus, in the electronic sequence listing of the disclosure prepared according to ST.26, such a guide sequence could be set forth in the same sequence as a corresponding protospacer sequence. For convenience, a single SEQ ID NO in the electronic sequence listing can be used to denote both such guide sequence and protospacer sequence, regardless of whether such a single SEQ ID NO is marked as DNA or RNA in the electronic sequence listing. When a reference is made to such a SEQ ID NO that sets forth a protospacer/guide sequence, it refers to either a protospacer sequence that is a DNA sequence or a guide sequence that is an RNA sequence depending on the context, no matter whether it is marked as a DNA or an RNA in the electronic sequence listing.

To hybridize to the target RNA, in one embodiment, the guide sequence of a guide nucleic acid is designed to have a sequence of 5’-GCAU-3’ that is fully reversely complementary to the (target) sequence of the target RNA, which would be set forth as GCAT in the electronic sequence listing but marked as an RNA sequence.
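The three guide designs worked through with the ATGC example (either strand of the dsDNA as the target strand, or the transcript as the target RNA) can be checked with a small Python sketch. The helper name `guide_for` is hypothetical; it assumes the target is supplied 5’ to 3’ and that the guide is fully reverse complementary to it.

```python
_COMPLEMENT = str.maketrans("ATGC", "TACG")


def guide_for(target_5to3):
    """Return an RNA guide (5'->3') fully reverse complementary to the target.

    The target may be DNA or RNA; U is treated as T internally and the
    result is written with U because the guide is RNA.
    """
    dna = target_5to3.upper().replace("U", "T")
    reverse_complement = dna.translate(_COMPLEMENT)[::-1]
    return reverse_complement.replace("T", "U")


# The dsDNA of the example: 5'-ATGC-3' paired with 3'-TACG-5'.
top_strand = "ATGC"      # 5'->3' strand
bottom_strand = "GCAT"   # the 3'-TACG-5' strand, rewritten 5'->3'
transcript = "AUGC"      # target RNA transcribed from the dsDNA

if __name__ == "__main__":
    print(guide_for(bottom_strand))  # AUGC: the 3'->5' strand is the target strand
    print(guide_for(top_strand))     # GCAU: the 5'->3' strand is the target strand
    print(guide_for(transcript))     # GCAU: the transcript is the target
```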

Term

As used herein, if a DNA sequence, for example, 5’-ATGC-3’ is transcribed to an RNA sequence, with each dT (deoxythymidine, or “T” for short) in the primary sequence replaced with a U (uridine) and other dA (deoxyadenosine, or “A” for short), dG (deoxyguanosine, or “G” for short), and dC (deoxycytidine, or “C” for short) replaced with A (adenosine), G (guanosine), and C (cytidine), respectively, for example, 5’-AUGC-3’, it is said in the disclosure that the DNA sequence “encodes” the RNA sequence.

As used herein, the term “activity” refers to a biological activity. In some embodiments, the activity includes enzymatic activity, e.g., catalytic ability of an effector. For example, the activity can include nuclease activity, e.g., dsDNA endonuclease activity, RNA endonuclease activity.

As used herein, the term “nucleic acid programmable binding protein (napBP)” may be used interchangeably with “nucleic acid programmable binding domain (napBD)” to refer to a protein that can associate (e.g., bind) with a programmable nucleic acid (e.g., DNA or RNA), such as a guide nucleic acid (e.g., gRNA), that is able to be programmed to guide the protein to a specific sequence of a target nucleic acid via the interaction (e.g., hybridization) between the programmable nucleic acid (e.g., the guide sequence of the programmable nucleic acid) and the target nucleic acid (e.g., the target sequence of the target nucleic acid). The napBP may be indirectly associated with (e.g., bound to) the target nucleic acid via the interaction (e.g., binding) between the napBP and the programmable nucleic acid (e.g., scaffold sequence of the programmable nucleic acid) and the interaction (e.g., hybridization) between the programmable nucleic acid (e.g., the guide sequence of the programmable nucleic acid) and the target nucleic acid (e.g., the target sequence of the target nucleic acid). In some embodiments, the napBP is a nucleic acid programmable DNA binding protein (napDNAbp). In some embodiments, the napBP is a nucleic acid programmable RNA binding protein (napRNAbp).

As used herein, the term “complex” refers to a grouping of two or more molecules. In some embodiments, the complex comprises a polypeptide and a nucleic acid interacting with (e.g., binding to, coming into contact with, adhering to) one another. As used herein, the term “complex” can refer to a grouping of a guide nucleic acid and a polypeptide (e.g., a napBP) . As used herein, the term “complex” can refer to a grouping of a guide nucleic acid, a polypeptide (e.g., a napBP) , and a target nucleic acid.

As used herein, the term “protospacer adjacent motif” or “PAM” refers to a short DNA sequence (or a DNA motif) adjacent to a protospacer sequence on the nontarget strand of a dsDNA. As used herein, the term “adjacent” includes instances wherein there is no nucleotide between the protospacer sequence and the PAM and also instances wherein there are a small number (e.g., 1, 2, 3, 4, or 5) of nucleotides between the protospacer sequence and the PAM. As used herein, A “immediately adjacent (to)” B, A “immediately 5’ to” B, and A “immediately 3’ to” B mean that there is no nucleotide between A and B. In some embodiments, the PAM is immediately 5’ to a protospacer sequence. In some embodiments, the PAM is immediately 3’ to a protospacer sequence.
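To make the PAM geometry concrete, the sketch below scans one strand (written 5’ to 3’) for a 5’-NTN-3’ PAM located 5’ to a protospacer, with an optional gap to cover the broader meaning of “adjacent”. It is illustrative only: the function name, the default protospacer length of 20 nt, and the toy input are assumptions not taken from the disclosure, and only one strand is scanned (the opposite strand would be scanned by passing its reverse complement).

```python
def find_protospacers(strand, protospacer_len=20, max_gap=0):
    """Find protospacers located 3' of a 5'-NTN-3' PAM on `strand` (5'->3').

    Returns (pam_start, pam, gap, protospacer) tuples. gap == 0 means the PAM
    is immediately 5' to the protospacer; larger gaps correspond to the
    broader sense of "adjacent". protospacer_len is an assumed parameter.
    """
    strand = strand.upper()
    hits = []
    for i in range(len(strand) - 2):
        pam = strand[i:i + 3]
        # N may be A, T, G, or C; the middle position must be T.
        if pam[1] != "T" or any(base not in "ATGC" for base in pam):
            continue
        for gap in range(max_gap + 1):
            start = i + 3 + gap
            end = start + protospacer_len
            if end <= len(strand):
                hits.append((i, pam, gap, strand[start:end]))
    return hits


if __name__ == "__main__":
    # Toy input with a short protospacer length so the output is easy to read.
    for hit in find_protospacers("GTTACGTAC", protospacer_len=4):
        print(hit)
```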

As used herein, the term “guide nucleic acid” refers to any nucleic acid that facilitates the targeting of a napBP to a target nucleic acid. For this purpose, the guide nucleic acid may be designed to include a guide sequence capable of hybridizing to a specific sequence of a target nucleic acid, and the guide nucleic acid may also comprise a scaffold sequence facilitating the guiding of a napBP to the target nucleic acid. In some embodiments, the guide nucleic acid is a guide RNA. In some embodiments, the guide nucleic acid is a nucleic acid encoding a guide RNA.

As used herein, the terms “nucleic acid”, “polynucleotide”, and “nucleotide sequence” are used interchangeably to refer to a polymeric form of nucleotides of any length, including deoxyribonucleotides, ribonucleotides, combinations thereof, and analogs or modifications thereof.

As used in the context of CRISPR-Cas techniques (e.g., CRISPR-Cas12 techniques), the term “guide RNA” is used interchangeably with the term “CRISPR RNA (crRNA)”, “single guide RNA (sgRNA)”, or “RNA guide”, the term “guide sequence” is used interchangeably with the term “spacer sequence”, and the term “scaffold sequence” is used interchangeably with the term “direct repeat sequence”.

As described herein, the guide sequence is so designed to be capable of hybridizing to a target sequence. As used herein, the term “hybridize”, “hybridizing”, or “hybridization” refers to a reaction in which one or more polynucleotide sequences react to form a complex that is stabilized via hydrogen bonding between the bases of the polynucleotide sequences. The hydrogen bonding may occur by Watson-Crick base pairing, Hoogsteen binding, or in any other sequence specific manner. A polynucleotide sequence capable of hybridizing to a given polynucleotide sequence is referred to as the “complement” of the given polynucleotide sequence. As used herein, the hybridization of a guide sequence and a target sequence is so stabilized to permit an effector polypeptide (e.g., a napBP) that is complexed with a nucleic acid comprising the guide sequence, or a functional domain associated (e.g., fused) with the effector polypeptide, to act (e.g., cleave, deaminate) on the target sequence or its complement or nearby sequence.

For the purpose of hybridization, in some embodiments, the guide sequence is reversely complementary to a target sequence. As used herein, the term “reverse complementary” refers to the ability of nucleobases of a first polynucleotide sequence, such as a guide sequence, to base pair with nucleobases of a second polynucleotide sequence, such as a target sequence, by traditional Watson-Crick base-pairing. Two reverse complementary polynucleotide sequences are able to non-covalently bind under appropriate temperature and solution ionic strength conditions. In some embodiments, a first polynucleotide sequence (e.g., a guide sequence) comprises 100% (fully) reverse complementarity to a second nucleic acid (e.g., a target sequence). In some embodiments, a first polynucleotide sequence (e.g., a guide sequence) is reverse complementary to a second polynucleotide sequence (e.g., a target sequence) if the first polynucleotide sequence comprises at least about 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% complementarity to the second nucleic acid (i.e., at least about 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% of the nucleotides of the first polynucleotide sequence can base-pair with the nucleotides of the second polynucleotide sequence).

As used herein, the term “substantially complementary” refers to a first polynucleotide sequence (e.g., a guide sequence) that has a certain level of complementarity to a second polynucleotide sequence (e.g., a target sequence) (e.g., at least about 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% of the nucleotides of the first polynucleotide sequence can base-pair with the nucleotides of the second polynucleotide sequence, or at most 1, 2, 3, 4, or 5 contiguous or non-contiguous nucleotides of the first polynucleotide sequence mismatch the nucleotides of the second polynucleotide sequence). In some embodiments, the level of complementarity is such that the first polynucleotide sequence (e.g., a guide sequence) can hybridize to the second polynucleotide sequence (e.g., a target sequence) with sufficient affinity to permit an effector polypeptide (e.g., a napBP) that is complexed with a nucleic acid comprising the first polynucleotide sequence, or a functional domain associated (e.g., fused) with the effector polypeptide, to act (e.g., cleave, deaminate) on the target sequence or its complement or nearby sequence. In some embodiments, a guide sequence that is substantially complementary to a target sequence has less than 100% complementarity to the target sequence. In some embodiments, a guide sequence that is substantially complementary to a target sequence has at least about 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% complementarity to the target sequence, and/or has at most 1, 2, 3, 4, or 5 contiguous or non-contiguous nucleotide mismatches from the target sequence.
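The percentage and mismatch-count criteria above can be computed as in the following Python sketch. It assumes an ungapped, equal-length, antiparallel alignment of a guide and a target that are both written 5’ to 3’, and it counts Watson-Crick pairs only; the function names and the 0.8 default threshold (taken from the lowest percentage listed above) are illustrative choices, not a definition from the disclosure.

```python
# Watson-Crick pairs as (guide base, target base); the target may be DNA or RNA.
_PAIRS = {("A", "T"), ("A", "U"), ("U", "A"), ("T", "A"), ("G", "C"), ("C", "G")}


def complementarity(guide, target):
    """Return (fraction_paired, mismatches) for an antiparallel, ungapped pairing.

    Both sequences are written 5'->3' and must have equal length, so the
    first guide base is compared with the last target base, and so on.
    """
    if len(guide) != len(target) or not guide:
        raise ValueError("this sketch needs two non-empty, equal-length sequences")
    paired = sum(
        1 for g, t in zip(guide.upper(), reversed(target.upper())) if (g, t) in _PAIRS
    )
    return paired / len(guide), len(guide) - paired


def is_substantially_complementary(guide, target, min_fraction=0.8):
    """Apply the percentage criterion with an assumed default threshold of 80%."""
    fraction, _ = complementarity(guide, target)
    return fraction >= min_fraction


if __name__ == "__main__":
    print(complementarity("GCAU", "ATGC"))   # fully reverse complementary
    print(complementarity("GCAA", "ATGC"))   # one mismatch at the guide's 3' end
    print(is_substantially_complementary("AUGCAUGCAA", "ATGCATGCAT"))
```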

As used herein, the term “sequence identity” is related to sequence homology. Homology comparisons may be conducted by eye, or more usually, with the aid of readily available sequence comparison programs. These commercially available computer programs may calculate percentage sequence identity (%) between two or more sequences (polypeptide or polynucleotide sequences) . Sequence homologies may be generated by any of a number of computer programs known in the art, for example, BLAST, FASTA. Asuitable computer program for carrying out such an alignment is the GCG Wisconsin Bestfit package (University of Wisconsin, U.S.A; Devereux et al., 1984, Nucleic Acids Research 12: 387) . Examples of other software than may perform sequence comparisons include, but are not limited to, the BLAST package (see Ausubel et al., 1999 ibid-Chapter 18) , FASTA (Atschul et al., 1990, J. Mol. Biol., 403-410) , and the GENEWORKS suite of comparison tools. Both BLAST and FASTA are available for offline and online searching (see Ausubel et al., 1999 ibid, pages 7-58 to 7-60) . Acommonly used online tool to calculate percentage sequence identity between two or more sequences (polypeptide or polynucleotide sequences) is available on the website of EMBL's European Bioinformatics Institute (www dot ebi dot ac dot uk slash jdispatcher slash) , allowing fast online calculation of percentage sequence identity by global alignment or local alignment.
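For orientation, a percentage identity by global alignment can be sketched in plain Python as below. This is a generic Needleman-Wunsch alignment with assumed scores (match +1, mismatch -1, gap -1) and with identity defined as matches divided by the number of alignment columns; it is not the algorithm or scoring scheme of BLAST, FASTA, Bestfit, or the EMBL-EBI tools named above, whose results can differ because they use other scoring matrices, gap penalties, and identity denominators.

```python
def global_identity(a, b, match=1, mismatch=-1, gap=-1):
    """Percent identity from a Needleman-Wunsch global alignment.

    Identity = 100 * matching columns / total alignment columns (an assumed
    definition; other tools may normalize differently).
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Trace back one optimal alignment, counting matches and columns.
    i, j, matches, columns = n, m, 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
            match if a[i - 1] == b[j - 1] else mismatch
        ):
            matches += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1   # gap in b
        else:
            j -= 1   # gap in a
        columns += 1
    return 100.0 * matches / columns if columns else 100.0


if __name__ == "__main__":
    print(global_identity("ACGT", "ACGT"))
    print(global_identity("ACGT", "ACGA"))
    print(global_identity("ACGT", "ACG"))
```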

As used herein, the terms “polypeptide” and “peptide” are used interchangeably to refer to polymers of amino acids of any length. A protein may have one or more polypeptides. An amino acid polymer can also be modified, for example, by disulfide bond formation, glycosylation, lipidation, acetylation, phosphorylation, or any other manipulation, such as conjugation with a labeling component.

As used herein, a “variant” is interpreted to mean a polynucleotide or polypeptide that differs from a reference polynucleotide or polypeptide, respectively, but retains essential properties, e.g., binding property of a napBP. A typical variant of a polynucleotide differs in nucleic acid sequence from another reference polynucleotide. A change in the nucleic acid sequence of the polynucleotide variant may or may not alter the amino acid sequence of a polypeptide encoded by the reference polynucleotide. A change in the nucleic acid sequence of the polynucleotide variant may result in an amino acid substitution, addition, and/or deletion in the polypeptide encoded by the reference polynucleotide. A typical variant of a polypeptide differs in amino acid sequence from another reference polypeptide. Generally, the difference is limited so that the sequences of the reference polypeptide and the polypeptide variant are closely similar overall and, in many regions, identical. The polypeptide variant and reference polypeptide may differ in amino acid sequence by one or more substitutions, additions, and/or deletions in any combination. A variant of a polynucleotide or polypeptide may be naturally occurring, such as an allelic variant, or it may be a variant that is not known to occur naturally. Non-naturally occurring variants of polynucleotides and polypeptides may be made by mutagenesis techniques, by direct synthesis, and by other recombinant methods known to skilled artisans.

As used herein, the terms “upstream” and “downstream” refer to the relative positions of two or more elements within a nucleic acid in 5’ to 3’ direction. A first sequence is upstream of a second sequence when the 3’ end of the first sequence is present at the left side of the 5’ end of the second sequence. A first sequence is downstream of a second sequence when the 5’ end of the first sequence is present at the right side of the 3’ end of the second sequence. In some embodiments, the PAM is upstream of a napBP-induced indel, and a napBP-induced indel is downstream of the PAM. In some embodiments, the PAM is downstream of a napBP-induced indel, and a napBP-induced indel is upstream of the PAM.
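As a non-limiting illustration of the positional definitions above, the following Python sketch represents each element by coordinates on the same strand, read 5’ to 3’. The `Element` type, the half-open coordinate convention, and the example coordinates are hypothetical and serve only to make the comparison explicit.

```python
from typing import NamedTuple


class Element(NamedTuple):
    name: str
    start: int  # 0-based position of the 5' end
    end: int    # 0-based position one past the 3' end (half-open interval)


def is_upstream(first: Element, second: Element) -> bool:
    """True when the 3' end of `first` lies to the left of the 5' end of `second`."""
    return first.end <= second.start


def is_downstream(first: Element, second: Element) -> bool:
    """True when the 5' end of `first` lies to the right of the 3' end of `second`."""
    return first.start >= second.end


# Hypothetical coordinates: a PAM followed, further 3', by an indel.
pam = Element("PAM", 10, 14)
indel = Element("indel", 30, 32)
```

With these example coordinates, `is_upstream(pam, indel)` and `is_downstream(indel, pam)` both hold, matching the first of the two PAM arrangements described above.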

As used herein, the term “wild type” has the meaning commonly understood by those skilled in the art to mean a typical form of an organism, a strain, a gene, or a feature that distinguishes it from a mutant or variant when it exists in nature. It can be isolated from sources in nature and not intentionally modified.

As used herein, the terms “non-naturally occurring” and “engineered” are used interchangeably and refer to artificial participation. When these terms are used to describe a nucleic acid or a polypeptide, it is meant that the nucleic acid or polypeptide is at least substantially freed from at least one other component of its association in nature or as found in nature.

As used herein, the term “regulatory element” is intended to include promoters, enhancers, internal ribosome entry sites (IRES) , and other expression control elements (e.g., transcription termination signals, such as, polyadenylation signals and poly-U sequences) . Such regulatory elements are described, for example, in Goeddel, GENE EXPRESSION TECHNOLOGY: METHODS IN ENZYMOLOGY 185, Academic Press, San Diego, Calif. (1990) . Regulatory elements include those that direct constitutive expression of a nucleotide sequence in many types of host cells and those that direct expression of a nucleotide sequence only in certain host cells (e.g., tissue-specific regulatory sequences) . Regulatory elements may also direct expression in a time-dependent manner, e.g., in a cell cycle-dependent or developmental stage-dependent manner, which may or may not be tissue or cell type specific.

As used herein, the term “cell” is understood to refer not only to a particular individual cell, but to the progeny or potential progeny of the cell. Because certain modifications may occur in succeeding generations due to either mutation or environmental influences, such progeny may not, in fact, be identical to the parent cell, but are still included within the scope of the term.

As used herein, the term “in vivo” refers to inside the body of an organism, and the terms “ex vivo” and “in vitro” mean outside the body of an organism.

As used herein, the term “treat”, “treatment”, or “treating” refers to an approach for obtaining beneficial or desired results including clinical results. For purposes of the disclosure, the beneficial or desired clinical results include, but are not limited to, one or more of the following: alleviating one or more symptoms resulting from a disease, diminishing the extent of a disease, stabilizing a disease (e.g., delaying the worsening of a disease), delaying the spread (e.g., metastasis) of a disease, delaying the recurrence of a disease, reducing recurrence rate of a disease, delaying or slowing the progression of a disease, ameliorating a disease state, providing a remission (partial or total) of a disease, decreasing the dose of one or more other medications required to treat a disease, increasing the quality of life, and/or prolonging survival. Also encompassed by “treatment” is a reduction of pathological consequence of a disease (such as cancer). The methods of the disclosure contemplate any one or more of these aspects of treatment.

As used herein, the term “disease” includes the terms “disorder” and “condition” and is not limited to those that have been specifically medically defined.

As used herein, reference to “not” a value or parameter generally means and describes “other than” a value or parameter. For example, the method is not used to treat cancer of type X means the method may be used to treat cancer of types other than X.

As used herein, the singular forms “a” , “an” , and “the” include plural referents unless the context clearly dictates otherwise.

As used herein, the term “and/or” in a phrase such as “A and/or B” is intended to include both A and B; A or B; A (alone); and B (alone). Likewise, the term “and/or” in a phrase such as “A, B, and/or C” is intended to encompass each of the following embodiments: A, B, and C; A, B, or C; A or C; A or B; B or C; A and C; A and B; B and C; A (alone); B (alone); and C (alone).

As used herein, when the term “about” precedes a series of numbers (for example, about 1, 2, 3), it is understood that each number in the series is modified by the term “about” (that is, about 1, about 2, about 3). The term “about X-Y” or “about X to Y” used herein has the same meaning as “about X to about Y.”

It is understood that embodiments of the disclosure described herein include “consisting” and/or “consisting essentially of” embodiments.

It is further noted that the claims may be drafted to exclude any optional element. As such, this statement is intended to serve as antecedent basis for use of such exclusive terminology as “solely” , “only” , and the like in connection with the recitation of claim elements, or use of a “negative” limitation.

DETAILED DESCRIPTION

Overview

In some aspects, the disclosure provides tools and methods for treating DMD associated diseases, such as, Duchenne muscular dystrophy.

As used herein, the term “DMD associated diseases” includes diseases that are associated with DMD gene, the transcription (including transcript; e.g., increased or decreased transcription, nonfunctional or dysfunctional transcript) thereof, and/or the expression (including expression and expression product; e.g., increased or decreased expression, nonfunctional or dysfunctional expression product) thereof, including those that are directly or indirectly caused by the transcription (including transcription and transcript, e.g., DMD mRNA) of DMD gene, and those that are directly or indirectly caused by the expression (including expression product, e.g., dystrophin) of DMD gene. In some embodiments, the term “DMD associated disease” refers to a disease that is caused by abnormal expression of DMD gene. In some embodiments, the term “DMD associated disease” refers to a disease that is caused by the deficiency of a functional expression product of DMD gene.

As used herein, the term “transcript” includes any transcription product by transcription from a gene, including mRNA, non-coding RNA, and any variants, derivatives, or ancestors thereof, for example, pre-mRNA, and any transcripts or isoforms produced from the gene or the pre-mRNA by, e.g., alternative promoter usage, alternative splicing, alternative initiation, and any naturally occurring variants thereof or processed products therefrom. The transcript of the DMD gene is also termed as DMD transcript in the disclosure.

In some embodiments, the DMD gene is human DMD gene. Human wild type DMD gene has NCBI Gene ID: 1756. In some embodiments, the transcript is human DMD mRNA. In some embodiments, the expression product is human dystrophin. Human wild type dystrophin has NCBI Accession No.: XP_054182593.1 for isoform X1.

As an example, a nonsense mutation of human DMD gene may lead to abnormal expression of DMD gene that fails to generate functional dystrophin and therefore causes DMD associated diseases.

The disclosure provides, in some embodiments, tools for modifying the expression of such a mutated DMD gene by cleaving (e.g., relying on the function of Cas endonuclease) and repairing (e.g., relying on the self-repairing mechanism of cells) the mutated DMD gene, thereby incorporating an indel mutation into the mutated DMD gene that leads to the expression of a functional dystrophin, which may be a dystrophin mutant that is sufficiently functional as wild type dystrophin. The tools can be delivered, for example, by rAAV vectors to subjects in need. In some embodiments, the tools include guide nucleic acids comprising a guide sequence designed to be capable of hybridizing to the mutated DMD gene, thereby guiding a complex comprising the guide nucleic acid and a napBP to the mutated DMD gene and modifying the mutated DMD gene, leading to the generation of functional dystrophin from the modified DMD gene. In the case that the abnormal DMD expression is properly modified, it is reasonably expected that the syndromes (e.g., syndromes of Duchenne muscular dystrophy) caused by the abnormal DMD expression can be alleviated.

Guide nucleic acid

In an aspect, the disclosure provides a guide nucleic acid comprising a guide sequence capable of hybridizing to a target sequence on a target strand of a DMD gene or a target sequence on a transcript of a DMD gene.

In some embodiments, the guide nucleic acid comprises a scaffold sequence capable of forming a complex with a nucleic acid programmable binding protein (napBP) , and wherein the hybridization of the guide sequence to the target sequence guides the complex to the DMD gene or the transcript.

The components of the guide nucleic acid are described more specifically in the other sub-sections herein.

System

In another aspect, the disclosure provides a system comprising:

(1) a guide nucleic acid, or a polynucleotide (e.g., a DNA, an RNA, a DNA/RNA mixture) encoding the guide nucleic acid, comprising:

(a) a scaffold sequence capable of forming a complex with a nucleic acid programmable binding protein (napBP) , and

(b) a guide sequence capable of hybridizing to a target sequence on a target strand of a DMD gene or a target sequence on a transcript of a DMD gene, thereby guiding the complex to the DMD gene or the transcript; and

(2) the napBP or a polynucleotide encoding the napBP.

In some embodiments, the system is a complex comprising the napBP complexed with the guide nucleic acid. In some embodiments, the complex further comprises the DMD gene or transcript thereof hybridized with the guide sequence.

In some embodiments, the system is a composition comprising the component (1) and the component (2) .

The components of the system are described more specifically in the other sub-sections herein.

rAAV vector genome

The disclosure provides various means of delivering the system of the disclosure, for example, delivery via a rAAV vector. In some embodiments, the disclosure provides a recombinant adeno-associated virus (rAAV) vector genome encoding or comprising the system of the disclosure. In some embodiments, the rAAV vector genome is a DNA (e.g., a ssDNA, a dsDNA) or an RNA.

Thus, in yet another aspect, the disclosure provides a recombinant adeno-associated virus (rAAV) vector genome (e.g., a DNA rAAV vector genome, an RNA rAAV vector genome) comprising:

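(1) a first polynucleotide sequence comprising a sequence encoding a guide nucleic acid comprising:

(a) a scaffold sequence capable of forming a complex with a nucleic acid programmable binding protein (napBP), and

(b) a guide sequence capable of hybridizing to a target sequence on a target strand of a DMD gene or a target sequence on a transcript of a DMD gene, thereby guiding the complex to the DMD gene or the transcript; and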

(2) a second polynucleotide sequence comprising a sequence encoding the napBP,

wherein the rAAV vector genome is adapted to be encapsulated into a rAAV particle (e.g., a DNA-encapsulated rAAV particle, an RNA-encapsulated rAAV particle) .

The components of the rAAV vector genome are described more specifically in the other sub-sections herein.

Modification method

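In yet another aspect, the disclosure provides a method of modifying expression of a DMD gene, comprising contacting the DMD gene or transcript thereof with a system comprising:

(1) a guide nucleic acid, or a polynucleotide (e.g., a DNA, an RNA, a DNA/RNA mixture) encoding the guide nucleic acid, comprising:

(a) a scaffold sequence capable of forming a complex with a nucleic acid programmable binding protein (napBP), and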

(b) a guide sequence capable of hybridizing to a target sequence on a target strand of a DMD gene or a target sequence on a transcript of a DMD gene, thereby guiding the complex to the DMD gene or the transcript and modifying the expression of the DMD gene; and

(2) the napBP or a polynucleotide encoding the napBP.

In some embodiments, the modification of the expression of the DMD gene treats, detects, or diagnoses a DMD associated disease. In some embodiments, the modification of the expression of the DMD gene detects or diagnoses the presence, progress, or development of a DMD associated disease.

The components of the method are described more specifically in the other sub-sections herein.

Mechanism

The disclosure provides various mechanisms for modifying expression of a DMD gene for various purposes, including, but not limited to, treatment or diagnosis of DMD associated diseases. The modification could be on gene level, transcription level, transcript level, or translation level. In some embodiments, the modification leads to expression of a functional expression product from the DMD gene so as to treat DMD associated diseases.

As used herein, the phrase “modifying expression of a DMD gene” includes, but is not limited to, modifications of any one or more stages of the transcription and translation procedure starting from the DMD gene to an expression product of the DMD gene, e.g., dystrophin. Non-limiting examples of the modifications include:

(1) modifying the DMD gene, such as, cleaving the DMD gene (and thereby introducing an indel mutation into the DMD gene) , changing (e.g., adding, deleting, substituting) any one or more nucleotides of the DMD gene by, for example, base editing, prime editing;

(2) modifying the transcription of the DMD gene, such as, increasing or decreasing the transcription by, for example, methylating or demethylating the DMD gene by epigenetic modification;

(3) modifying the transcript of the DMD gene, such as, a DMD mRNA, for example, cleaving (and typically degrading) the transcript, changing (e.g., adding, deleting, substituting) any one or more nucleotides of the transcript by, for example, base editing; and

(4) modifying the translation of the transcript (e.g., a DMD mRNA) of the DMD gene, such as, increasing or decreasing the translation of the transcript by, for example, epigenetic modification.

DNA modification

Provided in the disclosure are multiple ways to modify the expression of DMD gene on DNA level. In some embodiments, the DMD gene comprises a mutation. In some embodiments, the DMD gene is a human DMD gene comprising a mutation. Various DMD mutations that cause a DMD associated disease are known in the art. In some embodiments, the DMD gene comprises a nonsense mutation. In some embodiments, the napBP is a nucleic acid programmable DNA binding protein (napDNAbp).

DNA cleavage

Provided in the disclosure are tools and methods using DNA cleavage to modify the expression of DMD gene.

In some embodiments, the napDNAbp is a nucleic acid programmable dsDNA endonuclease (napDNAn) .

In some embodiments, guiding the complex to the DMD gene enables the napDNAn to specifically cleave the DMD gene in a guide sequence-specific manner.

In some embodiments, the specific cleavage of the DMD gene leads to incorporation of an insertion and/or deletion (an indel) mutation into the DMD gene.

In some embodiments, the specific cleavage of the DMD gene or the insertion and/or deletion mutation generates an in-frame stop codon in the DMD gene. As is well understood, the presence of an in-frame stop codon in a gene and/or in the transcript transcribed from a gene typically stops the translation of a protein from the transcript. Typically, a stop codon is TAG, TAA, or TGA in DNA, and UAG, UAA, or UGA in RNA.
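As a non-limiting illustration of what “in-frame” means here, the following Python sketch scans a coding sequence codon by codon from a chosen frame start and reports the first stop codon that falls in that frame. The function name and the `frame_start` parameter are assumptions made for the example; RNA input is handled by reading U as T.

```python
from typing import Optional

STOP_CODONS = {"TAG", "TAA", "TGA"}


def first_in_frame_stop(seq: str, frame_start: int = 0) -> Optional[int]:
    """Return the 0-based index of the first in-frame stop codon, or None.

    Only codons that begin at frame_start, frame_start + 3, ... are examined,
    so a TAG/TAA/TGA triplet that straddles two codons is not reported.
    """
    seq = seq.upper().replace("U", "T")
    for i in range(frame_start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return None
```

For instance, in `ATGATAGCC` the triplet `TAG` begins at index 4 and is therefore out of frame when reading from index 0, so no in-frame stop codon is reported for that frame.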

In some embodiments, the specific cleavage of the DMD gene or the insertion and/or deletion mutation generates a 3n+1 frameshift mutation, a 3n+2 frameshift mutation, a 3n-1 frameshift mutation, or a 3n-2 frameshift mutation in the DMD gene, wherein n is 0 or a positive integer (e.g., 1, 2, 3) .
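As a non-limiting illustration, the effect of an indel on the reading frame depends only on its net length change modulo 3. The Python sketch below assumes that the 3n+1, 3n+2, 3n-1, and 3n-2 labels refer to the net number of nucleotides gained (positive) or lost (negative); under that reading, 3n+1 and 3n-2 changes shift the frame in the same way, as do 3n+2 and 3n-1 changes. The function name and the grouped labels are assumptions of the example.

```python
def frame_class(inserted: int, deleted: int) -> str:
    """Classify an indel by its net length change modulo 3.

    `inserted` and `deleted` are nucleotide counts. A net change that is a
    multiple of 3 preserves the reading frame downstream of the indel.
    """
    if inserted < 0 or deleted < 0:
        raise ValueError("nucleotide counts must be non-negative")
    net = inserted - deleted
    residue = net % 3  # Python's % yields 0, 1, or 2 even when net is negative
    return {
        0: "in-frame (no frameshift)",
        1: "3n+1 / 3n-2 frameshift",
        2: "3n+2 / 3n-1 frameshift",
    }[residue]
```

For example, a one-nucleotide insertion and a two-nucleotide deletion fall into the same class, because both move the downstream reading frame by the same offset.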

In some embodiments, the frameshift mutation decreases or eliminates transcription of the DMD gene and/or translation of a transcript (e.g., mRNA) of the DMD gene. In some embodiments, the frameshift mutation generates an in-frame stop codon in the DMD gene.

In some embodiments, the frameshift mutation leads to expression of a mutant of dystrophin. In some embodiments, the mutant is functional. In some embodiments, the mutant is non-functional.

In some embodiments, the frameshift mutation leads to expression of a functional mutant of dystrophin. As used herein, the term “functional mutant” of a protein refers to a mutant of the protein that substantially has one or more functions (e.g., all known functions) of the protein, for example, a mutant having at least about 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, or 95% of the function of the protein or having about 100% of the function of the protein.

In some embodiments, the frameshift mutation leads to expression of a non-functional mutant of dystrophin. As used herein, the term “non-functional mutant” of a protein refers to a mutant of the protein that substantially lacks one or more functions (e.g., all known functions) of the protein, for example, a mutant having at most about 35%, 30%, 20%, 15%, 10%, 5%, or 1% of the function of the protein or having about 0% of the function of the protein.

In some embodiments, the frameshift mutation leads to expression of a non-pathogenic mutant of dystrophin. As used herein, the term “non-pathogenic mutant” of a protein refers to a mutant of the protein that is not pathogenic or has not been reported to cause any disease.

In some embodiments, the frameshift mutation does not lead to expression of a pathogenic mutant of dystrophin. As used herein, the term “pathogenic mutant” of a protein refers to a mutant of the protein that is pathogenic or has been reported to cause any disease.

In some embodiments, the system of the disclosure is combined with a supplement to deliver the wild type expression product of the DMD gene. For example, in some embodiments, a cDNA encoding the wild type expression product of DMD gene is delivered in combination with the system of the disclosure, in the same rAAV particles as the system or in separate rAAV particles.

DNA base editing

Also provided in the disclosure are tools and methods using DNA base editing to modify the expression of DMD gene.

In some embodiments, the napDNAbp is a nickase and further comprises a base editing domain. As used herein, the term “nickase” means that it substantially lacks dsDNA endonuclease activity (i.e., it is substantially incapable of cleaving both strands of a dsDNA) and is substantially capable of nicking one strand of a dsDNA. In some embodiments, the one strand is the target strand of the dsDNA. In some embodiments, the one strand is the sense strand or the antisense strand of the dsDNA.

In some embodiments, the napDNAbp is an endonuclease deficient napDNAbp and further comprises a base editing domain. As used herein, the term “endonuclease deficient napDNAbp” is used interchangeably with “endonuclease inactive napDNAbp” or “dead napDNAbp” , meaning that it substantially lacks dsDNA endonuclease activity (i.e., it is substantially incapable of cleaving double strands of a dsDNA) , and it substantially lacks nickase activity (i.e., it is substantially incapable of nicking either strand of a dsDNA) .

In some embodiments, guiding the complex to the DMD gene enables the napDNAbp that is a nickase or dead napDNAbp and further comprises the base editing domain to specifically base edit the DMD gene in a guide sequence-specific manner.

As used herein, the term “base edit (ing) ” means editing the base of a nucleotide (e.g., a ribonucleotide, a deoxyribonucleotide) , including but not limited to inserting (adding) , deleting (excising) , and substituting the base.

In some embodiments, the base editing of the DMD gene substitutes a nucleotide of the DMD gene to a different nucleotide. In some embodiments, the base editing of the DMD gene substitutes one nucleotide of the DMD gene to one different nucleotide, which is also known as single base editing. In some embodiments, the base editing of the DMD gene substitutes two or three or more nucleotides, whether consecutive or not, of the DMD gene to two or three or more different nucleotides, which is also known as multiple or multiplexed base editing.

As is well known in the art and shown in the tables below, the codons include 61 non-stop codons encoding 20 amino acids and three stop codons (i.e., TAA, TAG, TGA). Each codon is composed of three nucleotides. A nucleotide change in a non-stop codon may convert the non-stop codon to another non-stop codon encoding the same or different amino acid, or to a stop codon. A nucleotide change in a stop codon may convert the stop codon to another stop codon, or to a non-stop codon. Unless otherwise indicated, the codon in the disclosure, e.g., a non-stop codon, a stop codon, a start codon, is an in-frame codon.

From a view of process, in some embodiments, the base editing substitutes A in a codon of the DMD gene with T, C, or G. In some embodiments, base editing substitutes T in a codon of the DMD gene with A, C, or G. In some embodiments, base editing substitutes C in a codon of the DMD gene with A, T, or G. In some embodiments, base editing substitutes G in a codon of the DMD gene with A, T, or C.

From a view of outcome, in some embodiments, the base editing substitutes a stop codon of the DMD gene with a non-stop codon. In some embodiments, base editing substitutes a non-stop codon of the DMD gene with a different non-stop codon. In some embodiments, base editing substitutes a non-stop codon of the DMD gene with a stop codon.
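As a non-limiting illustration of the codon outcomes described above, the following Python sketch builds the standard genetic code (DNA alphabet) and classifies a codon substitution as stop to stop, stop to non-stop, non-stop to stop, or non-stop to non-stop encoding the same or a different amino acid. The function name and the outcome labels are assumptions of the example; the table is the standard code written in the usual T, C, A, G ordering, with “*” marking stop codons.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_substitution(before: str, after: str) -> str:
    """Classify the outcome of replacing codon `before` with codon `after`."""
    a = CODON_TABLE[before.upper().replace("U", "T")]
    b = CODON_TABLE[after.upper().replace("U", "T")]
    if a == "*" and b == "*":
        return "stop to stop"
    if a == "*":
        return "stop to non-stop"
    if b == "*":
        return "non-stop to stop"
    return "non-stop, same amino acid" if a == b else "non-stop, different amino acid"
```

For example, `classify_substitution("CAG", "TAG")` reports a non-stop to stop change, and the reverse substitution reports a stop to non-stop change.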

The DMD gene may contain a pathogenic mutation, which may be a pathogenic codon substitution where a wild type codon of the DMD gene is substituted with a pathogenic codon. In some embodiments, the wild type codon is a stop codon, and the pathogenic codon is a non-stop codon. The pathogenic codon substitution of a wild type stop codon with a non-stop codon may lead to the expression of a mutant of the expression product of the DMD gene containing additional C-terminal amino acids, which mutant may be pathogenic (e.g., non-functional). In some embodiments, the wild type codon is a non-stop codon, and the pathogenic codon is a stop codon or a non-stop codon different from the wild type non-stop codon. The pathogenic codon substitution of a wild type non-stop codon with a stop codon may lead to prematurely terminated (partial) translation of the transcript of the DMD gene, which may generate an N-terminal truncation of the expression product of the DMD gene or generate no expression product. The N-terminal truncation of the expression product may be pathogenic (e.g., non-functional) and may be stable or unstable (e.g., quickly degraded by cellular enzymes). The pathogenic codon substitution of a wild type non-stop codon with a non-stop codon different from the wild type non-stop codon may lead to the expression of a mutant of the expression product of the DMD gene, which mutant may be pathogenic (e.g., non-functional), and which mutant may be stable or unstable (e.g., quickly degraded by cellular enzymes).

In some embodiments, the base editing of the DMD gene decreases or eliminates transcription of the DMD gene and/or translation of a transcript (e.g., mRNA) of the DMD gene. For example, the base editing of the DMD gene eliminates full expression of the DMD gene by substituting a non-stop codon in the DMD gene with a stop codon, leaving no translation or prematurely terminated (partial) translation of the transcript of the DMD gene.

In some embodiments, the base editing of the DMD gene generates an in-frame stop codon in the DMD gene.

In some embodiments, the base editing of the DMD gene leads to expression of a functional expression product of the DMD gene, e.g., a functional mutant of dystrophin.

In some embodiments, the base editing of the DMD gene leads to expression of a mutant of dystrophin, for example, by substituting a non-wild type stop codon with a non-wild type, non-stop codon or substituting a non-wild type, non-stop codon with a different non-wild type, non-stop codon. In some embodiments, the mutant of dystrophin is functional. In some embodiments, the mutant of dystrophin is non-functional.

In some embodiments, the base editing of the DMD gene leads to expression of a non-functional mutant of dystrophin, for example, by substituting a non-wild type stop codon with a non-wild type, non-stop codon or substituting a non-wild type, non-stop codon with a different non-wild type, non-stop codon, and the resulting expression product of the base edited DMD gene is a non-functional mutant of dystrophin.

In some embodiments, the base editing of the DMD gene leads to expression of wild type dystrophin, for example, by substituting a non-wild type stop codon or a non-wild type, non-stop codon with a wild type non-stop codon, and the resulting expression product of the base edited DMD gene is wild type dystrophin.

In some embodiments, the base editing of the DMD gene leads to expression of a functional mutant of dystrophin, for example, by substituting a non-wild type stop codon with a non-wild type, non-stop codon or substituting a non-wild type, non-stop codon with a different non-wild type, non-stop codon, and the resulting expression product of the base edited DMD gene is a functional mutant of dystrophin.

In some embodiments, the base editing of the DMD gene leads to expression of a non-pathogenic mutant of dystrophin, for example, by substituting a non-wild type stop codon with a non-wild type, non-stop codon or substituting a non-wild type, non-stop codon with a different non-wild type, non-stop codon, and the resulting expression product of the base edited DMD gene is non-pathogenic.

In some embodiments, the base editing of the DMD gene does not lead to expression of a pathogenic mutant of dystrophin.

DNA prime editing

Also provided in the disclosure are tools and methods using DNA prime editing to modify the expression of DMD gene.

In some embodiments, the napDNAbp is a nickase and further comprises a reverse transcriptase domain, and the guide nucleic acid is a prime editing guide nucleic acid (e.g., pegRNA) . As used herein, the term “nickase” means that it substantially lacks dsDNA endonuclease activity (i.e., it is substantially incapable of cleaving both strands of a dsDNA) and is substantially capable of nicking one strand of a dsDNA. In some embodiments, the one strand is the nontarget strand of the dsDNA. In some embodiments, the one strand is the sense strand or the antisense strand of the dsDNA.

In some embodiments, guiding the complex to the DMD gene enables the napDNAbp that is a nickase and further comprises a reverse transcriptase domain to specifically prime edit the DMD gene in a guide sequence-specific manner.

For a general introduction to the prime editing technique, see “Prime Editing: Adding Precision and Flexibility to CRISPR Editing” (blog dot addgene dot org slash prime-editing-crisp-cas-reverse-transcriptase).

Prime editing may allow insertion (addition) , deletion, or substitution of one or more nucleotides of the DMD gene.

In some embodiments, the prime editing of the DMD gene decreases or eliminates transcription of the DMD gene and/or translation of a transcript (e.g., mRNA) of the DMD gene.

In some embodiments, the prime editing of the DMD gene leads to expression of a functional expression product of the DMD gene, e.g., a functional mutant of dystrophin.

In some embodiments, the prime editing of the DMD gene generates an in-frame stop codon in the DMD gene, for example, by inserting a stop codon, by deleting one or more nucleotides to generate a frameshift mutation that further generates a stop codon, or substituting a non-stop codon with a stop codon.

In some embodiments, the prime editing of the DMD gene leads to expression of a mutant of dystrophin, for example, by substituting a non-wild type stop codon with a non-wild type, non-stop codon or substituting a non-wild type, non-stop codon with a different non-wild type, non-stop codon. In some embodiments, the mutant of dystrophin is functional. In some embodiments, the mutant of dystrophin is non-functional.

In some embodiments, the prime editing of the DMD gene leads to expression of a non-functional mutant of dystrophin, for example, by substituting a non-wild type stop codon with a non-wild type, non-stop codon or substituting a non-wild type, non-stop codon with a different non-wild type, non-stop codon, and the resulting expression product of the prime edited DMD gene is a non-functional mutant of dystrophin.

In some embodiments, the prime editing of the DMD gene leads to expression of wild type dystrophin, for example, by substituting a non-wild type stop codon or a non-wild type, non-stop codon with a wild type non-stop codon, and the resulting expression product of the prime edited DMD gene is wild type dystrophin.

In some embodiments, the prime editing of the DMD gene leads to expression of a functional mutant of dystrophin, for example, by substituting a non-wild type stop codon with a non-wild type, non-stop codon or substituting a non-wild type, non-stop codon with a different non-wild type, non-stop codon, and the resulting expression product of the prime edited DMD gene is a functional mutant of dystrophin.

In some embodiments, the prime editing of the DMD gene leads to expression of a non-pathogenic mutant of dystrophin, for example, by substituting a non-wild type stop codon with a non-wild type, non-stop codon or substituting a non-wild type, non-stop codon with a different non-wild type, non-stop codon, and the resulting expression product of the prime edited DMD gene is non-pathogenic.

In some embodiments, the prime editing of the DMD gene does not lead to expression of a pathogenic mutant of dystrophin.

DNA epigenomic modification

In some embodiments, the napDNAbp is an endonuclease deficient napDNAbp and further comprises an epigenomic modification domain, e.g., a methylation domain or a demethylation domain. As used herein, the term “endonuclease deficient napDNAbp” is used interchangeably with “endonuclease inactive napDNAbp” or “dead napDNAbp”, meaning that it substantially lacks dsDNA endonuclease activity (i.e., it is substantially incapable of cleaving double strands of a dsDNA), and it substantially lacks nickase activity (i.e., it is substantially incapable of nicking either strand of a dsDNA).

In some embodiments, guiding the complex to the DMD gene enables the napDNAbp that is an endonuclease deficient napDNAbp and further comprises an epigenomic modification domain, e.g., a methylation domain or a demethylation domain, to specifically epigenetically modify the DMD gene in a guide sequence-specific manner.

In some embodiments, the epigenomic modification of the DMD gene decreases or eliminates transcription of the DMD gene and/or translation of a transcript (e.g., mRNA) of the DMD gene.

In some embodiments, the epigenomic modification of the DMD gene increases transcription of the DMD gene and/or translation of a transcript (e.g., mRNA) of the DMD gene.

RNA modification

Provided in the disclosure are multiple ways to modify the expression of the DMD gene at the RNA level. In some embodiments, the transcript of the DMD gene is a DMD pre-mRNA or a DMD mRNA. In some embodiments, the transcript comprises a mutation. In some embodiments, the transcript is a transcript of human DMD gene comprising a mutation. In some embodiments, the DMD gene comprises a nonsense mutation. In some embodiments, the transcript comprises a nonsense mutation. In some embodiments, the napBP is a nucleic acid programmable RNA binding protein (napRNAbp).

RNA cleavage

Provided in the disclosure are tools and methods using RNA cleavage to modify the expression of the DMD gene.

In some embodiments, the napRNAbp is a nucleic acid programmable RNA endonuclease (napRNAn) .

In some embodiments, guiding the complex to the transcript of the DMD gene enables the napRNAn to specifically cleave the transcript of the DMD gene in a guide sequence-specific manner. Typically, the cleavage of a transcript by an RNA endonuclease leads to degradation of the transcript.

RNA base editing

Also provided in the disclosure are tools and methods using RNA base editing to modify the expression of the DMD gene.

In some embodiments, the napRNAbp is an endonuclease deficient napRNAbp and further comprises a base editing domain. As used herein, the term “endonuclease deficient napRNAbp” is used interchangeably with “endonuclease inactive napRNAbp” or “dead napRNAbp”, meaning that it substantially lacks RNA endonuclease activity (i.e., it is substantially incapable of cleaving an RNA).

In some embodiments, guiding the complex to the transcript of the DMD gene enables the napRNAbp that is an endonuclease deficient napRNAbp and further comprises the base editing domain to specifically base edit the transcript in a guide sequence-specific manner.

In some embodiments, the base editing of the transcript substitutes a nucleotide of the transcript to a different nucleotide. In some embodiments, the base editing of the transcript substitutes one nucleotide of the transcript to one different nucleotide, which is also known as single base editing. In some embodiments, the base editing of the transcript substitutes two or three or more nucleotides, whether consecutive or not, of the transcript to two or three or more different nucleotides, which is also known as multiple or multiplexed base editing.

From a view of process, in some embodiments, the base editing substitutes A in a codon of the transcript with U, C, or G; substitutes U in a codon of the transcript with A, C, or G; substitutes C in a codon of the transcript with A, U, or G; or substitutes G in a codon of the transcript with A, U, or C.

From a view of outcome, in some embodiments, the base editing substitutes a stop codon of the transcript with a non-stop codon; substitutes a non-stop codon of the transcript with a different non-stop codon; or substitutes a non-stop codon of the transcript with a stop codon.
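The outcome categories above can be illustrated with a minimal Python sketch. It assumes the standard genetic code, in which UAA, UAG, and UGA are the stop codons; the function name `classify_codon_edit` is hypothetical and is not part of the disclosure.

```python
# Stop codons of the standard genetic code, written as RNA (assumption).
STOP_CODONS = {"UAA", "UAG", "UGA"}


def classify_codon_edit(before: str, after: str) -> str:
    """Classify a codon substitution by its outcome (hypothetical helper)."""
    # Accept DNA-style input by reading T as U.
    before = before.upper().replace("T", "U")
    after = after.upper().replace("T", "U")
    for codon in (before, after):
        if len(codon) != 3 or set(codon) - set("ACGU"):
            raise ValueError(f"not a valid codon: {codon!r}")
    if before == after:
        return "unchanged"
    before_is_stop = before in STOP_CODONS
    after_is_stop = after in STOP_CODONS
    if before_is_stop and not after_is_stop:
        return "stop -> non-stop"
    if not before_is_stop and after_is_stop:
        return "non-stop -> stop"
    if not before_is_stop and not after_is_stop:
        return "non-stop -> different non-stop"
    return "stop -> different stop"


if __name__ == "__main__":
    print(classify_codon_edit("UAG", "UGG"))  # stop codon edited to a Trp codon
    print(classify_codon_edit("CAG", "UAG"))  # Gln codon edited to a stop codon
```

The fourth return value (one stop codon replaced by another) is not among the outcomes listed above and is included only so that the function is total.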

The transcript may contain a pathogenic mutation, which may be a pathogenic codon substitution where a wild type codon of the transcript is substituted with a pathogenic codon. In some embodiments, the wild type codon is a stop codon, and the pathogenic codon is a non-stop codon. The pathogenic codon substitution of a wild type stop codon with a non-stop codon may lead to the translation of a mutant of the expression product of the DMD gene containing additional C-terminal amino acids, which mutant may be pathogenic (e.g., non-functional). In some embodiments, the wild type codon is a non-stop codon, and the pathogenic codon is a stop codon or a non-stop codon different from the wild type non-stop codon. The pathogenic codon substitution of a wild type non-stop codon with a stop codon may lead to prematurely terminated (partial) translation of the transcript, which may generate an N-terminal truncation of the expression product of the DMD gene or generate no expression product. The N-terminal truncation of the expression product may be pathogenic (e.g., non-functional) and may be stable or unstable (e.g., quickly degraded by cellular enzymes). The pathogenic codon substitution of a wild type non-stop codon with a non-stop codon different from the wild type non-stop codon may lead to the translation of a mutant of the expression product of the DMD gene, which mutant may be pathogenic (e.g., non-functional), and which mutant may be stable or unstable (e.g., quickly degraded by cellular enzymes).

In some embodiments, the base editing of the transcript decreases or eliminates translation of the transcript. For example, the base editing of the transcript eliminates full translation of the transcript by substituting a non-stop codon in the transcript with a stop codon, leaving no translation or prematurely terminated (partial) translation of the transcript.

In some embodiments, the base editing of the transcript generates an in-frame stop codon in the transcript.

In some embodiments, the base editing of the transcript leads to translation of a functional expression product of the transcript, e.g., a functional mutant of dystrophin.

In some embodiments, the base editing of the transcript leads to expression of a mutant of dystrophin, for example, by substituting a non-wild type stop codon with a non-wild type, non-stop codon or substituting a non-wild type, non-stop codon with a different non-wild type, non-stop codon. In some embodiments, the mutant of dystrophin is functional. In some embodiments, the mutant of dystrophin is non-functional.

In some embodiments, the base editing of the transcript leads to expression of a non-functional mutant of dystrophin, for example, by substituting a non-wild type stop codon with a non-wild type, non-stop codon or substituting a non-wild type, non-stop codon with a different non-wild type, non-stop codon, and the resulting expression product of the transcript is a non-functional mutant of dystrophin.

In some embodiments, the base editing of the transcript leads to expression of wild type dystrophin, for example, by substituting a non-wild type stop codon or a non-wild type, non-stop codon with a wild type non-stop codon, and the resulting expression product of the transcript is wild type dystrophin.

In some embodiments, the base editing of the transcript leads to expression of a functional mutant of dystrophin, for example, by substituting a non-wild type stop codon with a non-wild type, non-stop codon or substituting a non-wild type, non-stop codon with a different non-wild type, non-stop codon, and the resulting expression product of the transcript is a functional mutant of dystrophin.

In some embodiments, the base editing of the transcript leads to expression of a non-pathogenic mutant of dystrophin, for example, by substituting a non-wild type stop codon with a non-wild type, non-stop codon or substituting a non-wild type, non-stop codon with a different non-wild type, non-stop codon, and the resulting expression product of the transcript is non-pathogenic.

In some embodiments, the base editing of the transcript does not lead to expression of a pathogenic mutant of dystrophin.

napBP

For the purpose of the disclosure, the napBP is capable of forming a complex with the guide nucleic acid of the disclosure by complexing with the scaffold sequence of the guide nucleic acid and is thereby guided to the DMD gene or transcript thereof via the hybridization of the guide sequence of the guide nucleic acid to the target sequence of the DMD gene or transcript. When the napBP is guided to the DMD gene or transcript, the activity of the napBP (or the functional domain associated with (e.g., bound to) the napBP) functions to modify the DMD gene or transcript thereof, leading to modification of expression of the DMD gene. Generally, any such napBP can be used with the guide nucleic acid of the disclosure, and when a napBP is selected, the scaffold sequence compatible to the napBP for complexing with the napBP can also be selected accordingly. The scaffold sequence is generally conserved.

In some embodiments, the napBP (e.g., napDNAbp) is capable of recognizing a protospacer adjacent motif (PAM) on the nontarget strand of the DMD gene, wherein the PAM is immediately 5’ or 3’ to a protospacer sequence on the nontarget strand of the DMD gene, and wherein the protospacer sequence is fully reversely complementary to the target sequence.

In some embodiments, the PAM comprises sequence 5’-NN-3’, 5’-NNN-3’, 5’-NNNN-3’, 5’-NNNNN-3’, or 5’-NNNNNN-3’, wherein N is A, T, G, or C.

Non-limiting examples of the napBP include CRISPR-associated (Cas) protein, IscB, TAL nuclease, meganuclease, and zinc-finger nuclease. Non-limiting examples of CRISPR-associated (Cas) protein include Cas9 (e.g., dCas9 and nCas9), Cas12a/Cpf1, Cas12b/C2c1, Cas12c/C2c3, Cas12d/CasY, Cas12e/CasX, Cas12f/Cas14, Cas12g, Cas12h, Cas12i, and Cas12k. Non-limiting examples of Cas protein include Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas5d, Cas5t, Cas5h, Cas5a, Cas6, Cas7, Cas8, Cas8a, Cas8b, Cas8c, Cas9 (also known as Csn1 or Csx12), Cas10, Cas10d, Cas12a/Cpf1, Cas12b/C2c1, Cas12c/C2c3, Cas12d/CasY, Cas12e/CasX, Cas12f/Cas14, Cas12g, Cas12h, Cas12i, Cas12k, Csy1, Csy2, Csy3, Csy4, Cse1, Cse2, Cse3, Cse4, Cse5e, Csc1, Csc2, Csa5, Csn1, Csn2, Csm1, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX, Csx3, Csx1, Csx1S, Csx11, Csf1, Csf2, CsO, Csf4, Csd1, Csd2, Cst1, Cst2, Csh1, Csh2, Csa1, Csa2, Csa3, Csa4, Csa5, Type II Cas effector proteins, Type V Cas effector proteins, Type VI Cas effector proteins, CARF, DinG, IscB, homologues thereof, or modified or engineered versions thereof. Other napDNAbp are also within the scope of this disclosure, e.g., IscB, IsrB, although they may not be specifically listed in this disclosure. See, e.g., Makarova et al., “Classification and Nomenclature of CRISPR-Cas Systems: Where from Here?” CRISPR J. 2018 October; 1: 325-336. doi: 10.1089/crispr.2018.0033; Yan et al., “Functionally diverse type V CRISPR-Cas systems” Science. 2019 Jan. 4; 363(6422): 88-91. doi: 10.1126/science.aav7271, the entire contents of each are hereby incorporated by reference.

In some embodiments, the Cas protein is an endonuclease, a nickase, or a dead Cas.

In some embodiments, the PAM comprises sequence 5’-NTN-3’, wherein N is A, T, G, or C, and wherein the PAM is immediately 5’ to the protospacer sequence. In some embodiments, the PAM comprises sequence 5’-TTN-3’, wherein N is A, T, G, or C. For example, in some embodiments, the napBP is a Class 2, Type V CRISPR-associated protein (Cas12). In some embodiments, the Cas12 is Cas12a (Cpf1), Cas12b (C2c1), Cas12c (C2c3), Cas12d (CasY), Cas12e (CasX), Cas12f (Cas14), Cas12i, or Cas12k (C2c10, C2C7), e.g., Cas12i1, Cas12i2, Cas12i3, Cas12i4, xCas12i (SiCas12i), Cas12Max, hfCas12Max, or a mutant thereof.

In some embodiments, the PAM comprises sequence 5’-NGG-3’, wherein N is A, T, G, or C, and wherein the PAM is immediately 3’ to the protospacer sequence. For example, in some embodiments, the napBP is a Class 2, Type II CRISPR-associated protein (Cas9) , e.g., SaCas9, SpCas9, or a mutant thereof.

In some embodiments, the PAM comprises sequence 5’-NNNGAN-3’, wherein N is A, T, G, or C, and wherein the PAM is immediately 3’ to the protospacer sequence. For example, in some embodiments, the napBP is an IscB protein, e.g., OgeuIscB or a mutant thereof.

In some cases, the recognizing ability of the napBP to a target nucleic acid may not be limited to any specific PAM, which means that the napBP can recognize any PAM, such that the PAM is not a substantial restriction on the selection of a protospacer sequence or a target sequence. Such a napBP is called “PAMless” , for example, a PAMless SpCas9 mutant. In some embodiments, the napBP is PAMless.

In some embodiments, the napBP is a Class 2, Type VI CRISPR-associated protein (Cas13) . Cas13 can be targeted to RNA by a guide nucleic acid. Cas13 is particularly useful since there is no PAM restriction for eukaryotic transcripts when Cas13 is used as the napBP. In some embodiments, the Cas13 is Cas13a (C2c2) , Cas13b (such as, Cas13b1, Cas13b2) , Cas13c, Cas13d, Cas13e (Cas13X) , Cas13f (Cas13Y) , or a mutant thereof.

Typically, a Cas protein (e.g., Cas9, Cas12, Cas13) can associate with a CRISPR RNA (crRNA) comprising a spacer sequence that guides the Cas protein to a specific sequence that is reversely complementary and capable of hybridizing to the spacer sequence of the crRNA. The crRNA also comprises a scaffold sequence capable of complexing with the Cas protein.

In some embodiments, the napBP comprises a sequence of SEQ ID NO: 57.

In some embodiments, the sequence encoding the napBP comprises a sequence having a sequence identity of at least about 80% (e.g., at least about 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%) to the sequence of SEQ ID NO: 63 or a 5’ end truncation thereof without the first 5’ ATG codon. In some embodiments, the napBP encoded by the sequence retains at least 80% (e.g., at least about 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%) of the guide sequence-specific DNA endonuclease activity of the sequence of SEQ ID NO: 57.

In some embodiments, the sequence encoding the napBP comprises a sequence of SEQ ID NO: 63.

Base editing

In some embodiments, the napBP further comprises a base editing domain.

In some embodiments, the base editing domain is capable of substituting a base of a nucleotide with a different base.

In some embodiments, the base editing domain is capable of deaminating a base of a nucleotide.

In some embodiments, the base editing domain comprises a deaminase domain capable of deaminating a base (e.g., an adenine, a guanine, a cytosine, a thymine, a uracil) of a nucleotide. In some embodiments, the deaminase domain is capable of deaminating an adenine (A) to a hypoxanthine (I). In some embodiments, the deamination of the adenine to the hypoxanthine converts the adenosine (A) or deoxyadenosine (dA) containing the adenine to a guanosine (G) or deoxyguanosine (dG). In some embodiments, the deaminase domain is capable of deaminating a cytosine (C) to a uracil (U). In some embodiments, the deamination of the cytosine to the uracil converts the cytidine (C) or deoxycytidine (dC) containing the cytosine to a uridine (U) or a deoxythymidine (dT).
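The net effect of the two deamination routes described above (A read as G via hypoxanthine, and C read as U in RNA or T in DNA) can be sketched as a simple string operation. This is only an illustration of the resulting base change; the function name `apply_deamination` and its `molecule` parameter are hypothetical, and no enzyme specificity or editing window is modeled.

```python
def apply_deamination(seq: str, index: int, molecule: str = "RNA") -> str:
    """Return the sequence as it would be read after deaminating one base.

    A -> G (adenine deaminated to hypoxanthine, which is read as guanine)
    C -> U in RNA, or C -> T in DNA (cytosine deaminated to uracil)
    """
    if molecule not in ("RNA", "DNA"):
        raise ValueError("molecule must be 'RNA' or 'DNA'")
    seq = seq.upper()
    base = seq[index]
    if base == "A":
        new_base = "G"
    elif base == "C":
        new_base = "U" if molecule == "RNA" else "T"
    else:
        raise ValueError(f"base {base!r} at index {index} is not A or C")
    return seq[:index] + new_base + seq[index + 1:]


if __name__ == "__main__":
    # A-to-I editing of the middle base turns the stop codon UAG into UGG.
    print(apply_deamination("UAG", 1))
    # C-to-U editing of the first base turns CAG into the stop codon UAG.
    print(apply_deamination("CAG", 0))
```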

In some embodiments, the base editing domain is capable of excising a base (e.g., an adenine, a guanine, a cytosine, a thymine, a uracil) of a nucleotide.

In some embodiments, the base editing domain comprises a base excising domain capable of excising a base of a nucleotide.

In some embodiments, the base editing domain comprises a deaminase domain and a base excising domain.

In some embodiments, the deaminase domain is a double-stranded RNA-specific adenosine deaminase (ADAR) , such as, ADAR1, ADAR2 (ADARB1) , ADAR3 (ADARB2) , or the deaminase domain thereof, or a mutant thereof.

In some embodiments, the ADAR is human ADAR or ADAR of non-human species. In some embodiments, the ADAR is octopus ADAR.

In some embodiments, the deaminase domain is human ADAR2 deaminase domain (hADAR2DD) .

In some embodiments, the deaminase domain is a mutant of hADAR2DD, e.g., hADAR2DD-E488Q, hADAR2DD-E488Q+T375G, hADAR2DD-E488Q+V351G+S486A+T375S+S370C+P462A+N597I+L332I+I398V+K350I+M383L+D619G+S582T+V440I+S495N+K418E+S661T, hADAR2DD-E488Q+V351G+S486A+T375A+S370C+P462A+N597I+L332I+I398V+K350I+M383L+D619G+S582T+V440I+S495N+K418E+S661T.

In some embodiments, the deaminase domain is apolipoprotein B mRNA editing enzyme, catalytic polypeptide (APOBEC) , such as, APOBEC1, APOBEC2, APOBEC3A, APOBEC3B, APOBEC3C, APOBEC3D, APOBEC3E, APOBEC3F, APOBEC3G, APOBEC3H, APOBEC4, or the deaminase domain thereof, or a mutant thereof.

In some embodiments, the deaminase domain is activation-induced cytidine deaminase (AID, AICDA, single-stranded DNA cytosine deaminase) , or the deaminase domain thereof, or a mutant thereof.

In some embodiments, the deaminase domain is tRNA adenosine deaminase (TadA) , or the deaminase domain thereof, or a mutant thereof.

Epigenomic modification

In some embodiments, the napBP further comprises a transcription inhibiting domain (e.g., KRAB domain or SID domain) . In some embodiments, the napBP further comprises a KRAB domain.

In some embodiments, the napBP further comprises a DNA methyltransferase, such as, DNMT3l, DNMT3a.

In some embodiments, the napBP further comprises a DNMT3l domain and a DNMT3a domain.

In some embodiments, the napBP further comprises a DNMT3l domain, a DNMT3a domain, and a KRAB domain.

In some embodiments, the napBP, the KRAB domain, the DNMT3l domain, and the DNMT3a domain are arranged from N-terminal to C-terminal.

Design of protospacer sequence/target sequence; Target site

For the purpose of the disclosure, in some embodiments, the protospacer sequence or target sequence is located such that the DMD gene or transcript thereof can be specifically modified by the napBP or a functional domain associated with the napBP.

To facilitate the evaluation of selected protospacer sequences or target sequence and designed guide sequences in mouse models, in some embodiments, the protospacer sequence or target sequence is located such that a mouse DMD gene or transcript thereof can be specifically modified by the napBP or a functional domain associated with the napBP. In some embodiments, the protospacer sequence or target sequence is located such that both a human DMD gene or transcript thereof and a mouse DMD gene or transcript thereof can be specifically modified by the napBP or a functional domain associated with the napBP. That is, the protospacer sequence or target sequence is selected to be cross-reactive to both human and mouse species.

In some embodiments, the protospacer sequence is a stretch of contiguous nucleotides identified from the nontarget strand of the DMD gene by identifying the stretch of contiguous nucleotides immediately 5’ or 3’, and optionally, 3’, to the PAM on the nontarget strand.

In some embodiments, the protospacer sequence is a stretch of about, at least about, or at most about 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, or more contiguous nucleotides on the nontarget strand of the DMD gene, or a stretch of contiguous nucleotides on the nontarget strand of the DMD gene in a numerical range between any two of the preceding values, e.g., a stretch of from about 16 to about 50 contiguous nucleotides. In some embodiments, the protospacer sequence is a stretch of about 20, 30, or 50 contiguous nucleotides on the nontarget strand of the DMD gene.

In some embodiments, the protospacer sequence on the nontarget strand of the DMD gene corresponding to the target sequence is immediately 3’ to PAM of 5’-NTN-3’, wherein N is A, T, G, or C.

In some embodiments, the protospacer sequence on the nontarget strand of the DMD gene corresponding to the target sequence is immediately 5’ to PAM of 5’-NGG-3’, wherein N is A, T, G, or C.

In some embodiments, the protospacer sequence on the nontarget strand of the DMD gene corresponding to the target sequence is immediately 5’ to PAM of 5’-NNNGAN-3’, wherein N is A, T, G, or C.

In some embodiments, the protospacer sequence is a stretch of about 20, 30, or 50 contiguous nucleotides of the nontarget strand of the DMD gene immediately 3’ to PAM of 5’-NTN-3’, wherein N is A, T, G, or C.

In some embodiments, the protospacer sequence is a stretch of about 20, 30, or 50 contiguous nucleotides of the nontarget strand of the DMD gene immediately 5’ to PAM of 5’-NGG-3’, wherein N is A, T, G, or C.

In some embodiments, the protospacer sequence is a stretch of about 20, 30, or 50 contiguous nucleotides of the nontarget strand of the DMD gene immediately 5’ to PAM of 5’-NNNGAN-3’, wherein N is A, T, G, or C.
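The PAM-relative placement described above (protospacer immediately 3’ to a 5’-NTN-3’ PAM, or immediately 5’ to a 5’-NGG-3’ or 5’-NNNGAN-3’ PAM) can be sketched as a scan over a nontarget-strand sequence. The following Python sketch is illustrative only: the names `PAM_RULES` and `find_protospacers` are hypothetical, the input sequence is made up, and only the strand supplied by the caller is scanned.

```python
import re

# PAM name -> (regular expression, side of the protospacer on which the PAM sits)
PAM_RULES = {
    "NTN": ("[ACGT]T[ACGT]", "5prime"),         # PAM immediately 5' to protospacer
    "NGG": ("[ACGT]GG", "3prime"),              # PAM immediately 3' to protospacer
    "NNNGAN": ("[ACGT]{3}GA[ACGT]", "3prime"),  # PAM immediately 3' to protospacer
}


def find_protospacers(nontarget_strand: str, pam_name: str, length: int = 20):
    """List (start, protospacer, PAM) tuples on the given strand (5'->3')."""
    pattern, side = PAM_RULES[pam_name]
    seq = nontarget_strand.upper()
    hits = []
    # A lookahead is used so that overlapping PAM occurrences are all reported.
    for match in re.finditer(f"(?=({pattern}))", seq):
        pam = match.group(1)
        pam_start = match.start()
        pam_end = pam_start + len(pam)
        if side == "5prime":
            start, end = pam_end, pam_end + length
        else:
            start, end = pam_start - length, pam_start
        if start < 0 or end > len(seq):
            continue  # not enough flanking sequence for a full protospacer
        hits.append((start, seq[start:end], pam))
    return hits


if __name__ == "__main__":
    toy = "CCCCCTTAACGTAGG"  # made-up sequence, not taken from the DMD gene
    print(find_protospacers(toy, "NGG", length=5))
    print(find_protospacers(toy, "NTN", length=5))
```

To cover both strands of a locus, the same scan could be repeated on the reverse complement of the input.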

In some embodiments, the target sequence is a stretch of contiguous nucleotides identified from the target strand of the DMD gene or from the transcript thereof.

In some embodiments, the target sequence is a stretch of about, at least about, or at most about 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, or more contiguous nucleotides on the target strand of the DMD gene or on the transcript thereof, or a stretch of contiguous nucleotides on the target strand of the DMD gene or on the transcript thereof in a numerical range between any two of the preceding values, e.g., a stretch of from about 16 to about 50 contiguous nucleotides. In some embodiments, the target sequence is a stretch of about 20, 30, or 50 contiguous nucleotides on the target strand of the DMD gene or on the transcript thereof.

In some embodiments, the nontarget strand is the sense strand of the DMD gene.

In some embodiments, the nontarget strand is the antisense strand of the DMD gene.

In some embodiments, the target strand is the sense strand of the DMD gene.

In some embodiments, the target strand is the antisense strand of the DMD gene.

In some embodiments, the protospacer sequence on the nontarget strand of the DMD gene corresponding to the target sequence or target sequence on the target strand of the DMD gene is located at or within an exon of the DMD gene or transcript thereof, or at or within a splice donor or a splice acceptor of the exon.

In some embodiments, the exon is selected from the group consisting of Exon 43, Exon 44, Exon 45, Exon 46, Exon 51, Exon 53, Exon 55.

In some embodiments, the DMD gene is human DMD gene, non-human primate DMD gene, or mouse DMD gene.

In some embodiments, the DMD gene is in a eukaryotic cell, for example, a human cell, a non-human primate cell, or a mouse cell. In some embodiments, the DMD gene is in a muscle cell, such as a myocardial cell, a diaphragm muscle cell, or a tibialis anterior muscle cell.

Design of guide sequence according to protospacer/target sequence

In some embodiments, the guide sequence is in a length of about, at least about, or at most about 14 nucleotides, e.g., about, at least about, or at most about 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, or more nucleotides, or in a length of nucleotides in a numerical range between any two of the preceding values, e.g., in a length of from about 16 to about 50 nucleotides. In some embodiments, the guide sequence is in a length of about 20, 30, or 50 nucleotides.

In some embodiments, (1) the guide sequence is at least about 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, or 100% (fully) , optionally about 100% (fully) , reversely complementary to the target sequence; (2) the guide sequence contains no more than 5, 4, 3, 2, or 1 mismatch or contains no mismatch with the target sequence; or (3) the guide sequence comprises no mismatch with the target sequence in the first 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, or 70 nucleotides at the 5’ end of the guide sequence when the PAM is immediately 5’ to the protospacer sequence or at the 3’ end of the guide sequence when the PAM is immediately 3’ to the protospacer sequence. In some embodiments, the guide sequence is about 100% (fully) , reversely complementary to the target sequence.

In some embodiments, the guide sequence contains 1 mismatch with the target sequence. In some embodiments, the guide sequence is about 98% reversely complementary to the target sequence. In some embodiments, the 1 mismatch in the guide sequence is at a position corresponding to the nucleotide of the target sequence that is intended to be substituted.
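As a worked illustration of the complementarity figures above, a 50-nucleotide guide sequence with 1 mismatch is 49/50 = 98% reversely complementary to its target sequence. The Python sketch below counts mismatches by comparing a guide against the reverse complement of the target. It assumes strict Watson-Crick pairing (G-U wobble pairs are counted as mismatches), and all function names and sequences are hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def to_rna(seq: str) -> str:
    """Upper-case the sequence and read T as U."""
    return seq.upper().replace("T", "U")


def reverse_complement_rna(seq: str) -> str:
    """Reverse complement of a DNA or RNA sequence, returned as RNA."""
    return "".join(COMPLEMENT[base] for base in reversed(to_rna(seq)))


def count_mismatches(guide: str, target: str) -> int:
    """Count positions where the guide cannot pair with the target.

    Both sequences are written 5'->3'; a fully complementary guide equals
    the reverse complement of the target.
    """
    ideal_guide = reverse_complement_rna(target)
    if len(guide) != len(ideal_guide):
        raise ValueError("guide and target must have the same length")
    return sum(g != i for g, i in zip(to_rna(guide), ideal_guide))


def percent_complementarity(guide: str, target: str) -> float:
    mismatches = count_mismatches(guide, target)
    return 100.0 * (len(guide) - mismatches) / len(guide)


if __name__ == "__main__":
    target = "ACGTT" * 10                     # made-up 50-nt target sequence
    perfect = reverse_complement_rna(target)  # fully complementary guide
    one_off = "C" + perfect[1:]               # same guide with 1 mismatch
    print(percent_complementarity(perfect, target))
    print(percent_complementarity(one_off, target))
```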

Selection of protospacer/target/guide sequence; Effect of system

In some embodiments, the protospacer sequence, the target sequence, or the guide sequence is selected according to the editing efficiency of the napDNAn guided to the DMD gene or transcript by the guide sequence. The editing efficiency can be represented by indel % and measured by sequencing of the edited DMD gene. The higher the editing efficiency is, the more suitable the guide sequence is.
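One simple way to express the selection criterion above is to compute indel % as the share of sequencing reads carrying an insertion or deletion and to rank candidate guide sequences by that value. The sketch below assumes that the read counts have already been obtained from a separate alignment step; the function names, guide labels, and counts are made up for illustration and are not data from the disclosure.

```python
def indel_percent(indel_reads: int, total_reads: int) -> float:
    """Editing efficiency as the percentage of reads that contain an indel."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= indel_reads <= total_reads:
        raise ValueError("indel_reads must be between 0 and total_reads")
    return 100.0 * indel_reads / total_reads


def rank_guides(read_counts: dict) -> list:
    """Order guide labels from highest to lowest indel %.

    read_counts maps a guide label to (indel_reads, total_reads).
    """
    return sorted(
        read_counts,
        key=lambda name: indel_percent(*read_counts[name]),
        reverse=True,
    )


if __name__ == "__main__":
    # Hypothetical counts for two hypothetical guides.
    counts = {"guide_A": (30, 100), "guide_B": (45, 90)}
    for name in rank_guides(counts):
        print(name, indel_percent(*counts[name]))
```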

In some embodiments, the protospacer sequence, the target sequence, or the guide sequence is selected according to the level of transcription of the DMD gene and/or level of translation of a transcript (e.g., mRNA) of the DMD gene (level of expression of the DMD gene) in an in vivo animal model. The higher the level is, the more suitable the guide sequence is.

In some embodiments, the level of the transcript (e.g., mRNA) of the DMD gene is increased in a cell model (e.g., HEK293T cell model) or an animal model (e.g., a mouse model, a non-human primate model) by at least about 10%, about 15%, about 20%, about 25%, about 30%, about 35%, about 40%, about 45%, about 50%, about 55%, about 60%, about 65%, about 70%, about 75%, about 80%, about 85%, or more, upon administration of the system of the disclosure to the cell model or the animal model, compared to the level of the transcript (e.g., mRNA) of the DMD gene in the same cell model or animal model that does not receive the administration.

In some embodiments, the level of the expression product (e.g., dystrophin) of the DMD gene is increased in a cell model (e.g., HEK293T cell model) or an animal model (e.g., a mouse model, a non-human primate model) by at least about 10%, about 15%, about 20%, about 25%, about 30%, about 35%, about 40%, about 45%, about 50%, about 55%, about 60%, about 65%, about 70%, about 75%, about 80%, about 85%, or more, upon administration of the system of the disclosure to the cell model or the animal model, compared to the level of the expression product (e.g., dystrophin) of the DMD gene in the same cell model or animal model that does not receive the administration. In some embodiments, the expression product is a functional mutant of the expression product (e.g., dystrophin) of the DMD gene.

Overall structure of guide nucleic acid

In some embodiments, the guide nucleic acid comprises a scaffold sequence 5’ to a guide sequence. In some embodiments, the guide nucleic acid comprises a scaffold sequence 3’ to a guide sequence.

In some embodiments, the guide nucleic acid comprises one scaffold sequence and one guide sequence.

In some embodiments, the guide nucleic acid comprises one scaffold sequence 5’ to one guide sequence. In some embodiments, the guide nucleic acid comprises one scaffold sequence 3’ to one guide sequence.

In some embodiments, the guide nucleic acid comprises one or more scaffold sequences and/or one or more guide sequences, provided that the guide nucleic acid does not comprise one scaffold sequence and one guide sequence.

In some embodiments, the guide nucleic acid comprises, from 5’ to 3’, one scaffold sequence, one guide sequence, and one scaffold sequence, wherein scaffold sequences are the same or different.

In some embodiments, the guide nucleic acid comprises, from 5’ to 3’, one guide sequence, one scaffold sequence, and one guide sequence, wherein guide sequences are the same or different.

In some embodiments, the guide nucleic acid comprises, from 5’ to 3’, one scaffold sequence, one guide sequence, one scaffold sequence, and one guide sequence, wherein scaffold sequences are the same or different, and wherein guide sequences are the same or different.

In some embodiments, the guide nucleic acid comprises, from 5’ to 3’, one guide sequence, one scaffold sequence, one guide sequence, and one scaffold sequence, wherein scaffold sequences are the same or different, and wherein guide sequences are the same or different.

In some embodiments, the guide nucleic acid comprises, from 5’ to 3’, one scaffold sequence, one guide sequence, one scaffold sequence, one guide sequence, and one scaffold sequence, wherein scaffold sequences are the same or different, and wherein guide sequences are the same or different.

In some embodiments, the guide nucleic acid comprises, from 5’ to 3’, one guide sequence, one scaffold sequence, one guide sequence, one scaffold sequence, and one guide sequence, wherein scaffold sequences are the same or different, and wherein guide sequences are the same or different.

In some embodiments, the guide nucleic acid comprises, from 5’ to 3’, one scaffold sequence, one guide sequence, one scaffold sequence, one guide sequence, one scaffold sequence, and one guide sequence, wherein scaffold sequences are the same or different, and wherein guide sequences are the same or different.

In some embodiments, the guide nucleic acid comprises, from 5’ to 3’, one guide sequence, one scaffold sequence, one guide sequence, one scaffold sequence, one guide sequence, and one scaffold sequence, wherein scaffold sequences are the same or different, and wherein guide sequences are the same or different.

In some embodiments, the guide nucleic acid comprises a linker or no linker between any adjacent scaffold sequence and guide sequence. In some embodiments, the guide nucleic acid comprises no linker between any adjacent scaffold sequence and guide sequence.
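The alternating scaffold/guide arrangements listed above, with or without a linker between adjacent elements, can be sketched as a simple concatenation from 5’ to 3’. The function name `assemble_guide_nucleic_acid` and the short sequences are hypothetical placeholders, not sequences of the disclosure.

```python
from typing import List, Tuple


def assemble_guide_nucleic_acid(parts: List[Tuple[str, str]], linker: str = "") -> str:
    """Concatenate (kind, sequence) parts from 5' to 3'.

    kind is "scaffold" or "guide"; adjacent parts must alternate in kind.
    An empty linker corresponds to the no-linker embodiment.
    """
    if not parts:
        raise ValueError("at least one part is required")
    for kind, _ in parts:
        if kind not in ("scaffold", "guide"):
            raise ValueError(f"unknown part kind: {kind!r}")
    for (kind_a, _), (kind_b, _) in zip(parts, parts[1:]):
        if kind_a == kind_b:
            raise ValueError("scaffold and guide sequences must alternate")
    return linker.join(sequence for _, sequence in parts)


if __name__ == "__main__":
    # Hypothetical scaffold-guide-scaffold arrangement.
    parts = [("scaffold", "GGAUC"), ("guide", "ACGU"), ("scaffold", "GGAUC")]
    print(assemble_guide_nucleic_acid(parts))
    print(assemble_guide_nucleic_acid(parts, linker="AA"))
```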

Multiple guide nucleic acids

The system or rAAV vector genome of the disclosure may comprise or encode one guide nucleic acid or comprise or encode multiple (e.g., 2, 3, 4, or more) guide nucleic acids, e.g., for the purpose of improving the editing efficiency of the system.

In some embodiments, the system further comprises one or more additional guide nucleic acids, or the first polynucleotide sequence further comprises one or more additional sequences encoding one or more additional guide nucleic acids, each of the additional guide nucleic acids comprising:

(1) an additional scaffold sequence capable of forming a complex with the napBP, and

(2) an additional guide sequence capable of hybridizing to an additional target sequence on a target strand of the DMD gene or an additional target sequence on the transcript thereof, thereby guiding the complex to the DMD gene or the transcript.

In some embodiments, the additional protospacer sequence is on the same strand as the protospacer sequence.

In some embodiments, the additional protospacer sequence is on a different strand from the protospacer sequence.

In some embodiments, the additional protospacer sequence is the same or different from the protospacer sequence.

In some embodiments, the additional target sequence is the same or different from the target sequence.

In some embodiments, the additional guide sequence is the same or different from the guide sequence.

In some embodiments, the additional scaffold sequence is the same or different from the scaffold sequence. In some embodiments wherein the system comprises the same napBP and multiple guide nucleic acids, the scaffold sequences of the multiple guide nucleic acids may be the same or different (e.g., different by no more than 5, 4, 3, 2, or 1 nucleotide) to be compatible with the same napBP. In some embodiments wherein the system comprises different napBPs (e.g., a Cas12i and a Cas9) and multiple guide nucleic acids, the scaffold sequences of the multiple guide nucleic acids may be different to be compatible with the different napBPs, respectively.

In some embodiments, the additional guide nucleic acid and the guide nucleic acid are operably linked to or under the regulation of the same regulatory element (e.g., promoter) or separate regulatory elements (e.g., separate promoters) .

Nature and modification of guide nucleic acid

In some embodiments, the guide nucleic acid (e.g., the guide nucleic acid, the additional guide nucleic acid) is an RNA, i.e., a guide RNA (gRNA) . In some embodiments, the guide nucleic acid is an unmodified guide RNA. In some embodiments, the guide nucleic acid is a modified guide RNA. In some embodiments, the guide nucleic acid comprises a modification. In some embodiments, the guide nucleic acid is a modified RNA containing a modified ribonucleotide. In some embodiments, the guide nucleic acid is a modified RNA containing a deoxyribonucleotide. In some embodiments, the guide nucleic acid is a modified RNA containing a modified deoxyribonucleotide. In some embodiments, the guide nucleic acid comprises a modified or unmodified deoxyribonucleotide and a modified or unmodified ribonucleotide.

Specific target and guide sequences

In some embodiments, the guide sequence or the additional guide sequence comprises (1) a sequence of any one of SEQ ID NOs: 1-51 or a 5’ or 3’ end truncation thereof with 1, 2, 3, 4, 5, or 6 nucleotides truncated at the 5’ or 3’ end; or (2) a sequence having a sequence identity of at least about 70%, 75%, 80%, 85%, 90%, 95%, or 100% to any one of SEQ ID NOs: 1-51 or a 5’ or 3’ end truncation thereof with 1, 2, 3, 4, 5, or 6 nucleotides truncated at the 5’ or 3’ end; or (3) a sequence having at most 1, 2, 3, 4, 5, or 6 nucleotide differences, whether consecutive or not, compared to any one of SEQ ID NOs: 1-51.
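The three kinds of variants recited above (end truncations, percent identity, and a bounded number of nucleotide differences) can be illustrated with a minimal sketch. The function names and the example sequence are hypothetical and are not taken from the sequence listing. The identity calculation assumes an ungapped, position-by-position comparison of equal-length sequences, which is a simplification of alignment-based identity.

```python
def end_truncations(seq, max_trunc=6):
    """Return truncations of seq with 1..max_trunc nucleotides removed
    from the 5' end or the 3' end, keyed by (end, count)."""
    out = {}
    for n in range(1, max_trunc + 1):
        if n >= len(seq):
            break  # never truncate a sequence down to nothing
        out[("5'", n)] = seq[n:]
        out[("3'", n)] = seq[:-n]
    return out


def nucleotide_differences(a, b):
    """Count positional mismatches between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length (ungapped comparison)")
    return sum(x != y for x, y in zip(a, b))


def percent_identity(a, b):
    """Ungapped percent identity (an assumed simplification, see lead-in)."""
    return 100.0 * (len(a) - nucleotide_differences(a, b)) / len(a)


# Hypothetical 20-nt guide sequence, NOT one of SEQ ID NOs: 1-51.
reference = "ACGUACGUACGUACGUACGU"
candidate = "ACGUACGUACGAACGUACGU"  # one substitution relative to reference

print(nucleotide_differences(reference, candidate))
print(percent_identity(reference, candidate))
print(end_truncations(reference, max_trunc=2))
```

Under these assumptions, a candidate falls within option (3) when `nucleotide_differences` returns a value of at most 6, and within option (2) when `percent_identity` meets the chosen threshold.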

In some embodiments, the guide sequence or the additional guide sequence comprises a sequence of SEQ ID NO: 36.

Specific scaffold sequence

For the purpose of the disclosure, the scaffold sequence is compatible with the napBP of the disclosure and is capable of complexing with the napBP. The scaffold sequence may be a naturally occurring scaffold sequence identified along with the napBP, or a variant thereof maintaining the ability to complex with the napBP. Generally, the ability to complex with the napBP is maintained as long as the secondary structure of the variant is substantially identical to the secondary structure of the naturally occurring scaffold sequence. A nucleotide deletion, insertion, or substitution in the primary sequence of the scaffold sequence may not necessarily change the secondary structure of the scaffold sequence (e.g., the relative locations and/or sizes of the stems, bulges, and loops of the scaffold sequence do not significantly deviate from those of the original stems, bulges, and loops). For example, the nucleotide deletion, insertion, or substitution may be in a bulge or loop region of the scaffold sequence so that the overall symmetry of the bulge and hence the secondary structure remains largely the same. The nucleotide deletion, insertion, or substitution may also be in the stems of the scaffold sequence so that the lengths of the stems do not significantly deviate from those of the original stems (e.g., adding or deleting one base pair in each of two stems corresponds to 4 total base changes).

In some embodiments, the scaffold sequence or the additional scaffold sequence has substantially the same secondary structure as the secondary structure of the sequence of SEQ ID NO: 52 or 53.

In some embodiments, the scaffold sequence or the additional scaffold sequence comprises (1) a sequence of SEQ ID NO: 52 or 53 or a 5’ or 3’ end truncation thereof with 1, 2, 3, 4, 5, or 6 nucleotides truncated at the 5’ or 3’ end; or (2) a sequence having a sequence identity of at least about 70%, 75%, 80%, 85%, 90%, 95%, or 100% to SEQ ID NO: 52 or 53 or a 5’ or 3’ end truncation thereof with 1, 2, 3, 4, 5, or 6 nucleotides truncated at the 5’ or 3’ end; or (3) a sequence having at most 1, 2, 3, 4, 5, or 6 nucleotide differences, whether consecutive or not, compared to SEQ ID NO: 52 or 53.

In some embodiments, the scaffold sequence or the additional scaffold sequence comprises the sequence of SEQ ID NO: 53.

Specific guide nucleic acid sequence

In some embodiments, the guide nucleic acid comprises a sequence of any one of SEQ ID NOs: 71-74.

Regulation of guide nucleic acid

Also provided in the disclosure is a polynucleotide comprising or encoding the guide nucleic acid.

In some embodiments, the polynucleotide comprising or encoding the guide nucleic acid is a DNA, a RNA, or a DNA/RNA mixture. By “DNA/RNA mixture” it refers to a nucleic acid comprising both one or more modified or unmodified ribonucleotides and one or more modified or unmodified deoxyribonucleotides, whether consecutive or not. However, by “DNA” or “RNA” it may also refer to a DNA containing one or more modified or unmodified ribonucleotides, whether consecutive or not, or an RNA containing one or more modified or unmodified deoxyribonucleotides, whether consecutive or not.

In some embodiments, the guide nucleic acid is operably linked to or under the regulation of a promoter.

In some embodiments, the first polynucleotide sequence comprises a promoter operably linked to the sequence encoding the guide nucleic acid.

In some embodiments, the promoter is a ubiquitous, tissue-specific, cell-type specific, constitutive, or inducible promoter.

Suitable promoters are known in the art and include, for example, a Cbh promoter, a Cba promoter, a pol I promoter, a pol II promoter, a pol III promoter, a T7 promoter, a U6 promoter, a H1 promoter, a retroviral Rous sarcoma virus LTR promoter, a cytomegalovirus (CMV) promoter, a SV40 promoter, a dihydrofolate reductase promoter, a β-actin promoter, an elongation factor 1α short (EFS) promoter, a β glucuronidase (GUSB) promoter, a cytomegalovirus (CMV) immediate-early (Ie) enhancer and/or promoter, a chicken β-actin (CBA) promoter or derivative thereof such as a CAG promoter, CB promoter, a (human) elongation factor 1α-subunit (EF1α) promoter, a ubiquitin C (UBC) promoter, a prion promoter, a neuron-specific enolase (NSE) , a neurofilament light (NFL) promoter, a neurofilament heavy (NFH) promoter, a platelet-derived growth factor (PDGF) promoter, a platelet-derived growth factor B-chain (PDGF-β) promoter, a synapsin (Syn) promoter, a synapsin 1 (Syn1) promoter, a methyl-CpG binding protein 2 (MeCP2) promoter, a Ca2+/calmodulin-dependent protein kinase II (CaMKII) promoter, a metabotropic glutamate receptor 2 (mGluR2) promoter, a neurofilament light (NFL) promoter, a neurofilament heavy (NFH) promoter, a β-globin minigene nβ2 promoter, a preproenkephalin (PPE) promoter, an enkephalin (Enk) promoter, an excitatory amino acid transporter 2 (EAAT2) promoter, a glial fibrillary acidic protein (GFAP) promoter, and a myelin basic protein (MBP) promoter.

In some embodiments, the promoter is a U6 promoter.

In some embodiments, the promoter comprises a sequence having a sequence identity of at least about 80% (e.g., at least about 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%) to the sequence of SEQ ID NO: 59; or a sequence having at most 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 nucleotide differences, whether consecutive or not, compared to the sequence of SEQ ID NO: 59.

In some embodiments, the first polynucleotide sequence comprises, from 5’ to 3’, a promoter of SEQ ID NO: 59, and a first sequence encoding the guide nucleic acid of any one of SEQ ID NOs: 71-74.

Regulation of napBP

In some embodiments, the polynucleotide encoding the napBP is a DNA, a RNA, or a DNA/RNA mixture. By “DNA/RNA mixture” it refers to a nucleic acid comprising both one or more modified or unmodified ribonucleotides and one or more modified or unmodified deoxyribonucleotides, whether consecutive or not. However, by “DNA” or “RNA” it may also refer to a DNA containing one or more modified or unmodified ribonucleotides, whether consecutive or not, or an RNA containing one or more modified or unmodified deoxyribonucleotides, whether consecutive or not.

In some embodiments, the polynucleotide encoding the napBP is operably linked to or under the regulation of a promoter.

In some embodiments, the second polynucleotide sequence comprises a promoter operably linked to the sequence encoding the napBP.

Suitable promoters are known in the art and include, for example, a Cbh promoter, a Cba promoter, a pol I promoter, a pol II promoter, a pol III promoter, a T7 promoter, a U6 promoter, a H1 promoter, a retroviral Rous sarcoma virus LTR promoter, a cytomegalovirus (CMV) promoter, a SV40 promoter, a dihydrofolate reductase promoter, a β-actin promoter, an elongation factor 1α short (EFS) promoter, a β glucuronidase (GUSB) promoter, a cytomegalovirus (CMV) immediate-early (Ie) enhancer and/or promoter, a chicken β-actin (CBA) promoter or derivative thereof such as a CAG promoter, CB promoter, a (human) elongation factor 1α-subunit (EF1α) promoter, a ubiquitin C (UBC) promoter, a prion promoter, a neuron-specific enolase (NSE) , a neurofilament light (NFL) promoter, a neurofilament heavy (NFH) promoter, a platelet-derived growth factor (PDGF) promoter, a platelet-derived growth factor B-chain (PDGF-β) promoter, a synapsin (Syn) promoter, a human synapsin (hSyn) promoter, a synapsin 1 (Syn1) promoter, a methyl-CpG binding protein 2 (MeCP2) promoter, a Ca2+/calmodulin-dependent protein kinase II (CaMKII) promoter, a metabotropic glutamate receptor 2 (mGluR2) promoter, a neurofilament light (NFL) promoter, a neurofilament heavy (NFH) promoter, a β-globin minigene nβ2 promoter, a preproenkephalin (PPE) promoter, an enkephalin (Enk) promoter, an excitatory amino acid transporter 2 (EAAT2) promoter, a glial fibrillary acidic protein (GFAP) promoter, a myelin basic protein (MBP) promoter, a DMD promoter, a GRK1 promoter, a CRX promoter, a NRL promoter, a MECP2 promoter, a mMECP2 promoter, a hMECP2 promoter, an APP promoter, and a RCVRN promoter.

In some embodiments, the promoter is a muscle specific promoter.

In some embodiments, the promoter is an EFS promoter.

In some embodiments, the second polynucleotide sequence comprises a Kozak sequence (gccacc) 5’ to the sequence encoding the napBP.

In some embodiments, the second polynucleotide sequence comprises a sequence encoding a nuclear localization signal (NLS) 5’ and/or 3’ to the sequence encoding the napBP.

In some embodiments, the second polynucleotide sequence comprises a sequence encoding a nuclear export signal (NES) 5’ and/or 3’ to the sequence encoding the napBP.

In some embodiments, the second polynucleotide sequence comprises a sequence encoding a first NLS 5’ to the sequence encoding the napBP and a second sequence encoding a second NLS 3’ to the sequence encoding the napBP.

In some embodiments, the NLS, the first NLS, and/or the second NLS is a SV40 NLS, a bpSV40 NLS (SEQ ID NO: 62) , or a Nucleoplasmin NLS (npNLS) (SEQ ID NO: 65) .

In some embodiments, the napBP comprises a bpSV40 NLS at the N-terminus of the napBP and a npNLS at the C-terminus of the napBP.

In some embodiments, the second polynucleotide sequence comprises a WPRE sequence downstream of the sequence encoding the napBP.

In some embodiments, the WPRE sequence is selected from the group consisting of Woodchuck Hepatitis Virus (WHP) Posttranscriptional Regulatory Element (WPRE), WPRE3 (a shortened WPRE), and a functional variant (e.g., a functional truncation) thereof.

In some embodiments, the second polynucleotide sequence comprises a sequence encoding a polyadenylation (polyA) signal downstream of the sequence encoding the napBP.

In some embodiments, the second polynucleotide sequence comprises, downstream of the sequence encoding the napBP, a WPRE sequence followed by a sequence encoding a polyadenylation (polyA) signal.

In some embodiments, the polyA signal is selected from the group consisting of a bovine growth hormone polyadenylation (bGH polyA) signal, a small polyA (SPA) signal, a human growth hormone polyadenylation (hGH polyA) signal, a SV40 polyA (SV40 polyA) signal, a rabbit beta globin polyA (rBG polyA) signal, a combination of SV40 late polyadenylation signal upstream element and SV40 late polyadenylation signal, and a functional variant (e.g., a functional truncation) thereof.

In some embodiments, the polyA signal is a combination of SV40 late polyadenylation signal upstream element and SV40 late polyadenylation signal.

In some embodiments, the second polynucleotide sequence comprises, from 5’ to 3’, the promoter, the Kozak sequence, the first sequence encoding the first NLS, the sequence encoding the napBP, the second sequence encoding the second NLS, the WPRE sequence, and the sequence encoding the polyA signal.

In some embodiments, the promoter comprises a sequence having a sequence identity of at least about 80% (e.g., at least about 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%) to the sequence of SEQ ID NO: 60; or a sequence having at most 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 nucleotide differences, whether consecutive or not, compared to the sequence of SEQ ID NO: 60.

In some embodiments, the Kozak sequence is gccacc.

In some embodiments, the NLS, the first NLS, and/or the second NLS comprises a sequence encoding a polypeptide having a sequence identity of at least about 80% (e.g., at least about 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%) to the sequence of SEQ ID NO: 62 or 65; or a sequence having at most 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 amino acid differences, whether consecutive or not, compared to the sequence of SEQ ID NO: 62 or 65.

In some embodiments, the sequence encoding the NLS, the first sequence encoding the first NLS, and/or the second sequence encoding the second NLS comprises a sequence encoding a polypeptide having a sequence identity of at least about 80% (e.g., at least about 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%) to the sequence of SEQ ID NO: 61 or 64; or a sequence having at most 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 nucleotide differences, whether consecutive or not, compared to the sequence of SEQ ID NO: 61 or 64.

In some embodiments, the WPRE sequence comprises a sequence having a sequence identity of at least about 80% (e.g., at least about 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%) to the sequence of SEQ ID NO: 66; or a sequence having at most 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 nucleotide differences, whether consecutive or not, compared to the sequence of SEQ ID NO: 66.

In some embodiments, the sequence encoding the polyA signal comprises a sequence having a sequence identity of at least about 80% (e.g., at least about 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%) to the sequence of SEQ ID NO: 67; or a sequence having at most 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 nucleotide differences, whether consecutive or not, compared to the sequence of SEQ ID NO: 67.

In some embodiments, the second polynucleotide sequence comprises, from 5’ to 3’, the promoter of SEQ ID NO: 60, the Kozak sequence of gccacc, an in-frame start codon ATG, the first sequence encoding the first NLS of SEQ ID NO: 62, the sequence encoding the napBP of SEQ ID NO: 57, the second sequence encoding the second NLS of SEQ ID NO: 65, an in-frame stop codon, the WPRE sequence of SEQ ID NO: 66, and the sequence encoding the polyA signal of SEQ ID NO: 67.

In some embodiments, the second polynucleotide sequence comprises, from 5’ to 3’, the promoter of SEQ ID NO: 60, the Kozak sequence of gccacc, an in-frame start codon ATG, the first sequence of SEQ ID NO: 61 encoding the first NLS, the sequence of SEQ ID NO: 63 encoding the napBP, the second sequence of SEQ ID NO: 64 encoding the second NLS, an in-frame stop codon, the WPRE sequence of SEQ ID NO: 66, and the sequence encoding the polyA signal of SEQ ID NO: 67.

Full length rAAV vector genome

In some embodiments, the rAAV vector genome comprises a 5’ inverted terminal repeat (ITR) sequence and a 3’ ITR sequence.

In some embodiments, the 5’ ITR sequence and the 3’ ITR sequence are both wild-type ITR sequences from AAV1, AAV2, AAV3A, AAV3B, AAV4, AAV5, AAV6, AAV7, AAVrh74, AAV8, AAV9, AAV10, AAV11, AAV12, AAV13, AAV-DJ, AAV-PHP.eB, or a member of the Clade to which any of the AAV1-AAV13 belong, or a functional variant (e.g., a functional truncation) thereof.

In some embodiments, the 5’ ITR sequence comprises a sequence having a sequence identity of at least about 80% (e.g., at least about 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%) to the sequence of SEQ ID NO: 58.

In some embodiments, the 3’ ITR sequence comprises a sequence having a sequence identity of at least about 80% (e.g., at least about 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%) to the sequence of SEQ ID NO: 68.

In some embodiments, the rAAV vector genome comprises, from 5’ to 3’, the first polynucleotide sequence, and the second polynucleotide sequence.

In some embodiments, the rAAV vector genome comprises, from 5’ to 3’, the second polynucleotide sequence, and the first polynucleotide sequence.

In some embodiments, the rAAV vector genome comprises, from the 5’ to 3’,

(1) the 5’ ITR;

(2) the promoter;

(3) the scaffold sequence;

(4) the guide sequence;

(5) optionally, the scaffold sequence;

(6) optionally, the guide sequence;

(7) optionally, the scaffold sequence;

(8) optionally, the guide sequence;

(9) the promoter;

(10) optionally, the Kozak sequence of gccacc;

(11) the in-frame start codon ATG;

(12) optionally, the first sequence encoding the first NLS;

(13) the sequence encoding the napBP;

(14) optionally, the second sequence encoding the second NLS;

(15) an in-frame stop codon;

(16) optionally, the WPRE sequence;

(17) the sequence encoding a polyA signal; and

(18) the 3’ ITR.
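The 5’ to 3’ arrangement listed above, with its required and optional elements, can be sketched as an ordered list of slots plus a simple validity check. This is an illustrative sketch only: the labels (e.g., "promoter (guide)", "scaffold 2") and the function name are hypothetical shorthand, and the check treats each optional element independently, as the list does.

```python
# Ordered element slots, 5' to 3', as (label, is_optional) pairs.
# Labels are illustrative shorthand, not terms defined by the disclosure.
ELEMENTS = [
    ("5' ITR", False),
    ("promoter (guide)", False),
    ("scaffold 1", False),
    ("guide 1", False),
    ("scaffold 2", True),
    ("guide 2", True),
    ("scaffold 3", True),
    ("guide 3", True),
    ("promoter (napBP)", False),
    ("Kozak", True),
    ("ATG", False),
    ("NLS 1", True),
    ("napBP", False),
    ("NLS 2", True),
    ("stop codon", False),
    ("WPRE", True),
    ("polyA signal", False),
    ("3' ITR", False),
]


def is_valid_layout(layout):
    """Return True if layout uses only known elements, keeps them in the
    listed 5'-to-3' order, and includes every non-optional element."""
    names = [name for name, _ in ELEMENTS]
    if any(item not in names for item in layout):
        return False
    positions = [names.index(item) for item in layout]
    in_order = all(a < b for a, b in zip(positions, positions[1:]))
    required = all(name in layout for name, optional in ELEMENTS if not optional)
    return in_order and required


minimal = [name for name, optional in ELEMENTS if not optional]
full = [name for name, _ in ELEMENTS]
print(is_valid_layout(minimal), is_valid_layout(full))
```

A layout that omits, for example, the WPRE slot still passes the check, whereas one that omits the polyA signal slot or reorders the ITRs does not.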

In some embodiments, the rAAV vector genome comprises a sequence having a sequence identity of at least about 80% (e.g., at least about 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or 100%) to the sequence of SEQ ID NO: 69.

In some embodiments, the rAAV vector genome comprises, from the 5’ to 3’,

(1) the 5’ ITR of SEQ ID NO: 58;

(2) the promoter of SEQ ID NO: 59;

(3) the scaffold sequence of SEQ ID NO: 53;

(4) the guide sequence of SEQ ID NO: 36;

(5) the scaffold sequence of SEQ ID NO: 53;

(6) the guide sequence of SEQ ID NO: 36;

(7) the scaffold sequence of SEQ ID NO: 53;

(8) the guide sequence of SEQ ID NO: 36;

(9) the promoter of SEQ ID NO: 60;

(10) the Kozak sequence of gccacc;

(11) the in-frame start codon ATG;

(12) the first sequence encoding the first NLS of SEQ ID NO: 62;

(13) the sequence encoding the napBP of SEQ ID NO: 57;

(14) the second sequence encoding the second NLS of SEQ ID NO: 65;

(15) an in-frame stop codon;

(16) the WPRE sequence of SEQ ID NO: 66;

(17) the sequence encoding a polyA signal of SEQ ID NO: 67; and

(18) the 3’ ITR;

wherein the rAAV vector genome comprises a sequence having a sequence identity of at least about 80% (e.g., at least about 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or 100%) to the sequence of SEQ ID NO: 69.

In some embodiments, the rAAV vector genome comprises the sequence of SEQ ID NO: 69.

In some embodiments, the rAAV vector genome of the disclosure is a DNA rAAV vector genome or an RNA rAAV vector genome. By “DNA rAAV vector genome” it means that the rAAV vector genome is a DNA that can be encapsulated into a rAAV particle. By “RNA rAAV vector genome” it means that the rAAV vector genome is an RNA that can be encapsulated into a rAAV particle.

rAAV particle

In yet another aspect, the disclosure provides a recombinant AAV (rAAV) particle comprising the rAAV vector genome of the disclosure. For a simple introduction to AAV for delivery, see “Adeno-associated Virus (AAV) Guide” (addgene.org/guides/aav/).

Adeno-associated virus (AAV), when engineered to deliver, e.g., a protein-encoding sequence of interest, may be termed a (r)AAV vector, a (r)AAV vector particle, or a (r)AAV particle, where “r” stands for “recombinant”. The genome packaged in AAV vectors for delivery may be termed a (r)AAV vector genome, vector genome, or vg for short, while viral genome may refer to the original viral genome of natural AAVs.

The serotypes of the capsids of rAAV particles can be matched to the types of target cells. For example, Table 2 of WO2018002719A1 lists exemplary cell types that can be transduced by the indicated AAV serotypes (incorporated herein by reference) .

In some embodiments, the rAAV particle comprises a capsid with a serotype suitable for delivery into muscle cells. In some embodiments, the rAAV particle comprises a capsid with a serotype of AAV1, AAV2, AAV3A, AAV3B, AAV4, AAV5, AAV6, AAV7, AAVrh74, AAV8, AAV9, AAV10, AAV11, AAV12, AAV13, AAV-DJ, or AAV-PHP.eB, a member of the Clade to which any of the AAV1-AAV13 belong, or a functional variant (e.g., a functional truncation) thereof, encapsidating the rAAV vector genome. In some embodiments, the serotype of the capsid is wild type AAV9 or a functional variant thereof.

General principles of rAAV particle production are known in the art. In some embodiments, rAAV particles may be produced using the triple transfection method (described in detail in U.S. Pat. No. 6,001,650) .

The vector titers are usually expressed as vector genomes per ml (vg/ml) . In some embodiments, the vector titer is above 1×10⁹, above 5×10¹⁰, above 1×10¹¹, above 5×10¹¹, above 1×10¹², above 5×10¹², or above 1×10¹³ vg/ml.

Instead of packaging a single strand (ss) DNA sequence as a vector genome of a rAAV particle, systems and methods of packaging an RNA sequence as a vector genome into a rAAV particle have recently been developed and are applicable herein. See PCT/CN2022/075366, which is incorporated herein by reference in its entirety.

When the vector genome is RNA as in, for example, PCT/CN2022/075366, for simplicity of description and claiming, sequence elements described herein for DNA vector genomes, when present in RNA vector genomes, should generally be considered to be applicable for the RNA vector genomes except that the deoxyribonucleotides in the DNA sequence are the corresponding ribonucleotides in the RNA sequence (e.g., dT is equivalent to U, and dA is equivalent to A) and/or the element in the DNA sequence is replaced with the corresponding element with a corresponding function in the RNA sequence or omitted because its function is unnecessary in the RNA sequence and/or an additional element necessary for the RNA vector genome is introduced.
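As a minimal illustration of the nucleotide correspondence described above (dT is equivalent to U, while the other bases are unchanged), a DNA element sequence can be rewritten as its RNA counterpart by a character substitution. The function name `dna_to_rna` is hypothetical, and the sketch covers only the base-level correspondence, not the replacement, omission, or addition of whole elements.

```python
def dna_to_rna(seq):
    """Rewrite a DNA sequence as the corresponding RNA sequence by
    replacing T with U (case preserved); A, C, and G are unchanged."""
    return seq.translate(str.maketrans({"T": "U", "t": "u"}))


# The Kozak sequence recited in the disclosure contains no T, so it is unchanged.
print(dna_to_rna("gccacc"))
# The start codon ATG becomes AUG.
print(dna_to_rna("ATG"))
```

The same substitution reflects the ST.26 convention noted for the sequence listing, under which "t" denotes U wherever a listed sequence is an RNA.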

As used herein, a coding sequence, e.g., as a sequence element of rAAV vector genomes herein, is construed, understood, and considered as covering and covers both a DNA coding sequence and an RNA coding sequence. When it is a DNA coding sequence, an RNA sequence can be transcribed from the DNA coding sequence, and optionally further a protein can be translated from the transcribed RNA sequence as necessary. When it is an RNA coding sequence, the RNA coding sequence per se can be a functional RNA sequence for use, or an RNA sequence can be produced from the RNA coding sequence, e.g., by RNA processing, or a protein can be translated from the RNA coding sequence.

For example, a Cas13 coding sequence encoding a Cas13 polypeptide covers either a Cas13 DNA coding sequence from which a Cas13 polypeptide is expressed (indirectly via transcription and translation) or a Cas13 RNA coding sequence from which a Cas13 polypeptide is translated (directly) .

For example, a gRNA coding sequence encoding a gRNA covers either a gRNA DNA coding sequence from which a gRNA is transcribed or a gRNA RNA coding sequence (1) which per se is the functional gRNA for use, or (2) from which a gRNA is produced, e.g., by RNA processing.

In some embodiments for rAAV RNA vector genomes, 5’-ITR and/or 3’-ITR as DNA packaging signals may be unnecessary and can be omitted at least partly, while RNA packaging signals can be introduced.

In some embodiments for rAAV RNA vector genomes, a promoter to drive transcription of DNA sequences may be unnecessary and can be omitted at least partly.

In some embodiments for rAAV RNA vector genomes, a sequence encoding a polyA signal may be unnecessary and can be omitted at least partly, while a polyA tail can be introduced.

Similarly, other DNA elements of rAAV DNA vector genomes can be either omitted or replaced with corresponding RNA elements and/or additional RNA elements can be introduced, in order to adapt to the strategy of delivering an RNA vector genome by rAAV particles.

In yet another aspect, the disclosure provides a method for production of the rAAV particle of the disclosure, comprising culturing in a host cell a transgene plasmid comprising the rAAV vector genome of the disclosure. In yet another aspect, the disclosure provides a cell comprising a transgene plasmid comprising the rAAV vector genome of the disclosure.

Pharmaceutical composition

In yet another aspect, the disclosure provides a pharmaceutical composition comprising (1) the system of the disclosure or the rAAV particle of the disclosure and (2) a pharmaceutically acceptable excipient.

In some embodiments, the pharmaceutical composition comprises the rAAV particle in a concentration selected from the group consisting of about 1×10¹⁰ vg/mL, 2×10¹⁰ vg/mL, 3×10¹⁰ vg/mL, 4×10¹⁰ vg/mL, 5×10¹⁰ vg/mL, 6×10¹⁰ vg/mL, 7×10¹⁰ vg/mL, 8×10¹⁰ vg/mL, 9×10¹⁰ vg/mL, 1×10¹¹ vg/mL, 2×10¹¹ vg/mL, 3×10¹¹ vg/mL, 4×10¹¹ vg/mL, 5×10¹¹ vg/mL, 6×10¹¹ vg/mL, 7×10¹¹ vg/mL, 8×10¹¹ vg/mL, 9×10¹¹ vg/mL, 1×10¹² vg/mL, 2×10¹² vg/mL, 3×10¹² vg/mL, 4×10¹² vg/mL, 5×10¹² vg/mL, 6×10¹² vg/mL, 7×10¹² vg/mL, 8×10¹² vg/mL, 9×10¹² vg/mL, 1×10¹³ vg/mL, or in a concentration of a numerical range between any of two preceding values, e.g., in a concentration of from about 9×10¹⁰ vg/mL to about 8×10¹¹ vg/mL.

In some embodiments, the pharmaceutical composition is an injection formulation, such as an i.v. injection formulation.

In some embodiments, the volume of the injection is selected from the group consisting of about 1 microliter, 10 microliters, 50 microliters, 100 microliters, 150 microliters, 200 microliters, 250 microliters, 300 microliters, 350 microliters, 400 microliters, 450 microliters, 500 microliters, 550 microliters, 600 microliters, 650 microliters, 700 microliters, 750 microliters, 800 microliters, 850 microliters, 900 microliters, 950 microliters, 1000 microliters, and a volume of a numerical range between any of two preceding values, e.g., in a volume of from about 10 microliters to about 750 microliters.

Cells

In yet another aspect, the disclosure provides a cell or a progeny thereof comprising the guide nucleic acid of the disclosure, the system of the disclosure, or the rAAV particle of the disclosure. In some embodiments, the cell is a eukaryotic cell. In some embodiments, the cell is a human cell. In some embodiments, the cell is a human muscle cell.

In yet another aspect, the disclosure provides a cell or a progeny thereof comprising a DMD gene or transcript thereof modified by the system of the disclosure, the rAAV particle of the disclosure, or the method of the disclosure. In some embodiments, the cell is a eukaryotic cell. In some embodiments, the cell is a human cell.

In some embodiments, the cell is a human muscle cell.

In some embodiments, the cell is not within the body of an organism, such as a human or an animal. In some embodiments, the cell is not a human embryonic stem cell. In some embodiments, the cell is not a human germ cell.

Treatment method

In yet another aspect, the disclosure provides a method for preventing, diagnosing, and/or treating a DMD associated disease in a subject in need thereof, comprising administering to the subject the system of the disclosure, the rAAV particle of the disclosure, or the pharmaceutical composition of the disclosure, wherein the napBP modifies DMD gene or transcript thereof, and wherein the modification of the DMD gene or transcript thereof treats the disease.

In some embodiments, the DMD associated disease is Duchenne muscular dystrophy.

In some embodiments, the DMD gene is in a eukaryotic cell, for example, a human cell, a non-human primate cell, or a mouse cell, such as a human muscle cell.

In some embodiments, the administering comprises local administration or systemic administration.

In some embodiments, the administering comprises intrathecal administration, intramuscular administration, intravenous administration, transdermal administration, intranasal administration, oral administration, mucosal administration, intraperitoneal administration, intracranial administration, intracerebroventricular administration, or stereotaxic administration.

In some embodiments, the administration is injection or infusion.

In some embodiments, the subject is a human, a non-human primate, or a mouse.

In some embodiments, the level of the transcript (e.g., mRNA) of the DMD gene is increased in the subject by at least about 10%, about 15%, about 20%, about 25%, about 30%, about 35%, about 40%, about 45%, about 50%, about 55%, about 60%, about 65%, about 70%, about 75%, about 80%, about 85%, or more compared to the level of the transcript (e.g., mRNA) of the DMD gene in the subject prior to the administration.

In some embodiments, the level of the expression product (e.g., dystrophin) of the DMD gene is increased in the subject by at least about 10%, about 15%, about 20%, about 25%, about 30%, about 35%, about 40%, about 45%, about 50%, about 55%, about 60%, about 65%, about 70%, about 75%, about 80%, about 85%, or more compared to the level of the expression product (e.g., dystrophin) of the DMD gene in the subject prior to the administration. In some embodiments, the expression product is a functional mutant of the expression product of the DMD gene.

In some embodiments, the median survival of the subject suffering from the disease but receiving the administration is 5 days, 10 days, 20 days, 30 days, 2 months, 3 months, 4 months, 5 months, 6 months, 7 months, 8 months, 9 months, 10 months, 11 months, 12 months, 1.5 years, 2 years, 2.5 years, 3 years, 4 years, 5 years, 6 years, 7 years, 8 years, 9 years, 10 years, or more, longer than that of a subject or a population of subjects suffering from the disease and not receiving the administration.

The rAAV particle for treatment of the DMD associated diseases may be administered as a single dose or as multiple doses. One skilled in the art understands that the actual dose may vary greatly depending upon a variety of factors, such as the vector choices, the target cells, organisms, tissues, the general conditions of the subject to be treated, the degrees of transformation/modification sought, the administration routes, the administration modes, the types of transformation/modification sought, etc.

In some embodiments, the rAAV particle is administered in a therapeutically effective dose. For example, the therapeutically effective dose of the rAAV particle may be about 1.0E+8 (1.0 × 10⁸), 2.0E+8, 3.0E+8, 4.0E+8, 6.0E+8, 8.0E+8, 1.0E+9, 2.0E+9, 3.0E+9, 4.0E+9, 6.0E+9, 8.0E+9, 1.0E+10, 2.0E+10, 3.0E+10, 4.0E+10, 6.0E+10, 8.0E+10, 1.0E+11, 2.0E+11, 3.0E+11, 4.0E+11, 6.0E+11, 8.0E+11, 1.0E+12, 2.0E+12, 3.0E+12, 4.0E+12, 6.0E+12, 8.0E+12, 1.0E+13, 2.0E+13, 3.0E+13, 4.0E+13, 6.0E+13, 8.0E+13, 1.0E+14, 2.0E+14, 3.0E+14, 4.0E+14, 6.0E+14, 8.0E+14, 1.0E+15, 2.0E+15, 3.0E+15, 4.0E+15, 6.0E+15, 8.0E+15, 1.0E+16, 2.0E+16, 3.0E+16, 4.0E+16, 6.0E+16, 8.0E+16, or 1.0E+17 vg, or within a range of any two of those point values. vg stands for vector genomes of rAAV particles for administration.

Kit

In yet another aspect, the disclosure provides a kit comprising the system of the disclosure, the rAAV particle of the disclosure, or the pharmaceutical composition of the disclosure, or any one, two, or all components of the same.

In some embodiments, the kit further comprises an instruction to use the component (s) contained therein, and/or instructions for combining with additional component (s) that may be available or necessary elsewhere.

In some embodiments, the kit further comprises one or more buffers that may be used to dissolve any of the component (s) contained therein, and/or to provide suitable reaction conditions for one or more of the component (s) . Such buffers may include one or more of PBS, HEPES, Tris, MOPS, Na₂CO₃, NaHCO₃, NaB, or combinations thereof. In some embodiments, the reaction condition includes a proper pH, such as a basic pH. In some embodiments, the pH is between 7-10.

In some embodiments, any one or more of the kit components may be stored in a suitable container or at a suitable temperature, e.g., 4 degrees Celsius.

Further embodiments are illustrated in the following Examples which are given for illustrative purposes only and are not intended to limit the scope of the disclosure.

EXAMPLES

The following examples are provided to further illustrate some embodiments of the disclosure but are not intended to limit the scope of the invention; it will be understood by their exemplary nature that other procedures, methodologies, or techniques known to those skilled in the art may alternatively be used.

Methods and Materials

Study Approval

All applicable institutional and/or national guidelines for the care and use of animals were followed. All experimental procedures on the mice were approved by the Institutional Animal Care and Use Committee (IACUC) of HuidaGene Therapeutics. All mice were housed in a room with constant temperature (24-26℃) and humidity (40-60%) under a 12-hour light-dark cycle and were fed standard food.

Cell culture and transfection

The HEK293T cells from American Type Culture Collection (ATCC) were cultured in Dulbecco’s modified Eagle’s medium (DMEM) (#11965092, Gibco) supplemented with 10% fetal bovine serum (#04-001-1ACS, Biological Industries) and 1% penicillin/streptomycin (#15140122, Thermo Fisher Scientific). For gRNA screening, HEK293T cells were seeded on 12-well culture plates (#3513, Corning) in equal numbers per well. The expression plasmid was then transfected using the polyethylenimine (PEI) reagent (#101000029, Polyplus).

Establishment and maintenance of human DMD iPSCs

Human DMD fibroblasts were reprogrammed into DMD iPSCs using the CytoTune-iPS Sendai Reprogramming Kit (#A16517, Thermo Fisher Scientific). Human DMD iPSCs were plated in cell-culture dishes coated with matrigel (#354277, Corning) and grown in the ncTarget medium (#RP01020, Nuwacell Biotechnologies) at 37 ℃, 5% CO₂. The iPSCs were passaged at 80% confluence using hPSC Dissociation Buffer (#RP01007, Nuwacell Biotechnologies).

Human iPSC nucleofection and sorting

One hour before nucleofection, human DMD iPSCs were treated with 10 mM ROCK inhibitor Y-27632 (#10005583, Cayman). The iPSCs were dissociated into single cells with Accutase (#7920, STEMCELL Technologies). After 3 × 10⁶ cells were mixed with 6 μg plasmids in nucleofection buffer (#RP01005, Nuwacell Biotechnologies), the nucleofection process was performed with Lonza 2B Nucleofector, employing program B016. Forty-eight hours later, fluorescence-positive cells were sorted by BD FACSAria™ III Sorter. For gRNA screening, 5 × 10³ cells were collected, and their lysates were amplified with different primer sets (Table S2). To obtain single iPSC clones, the cells were immediately seeded on a matrigel-coated 100-mm culture dish (#430167, Corning) and maintained in the ncTarget medium with 10 mM Y-27632. After seven days, single-cell colonies were picked and transferred to a 12-well culture plate. After being subjected to genome sequencing, the desired cells were expanded in the ncTarget medium.

AAV administration

The AAV9 particles were produced by PackGene Biotech (Guangzhou, China). In brief, the pHelper, pRepCap, and transgene (GOI) plasmids at a ratio of 2:1:1 were co-transfected into host cells when the confluency was between 70–90%. After 72 hours, iodixanol density gradient centrifugation was used to purify AAV9 particles. For intramuscular injection, mice were anesthetized and their TA muscles were injected with AAV9 particles at the dose of 2.5 × 10¹¹ vg/leg/AAV or with the same volume of saline solution. For intraperitoneal infusion, the P3 mice were administered AAV9 particles at the dose of 2.5 × 10¹³ vg/kg or saline solution. For tail vein injection, 2-week-old mice were administered AAV9 particles at the dose of 5 × 10¹³ vg/kg or saline solution. Mouse HE (heart), DI (diaphragm), and TA (tibialis anterior) muscles were isolated at indicated time points and then cut into small pieces for further experiments.
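The weight-based doses above (vg/kg) translate into a total number of vector genomes per animal once a body weight is known. The following is a minimal sketch of that arithmetic; the function name and the body weights are hypothetical placeholders, since the text does not state the weights of the animals.

```python
def total_vector_genomes(dose_vg_per_kg: float, body_weight_g: float) -> float:
    """Convert a weight-based dose (vg/kg) into the total vg for one animal."""
    if dose_vg_per_kg <= 0 or body_weight_g <= 0:
        raise ValueError("dose and body weight must be positive")
    return dose_vg_per_kg * (body_weight_g / 1000.0)


# Doses are the ones stated in the text; body weights are assumed for illustration.
ip_total = total_vector_genomes(2.5e13, body_weight_g=2.0)  # P3 pup, assumed 2 g
iv_total = total_vector_genomes(5e13, body_weight_g=8.0)    # 2-week-old mouse, assumed 8 g

print(f"intraperitoneal: {ip_total:.2e} vg; tail vein: {iv_total:.2e} vg")
```

The intramuscular dose is given per leg (2.5 × 10¹¹ vg/leg) and therefore needs no weight conversion.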

Genomic DNA extraction and deep sequencing

Human cells or mouse tissues were digested in the lysis buffer containing proteinase K, and their genomic DNA was extracted following the manufacturer’s instructions. The DNA was amplified with Phanta max super-fidelity DNA polymerase (#P505-d1, Vazyme) and specific primer sets before Sanger or deep sequencing. For the construction of deep-sequencing products, Illumina flow cell binding sequences and barcodes were added to the 5’ and 3’ ends of primer sequences. The DNA products were purified with Gel extraction kit (Omega) and sequenced with 150-bp paired-end reads on the Illumina NovaSeq 6000 platform (Genewiz Co. Ltd.). The deep sequencing data were first de-multiplexed by Cutadapt (v. 2.8) based on sample barcodes. The de-multiplexed reads were then processed by CRISPResso2 for the quantification of editing efficiency.
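Editing efficiency is reported throughout as "indels %". The sketch below shows only the final arithmetic, namely the share of reads carrying an insertion or deletion; it is not CRISPResso2's alignment or quantification algorithm, and the allele table is invented for illustration.

```python
from typing import Iterable, Tuple


def indel_percentage(alleles: Iterable[Tuple[int, int, int]]) -> float:
    """Percentage of reads with at least one inserted or deleted base.

    Each allele is (read_count, inserted_bases, deleted_bases).
    """
    total = 0
    edited = 0
    for count, inserted, deleted in alleles:
        total += count
        if inserted > 0 or deleted > 0:
            edited += count
    if total == 0:
        raise ValueError("no reads supplied")
    return 100.0 * edited / total


# Hypothetical allele table: 600 unmodified reads and 400 reads with indels.
toy_alleles = [(600, 0, 0), (250, 0, 7), (100, 1, 0), (50, 2, 5)]
pct = indel_percentage(toy_alleles)
print(f"indels: {pct:.1f}%")
```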

RNA isolation, cDNA synthesis, and sequencing

Total mRNAs were isolated from mouse tissues by TRIzol reagent (#15596-018, Ambion) according to the manufacturer’s instructions. The mRNAs were reverse transcribed into cDNAs with HiScript II One Step RT-PCR Kit (#P611-01, Vazyme). The cDNAs were amplified using Phanta max super-fidelity DNA polymerase and subjected to Sanger and deep sequencing to analyze editing efficiency.

Western blotting

Mouse tissues were crushed into powder and homogenized in RIPA lysis buffer (#P0013B, Beyotime) containing protease inhibitor cocktail. Protein concentrations were determined with BCA Protein Assay Kit (#23225, Thermo Fisher Scientific) and then adjusted to an identical concentration. 10 μg of total protein was loaded into each lane of the 3%-8% tris-acetate gels (#EA03752BOX, Invitrogen) and electrophoresed for 1 hour. The proteins were transferred onto the PVDF membranes under a wet condition. Later, the membranes were blocked in 5% non-fat milk/TBST buffer and then incubated with primary antibodies overnight at 4℃. The primary antibodies were as follows: dystrophin (#D8168, Sigma) and vinculin (#13901S, Cell Signaling Technology). Following three washes, the membranes were probed with HRP-conjugated secondary antibodies at RT for 1 hour. After incubation with chemiluminescent substrates (#WP20005, Invitrogen), the membranes were viewed with the Image Lab™ Software 5.2.

Immunohistochemical analyses

For immunofluorescence staining, mouse tissues were embedded in the O.C.T. compound and then snap-frozen in liquid nitrogen-cooled isopentane. They were cut into 10 μm sections with a microtome and transferred onto the slides. The sections were fixed in 4% paraformaldehyde for 2 hours and permeabilized with 0.4% Triton-X/PBS for 30 min at RT. After being blocked in 10% goat serum/PBS, the sections were probed with primary antibody against dystrophin (#ab15277, Abcam) at 4℃ overnight. Following three washes, the sections were stained with secondary antibodies and DAPI for 3 hours at RT. After a wash in PBS, the coverslips were sealed with permanent synthetic mounting media. All pictures were observed and captured under an inverted Olympus FV3000 microscope.

Rotarod and rodent treadmill test

The mice were trained daily for a week prior to the experiment. Three mice were put simultaneously on the rotarod (Ugo Basile Inc.) with an accelerating speed from 4 to 40 rpm over 30 seconds. When a mouse fell off onto the lever, the test was stopped and the time was recorded. Each mouse was tested five times, and the average of the five trials was used for further comparison. For the treadmill test, the mice were likewise trained daily for a week prior to the experiment. Three mice were put simultaneously on the rodent treadmill (Shanghai TOW Intelligent Technology Co., Ltd, AT-5MR) with an accelerating speed from 0 to 15 m/s over 30 seconds, and the running time before the first fall was recorded.
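As an illustration of the rotarod readout, the sketch below computes the rod speed during the 4 to 40 rpm ramp and the per-mouse average over five trials. The linear shape of the ramp, the hold at the final speed, and the latency values are assumptions; the text states only the start speed, end speed, ramp duration, and number of trials.

```python
from statistics import mean
from typing import Sequence


def rotarod_speed_rpm(t_s: float, start_rpm: float = 4.0,
                      end_rpm: float = 40.0, ramp_s: float = 30.0) -> float:
    """Rod speed at time t_s, assuming a linear ramp that then holds at end_rpm."""
    if t_s < 0:
        raise ValueError("time must be non-negative")
    fraction = min(t_s / ramp_s, 1.0)
    return start_rpm + (end_rpm - start_rpm) * fraction


def mean_latency_s(latencies_s: Sequence[float]) -> float:
    """Average latency to fall over the five trials run for each mouse."""
    if len(latencies_s) != 5:
        raise ValueError("expected exactly five trials per mouse")
    return mean(latencies_s)


speed_mid_ramp = rotarod_speed_rpm(15.0)
example_mean = mean_latency_s([10.0, 12.0, 14.0, 16.0, 18.0])  # made-up latencies
print(speed_mid_ramp, example_mean)
```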

Forelimb grip strength test and rotarod test

Muscle strength was assessed at various time points. Briefly, mice were removed from the cage, weighed and held from the tip of the tail, causing the forelimbs to grasp the pull-bar assembly connected to the 47200-grip strength meter. The mouse was drawn along a straight line leading away from the sensor until the mouse could no longer grasp the gridiron and the peak amount of force in grams was recorded. The assessment was repeated 7-10 times with 10-second intervals between.

Serum CK analysis

To measure CK levels, a blood sample was collected in an Eppendorf tube via cardiac puncture under ketamine anesthesia prior to euthanasia. Samples were centrifuged at 3,000 × g for 10 min and then the serum was collected. CK activity was measured with creatine kinase (CK10) reagent (Pointe Scientific, 23-666-208) according to the manufacturer’s instructions.

Statistical analysis

Quantitative data were derived from at least three independent experiments. The statistical significance of differences between any two groups, including pairwise comparisons among multiple groups, was calculated by the unpaired Student's t-test. Differences in means were considered statistically significant when they reached P < 0.05. Significance levels are: *P < 0.05, **P < 0.01, ***P < 0.001.
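For reference, the statistics named above (mean ± SEM, unpaired two-tailed Student's t-test, asterisk levels) can be sketched with the Python standard library alone. This is an illustrative implementation and not the software used in the study: it assumes equal variances (pooled variance), obtains the p-value by numerically integrating the t density, and uses the label "ns" for non-significant results, which the text does not define.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def mean_sem(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and standard error of the mean (sample variance, n - 1)."""
    return mean(values), math.sqrt(variance(values) / len(values))


def t_pdf(t: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def two_tailed_p(t: float, df: int, steps: int = 2000) -> float:
    """Two-tailed p-value via Simpson's rule on [0, |t|]; steps must be even."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3)


def unpaired_t_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, int, float]:
    """Student's t-test with pooled variance; returns (t, df, p)."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df
    t = (mean(x) - mean(y)) / math.sqrt(pooled * (1 / nx + 1 / ny))
    return t, df, two_tailed_p(t, df)


def stars(p: float) -> str:
    """Map a p-value to the asterisk levels used in the figures."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "ns"  # label assumed here; not defined in the text
```

Calling `unpaired_t_test(group_a, group_b)` on two lists of per-animal measurements returns the t statistic, the degrees of freedom, and the p-value, which `stars` converts to the notation used in the figures. Groups with zero pooled variance are not handled in this sketch.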

Example 1. Screening of human DMD gene-targeting guide sequences by in vitro evaluation

This Example demonstrates the screening of human DMD gene-targeting guide sequences by in vitro evaluation of editing efficiency of a Cas12 endonuclease directed by such guide sequences to human DMD gene.

Plasmid Construction:

An expression plasmid was constructed for the in vitro screening. The expression plasmid comprised, from 5’ to 3’, a DMD-targeting gRNA expression cassette, a Cas12 expression cassette, and a mCherry expression cassette. The DMD-targeting gRNA expression cassette comprised U6 promoter and a sequence encoding a DMD-targeting gRNA (one of SEQ ID NOs: 75-125; see Table 1) under the regulation of the U6 promoter. The Cas12 expression cassette comprised, from 5’ to 3’, CBA promoter, Kozak sequence (gccacc), start codon ATG, a sequence encoding 3xFlag tag, a sequence encoding SV40 NLS, a sequence encoding a Cas12 endonuclease (SEQ ID NO: 57; also named as “hfCas12Max”) under the regulation of the CBA promoter, a sequence encoding npNLS (SEQ ID NO: 65), and a sequence encoding bGH polyA signal. The mCherry expression cassette comprised, from 5’ to 3’, CMV enhancer, CMV promoter, and a sequence encoding mCherry under the regulation of the CMV enhancer and the CMV promoter. The Cas12 endonuclease and the DMD-targeting gRNA composed a CRISPR-Cas12 system targeting human DMD gene. The Cas12 endonuclease (SEQ ID NO: 57) is capable of recognizing 5’-NTN-3’ PAM (N = A, T, G, or C), and preferably 5’-TTN-3’ PAM (N = A, T, G, or C), 5’ to the protospacer sequence on the nontarget strand of a target dsDNA.
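The PAM rule stated above (5’-NTN-3’, preferably 5’-TTN-3’, located 5’ to the protospacer on the nontarget strand) can be illustrated with a simple scan. The sketch below is not the guide design procedure used in this Example; the function name, the default protospacer length of 20 nt, and the toy sequence are assumptions for illustration, and the toy sequence is not taken from the DMD gene.

```python
from typing import List, Tuple


def find_protospacers(nontarget_strand: str, spacer_len: int = 20,
                      require_tt: bool = False) -> List[Tuple[int, str, str]]:
    """List (pam_start, pam, protospacer) for each 5'-NTN-3' PAM that is
    followed immediately 3' by a full-length protospacer.

    With require_tt=True only the preferred 5'-TTN-3' PAMs are kept.
    """
    seq = nontarget_strand.upper()
    if set(seq) - set("ACGT"):
        raise ValueError("sequence must contain only A, C, G, T")
    hits = []
    for i in range(len(seq) - 3 - spacer_len + 1):
        pam = seq[i:i + 3]
        if pam[1] != "T":
            continue  # the middle PAM position must be T
        if require_tt and pam[0] != "T":
            continue  # the preferred PAM also has T at the first position
        hits.append((i, pam, seq[i + 3:i + 3 + spacer_len]))
    return hits


toy = "ACGTTAGGCCA"  # made-up sequence, read 5' to 3'
all_hits = find_protospacers(toy, spacer_len=4)
preferred_hits = find_protospacers(toy, spacer_len=4, require_tt=True)
print(all_hits, preferred_hits)
```

Only the strand passed in is scanned; candidate sites on the opposite strand would require running the same function on the reverse complement.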

The DMD-targeting gRNA (one of SEQ ID NOs: 75-125; see Table 1) was composed of a scaffold sequence (direct repeat sequence) (SEQ ID NO: 52) capable of forming a complex with the Cas12 endonuclease (SEQ ID NO: 57) and a guide sequence (one of G1-G51 (SEQ ID NOs: 1-51; see Table 1)) (3’ to the scaffold sequence) designed to be capable of hybridizing to human DMD gene. The target site on the human DMD gene for each of the guide sequences G1-G51 is listed in Table 2. It is noted that although SEQ ID NOs: 1-51 and 75-125 are denoted as RNA in the electronic sequence listing of the disclosure, they denote both (1) the protospacer sequences on human DMD gene corresponding to the guide sequences and the gRNA coding sequences as DNA, respectively; and (2) the guide sequences and the gRNAs as RNA, respectively, in which case “t” denotes “u”.
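The dual DNA/RNA reading of the listed sequences and the scaffold-then-guide layout can be made concrete with a short sketch. The sequences below are invented placeholders, not SEQ ID NO: 52 or any of SEQ ID NOs: 1-51, and the function names are hypothetical.

```python
def to_rna(st26_sequence: str) -> str:
    """Read an ST.26-style sequence as RNA: every 't' is deemed 'u'."""
    seq = st26_sequence.lower()
    if set(seq) - set("acgt"):
        raise ValueError("expected only a, c, g, t")
    return seq.replace("t", "u")


def assemble_grna(scaffold: str, guide: str) -> str:
    """Single gRNA with the guide sequence placed 3' to the scaffold."""
    return to_rna(scaffold) + to_rna(guide)


# Placeholder sequences for illustration only.
placeholder_scaffold = "aatttc"
placeholder_guide = "gatc"
grna = assemble_grna(placeholder_scaffold, placeholder_guide)
print(grna)
```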

As a negative control, a non-targeting guide sequence incapable of hybridizing to human DMD gene was used in place of the DMD-targeting guide sequence in the expression plasmid.

Results:

Referring to Tables 2-8 and FIG. 5 and FIG. 8, the experiment results show that for most of the DMD-targeting guide sequences G1-G51, high editing efficiency (indels %) (e.g., above 20%, 30%, or 40%) in vitro was achieved, making them promising candidates for the development of DMD gene therapy. In particular, the DMD-targeting guide sequence G36 (SEQ ID NO: 36) targeting SD of Exon 51 of human DMD gene, which achieved significantly high editing efficiency (FIG. 8B and 8C) and low off-target editing (FIG. 8D and 8E), was selected for further testing in the subsequent Examples.

Table 2. Average (n=3) editing efficiency (indels %) for guide sequences G1-G4 targeting SD (splice donor) of Exon 43 of human DMD gene (FIG. 5A)

Table 3. Average (n=3) editing efficiency (indels %) for guide sequences G5-G6 targeting SA (splice acceptor) of Exon 44 and G7-G11 targeting SD (splice donor) of Exon 44 of human DMD gene (FIG. 5B)

Table 4. Average (n=3) editing efficiency (indels %) for guide sequences G12-G19 targeting SA (splice acceptor) of Exon 45 and G20-G23 targeting SD (splice donor) of Exon 45 of human DMD gene (FIG. 5C)

Table 5. Average (n=3) editing efficiency (indels %) for guide sequences G24-G27 targeting SA (splice acceptor) of Exon 46 and G28-G30 targeting SD (splice donor) of Exon 46 of human DMD gene (FIG. 5D)

Table 6. Average (n=3) editing efficiency (indels %) for guide sequences G31-G34 targeting SA (splice acceptor) of Exon 51 and G35-G38 targeting SD (splice donor) of Exon 51 of human DMD gene in HEK293T cells (FIG. 8B) and iPS cells (FIG. 8C) , respectively.

Table 7. Average (n=3) editing efficiency (indels %) for guide sequences G39-G45 targeting SA (splice acceptor) of Exon 53 and G46-G47 targeting SD (splice donor) of Exon 53 of human DMD gene (FIG. 5E)

Table 8. Average (n=3) editing efficiency (indels %) for guide sequences G48-G51 targeting SA (splice acceptor) of Exon 55 of human DMD gene (FIG. 5F)

Example 2. Testing of in vivo editing efficiency for human DMD gene

This Example demonstrates the in vivo editing efficiency of AAV-delivered CRISPR-Cas12 system containing the DMD-targeting guide sequence G36 (SEQ ID NO: 36) .

Transgene Plasmid Construction:

A rAAV transgene plasmid for packaging into wild type AAV9 capsid to prepare all-in-one rAAV9 particles was designed to comprise, from 5’ to 3’, the elements in Table 9 below, especially the three consecutive copies of gRNA (SEQ ID NO: 71) composed of scaffold sequence (SEQ ID NO: 53) 5’ to guide sequence G36 (SEQ ID NO: 36). The three consecutive copies of gRNA (SEQ ID NO: 71) may also be regarded as a single gRNA (SEQ ID NO: 74) with gRNA configuration of “DgDgDg”, wherein “D” denotes a direct repeat sequence (or a scaffold sequence) and “g” denotes a guide sequence. Three kinds of additional all-in-one rAAV9 particles were also prepared with all the same elements but different gRNA configurations of “Dg” (SEQ ID NO: 71), “DgD” (SEQ ID NO: 72), and “DgDgD” (SEQ ID NO: 73), respectively.
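The configuration notation above maps directly onto a concatenation rule: each “D” contributes one direct repeat and each “g” one guide sequence, in 5’ to 3’ order. A minimal sketch follows, using short placeholder strings rather than SEQ ID NO: 53 or SEQ ID NO: 36; the function name is hypothetical.

```python
def build_grna_array(config: str, direct_repeat: str, guide: str) -> str:
    """Expand a configuration such as 'DgDgDg' into one 5'-to-3' sequence."""
    parts = {"D": direct_repeat, "g": guide}
    unknown = set(config) - set(parts)
    if not config or unknown:
        raise ValueError("config must be a non-empty string of 'D' and 'g'")
    return "".join(parts[symbol] for symbol in config)


# Placeholders that make the layout easy to see; not real sequences.
DR, GUIDE = "DDDD", "gg"
arrays = {cfg: build_grna_array(cfg, DR, GUIDE)
          for cfg in ("Dg", "DgD", "DgDgD", "DgDgDg")}
for cfg, seq in arrays.items():
    print(cfg, seq, len(seq))
```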

Table 9. Elements of the rAAV transgene plasmid

The full length of the 5’ ITR-to-3’ ITR sequence of the transgene plasmid is set forth in SEQ ID NO: 69.

Results:

A novel humanized DMD mouse model, ΔmE51E52, hE51KI, was generated for in vivo evaluation of DMD editing efficiency, using targeted deletion of mouse DMD Exon 51 and Exon 52 via replacement with human DMD Exon 51, which disrupted the dystrophin ORF. FIG. 7 shows the generation of the mouse model. FIG. 7A shows sirius red staining and HE staining of TA, DI, and heart muscle of WT and ΔmE51E52, hE51KI mice. FIG. 7B shows dystrophin immunohistochemistry from indicated muscles of WT and ΔmE51E52, hE51KI mice. Dystrophin and spectrin are shown in green and magenta, respectively. FIG. 7C shows that serum CK, a marker of muscle damage and membrane leakage, was measured in WT and ΔmE51E52, hE51KI mice (n=6). FIG. 7D shows that WT and ΔmE51E52, hE51KI mice were subjected to forelimb grip strength testing to measure muscle performance (n=6). All mice were 8 weeks old at the time of the experiment. Data are represented as mean ± SEM. Each dot represents an individual mouse. **P < 0.01 using unpaired two-tailed Student’s t test. Scale bar, 200 μm. As noted, the expression of dystrophin (FIG. 7B) in the ΔmE51E52, hE51KI mouse model was clearly lower than in the WT, and muscle performance was likewise significantly lower (FIG. 7D).

The experiment results show that dystrophin expression was rescued by intramuscular (IM) injection in tibialis anterior (TA) muscle 4 weeks post-injection of the rAAV9 particles with gRNA configurations “Dg”, “DgD”, “DgDgD”, or “DgDgDg” into the ΔmE51E52, hE51KI mouse model in a fixed dose of 2.5E11 vg (FIG. 9). FIG. 9A shows an overview of the in vivo intramuscular (IM) injection of rAAV particles into the TA muscle. Left leg was injected with saline as a control. Black arrows indicate time points for tissue collection after injection. FIG. 9B-9C show genomic indels (B) and RNA indels (C) analyzed by NGS sequencing (n=3). Different rAAV particles with various gRNA configurations (Dg, DgD, DgDgD, and DgDgDg having different combinations of Direct repeat (D) and guide sequence (g)) were tested, showing that the rAAV particles with gRNA configuration “DgDgDg” achieved the best editing efficiency on both DNA and RNA editing levels. “Productive editing” denotes that the introduction of indel mutation into DMD gene leads to expression of functional dystrophin (mutant) as needed. “Nonproductive editing” denotes that the introduction of indel mutation into DMD gene does not lead to expression of functional dystrophin (mutant) as needed. “Exon framed” denotes that the introduction of indel mutation into DMD gene generates frameshift mutation in the RNA transcribed from DMD gene that leads to expression of functional dystrophin (mutant) as needed. “Exon skipping” denotes that the introduction of indel mutation into DMD gene generates a mutation in the RNA transcribed from DMD gene that leads to exon skipping of Exon 51 that leads to expression of functional dystrophin (mutant) as needed. “Out of frame” denotes that the introduction of indel mutation into DMD gene generates nonsense mutation (stop codon) in the RNA transcribed from DMD gene that pre-terminates translation of the transcribed RNA. FIG. 9D shows western blot analysis of dystrophin and vinculin expression in TA muscles 4 weeks after injection with rAAV particles or saline, showing that the rAAV particles with gRNA configuration “DgDgDg” achieved the highest expression of dystrophin. FIG. 9E shows comparison of dystrophin expression of different rAAV particles by immunofluorescence. Dystrophin is shown in green. Scale bar, 200 μm. Data are represented as mean ± SEM.
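The three RNA-level outcome labels defined above can be related to simple reading-frame bookkeeping. The following is a simplified sketch and not the classification pipeline behind FIG. 9C: it only checks whether the net length change of the transcript is a multiple of three, it does not look for stop codons inside a restored frame, and the numeric lengths are illustrative assumptions that are not stated in the text.

```python
def classify_edit(existing_shift_nt: int, indel_net_nt: int,
                  exon_len_nt: int, exon_skipped: bool) -> str:
    """Label an edited transcript by whether the reading frame is restored.

    existing_shift_nt: net length change already present in the transcript
                       (negative for a deletion).
    indel_net_nt:      inserted minus deleted nucleotides within the exon.
    exon_len_nt:       length of the targeted exon, removed if it is skipped.
    """
    if exon_skipped:
        net = existing_shift_nt - exon_len_nt
        label = "Exon skipping"
    else:
        net = existing_shift_nt + indel_net_nt
        label = "Exon framed"
    return label if net % 3 == 0 else "Out of frame"


# Illustrative numbers only: a 118-nt pre-existing deletion and a 233-nt exon.
SHIFT, EXON = -118, 233
labels = {
    "+1 insertion": classify_edit(SHIFT, +1, EXON, exon_skipped=False),
    "-2 deletion": classify_edit(SHIFT, -2, EXON, exon_skipped=False),
    "-1 deletion": classify_edit(SHIFT, -1, EXON, exon_skipped=False),
    "exon skipped": classify_edit(SHIFT, 0, EXON, exon_skipped=True),
}
print(labels)
```

Under these assumed lengths, the first two edits and the skipped exon restore the frame and would count toward productive editing, while the single-nucleotide deletion would not.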

Additional experiments were conducted to test systemic and intravenous delivery of the rAAV particles with gRNA configuration “DgDgDg” .

The experiment results show that dystrophin expression was rescued by systemic delivery of the rAAV particles with gRNA configuration “DgDgDg” into the ΔmE51E52, hE51KI mouse model (FIG. 10). FIG. 10A shows a schematic of intraperitoneal injection of the rAAV particles into postnatal-day-3 (P3) ΔmE51E52, hE51KI mice in a dose of 2.5E13 vg/kg. Saline was injected as mock-treated controls. Black arrows indicate time points for tissue collection. FIG. 10B-10C show measurement by deep sequencing of genomic indels (B) and RNA indels (C) in TA, DI, and heart after systemic delivery (n=3). FIG. 10D shows western blot analysis, exhibiting restoration of dystrophin expression in the TA, DI, and heart of ΔmE51E52, hE51KI mice 4 weeks, 8 weeks, and 16 weeks after injection. Dilutions of protein extract from WT mice were used to standardize dystrophin expression (10%, 25%, and 50%). FIG. 10E shows immunohistochemistry for dystrophin in TA, DI, and heart of ΔmE51E52, hE51KI mice performed at indicated time points. Dystrophin is shown in green. Scale bar, 200 μm. FIG. 10F shows quantification of Dys⁺ (positive) fibers in cross sections of TA, DI, and heart muscles (n=3). FIG. 10G-10H show rotarod running time (G) and forelimb grip strength (H) measured in treated ΔmE51E52, hE51KI mice, control (saline-injected ΔmE51E52, hE51KI mice), or untreated wild type mice (n=3). Dots and bars represent biological replicates and are mean ± SEM. Different asterisks represent statistical significance (P < 0.01) using unpaired two-tailed Student’s t test. TA = tibialis anterior muscle. DI = diaphragm muscle. As noted, the saline-injected ΔmE51E52, hE51KI mice (control) exhibited significantly lower muscle performance compared with wild type mice, whereas the treated ΔmE51E52, hE51KI mice exhibited muscle performance that was NOT significantly different from wild type mice, indicating that by the systemic administration of the rAAV particles, the muscle performance of ΔmE51E52, hE51KI mice was restored to the normal level of wild type mice.
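The WT dilution series mentioned for FIG. 10D (10%, 25%, and 50%) works as a standard curve against which a treated sample can be read. The sketch below shows one way such a readout could be computed, by linear interpolation between neighboring standards. It is not the quantification method used in the study: normalizing dystrophin to the vinculin band, interpolating linearly, and all band intensity values are assumptions made for illustration.

```python
from typing import List, Tuple


def percent_of_wt(sample_ratio: float,
                  standards: List[Tuple[float, float]]) -> float:
    """Interpolate a sample's dystrophin level as a percentage of WT.

    standards: (percent_of_wt, dystrophin/vinculin intensity ratio) pairs from
    the WT dilution lanes; ratios are assumed distinct and increasing with percent.
    """
    points = sorted(standards, key=lambda p: p[1])
    if not points[0][1] <= sample_ratio <= points[-1][1]:
        raise ValueError("sample lies outside the range of the standards")
    for (p0, r0), (p1, r1) in zip(points, points[1:]):
        if r0 <= sample_ratio <= r1:
            return p0 + (p1 - p0) * (sample_ratio - r0) / (r1 - r0)
    raise ValueError("at least two standards are required")


# Hypothetical intensity ratios for the 10%, 25%, and 50% WT lanes.
wt_standards = [(10.0, 0.12), (25.0, 0.30), (50.0, 0.62)]
estimate = percent_of_wt(0.46, wt_standards)
print(f"estimated dystrophin: {estimate:.1f}% of WT")
```

Samples above the highest standard are rejected rather than extrapolated, since the dilution series described in the text stops at 50% of WT.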

The experiment results show that dystrophin expression was rescued by tail vein injection of the rAAV particles with gRNA configuration “DgDgDg” into the ΔmE51E52, hE51KI mouse model (FIG. 11 and FIG. 12). FIG. 11A shows a schematic of tail vein injection of the rAAV particles into 2-week-old ΔmE51E52, hE51KI mice in a dose of 5E13 vg/kg. Saline was injected as mock-treated controls. Black arrows indicate time points for tissue collection. FIG. 11B-11C show measurement by deep sequencing of genomic indels (B) and RNA indels (C) in TA, DI, and heart 4 weeks and 6 months after injection (n=3). FIG. 11D shows immunohistochemistry for dystrophin in TA, DI, and heart of ΔmE51E52, hE51KI mice performed at indicated time points. Dystrophin is shown in green. Scale bar, 200 μm. FIG. 11E shows quantification of Dys⁺ (positive) fibers in cross sections of TA, DI, and heart muscles (n=3). FIG. 12A-12B show rotarod running time (A) and forelimb grip strength (B) measured in treated ΔmE51E52, hE51KI mice, untreated ΔmE51E52, hE51KI mice (saline-injected control), or untreated wild type mice (n=3). Dots and bars represent biological replicates and are mean ± SEM. Different asterisks represent statistical significance (P < 0.01) using unpaired two-tailed Student’s t test. As noted, the untreated ΔmE51E52, hE51KI mice exhibited significantly lower muscle performance compared with wild type mice, whereas the treated ΔmE51E52, hE51KI mice exhibited muscle performance that was NOT significantly different from wild type mice at 6 months post-injection, indicating that by the intravenous injection of the rAAV particles, the muscle performance of ΔmE51E52, hE51KI mice was restored to the normal level of wild type mice.

Overall, it was demonstrated that by using the DMD-targeting guide sequence and the CRISPR-Cas12 system of the disclosure, dystrophin expression in vivo can be rescued and even restored to the level of wild type, indicating a promising strategy to treat diseases caused by deficient dystrophin expression, e.g., Duchenne muscular dystrophy (DMD) .

Various modifications and variations of the described products, methods, and uses of the disclosure will be apparent to those skilled in the art without departing from the scope and spirit of the disclosure. Although the disclosure has been described in connection with specific embodiments, it will be understood that it is capable of further modifications and that the disclosure as claimed should not be unduly limited to such specific embodiments. Indeed, various modifications of the described modes for carrying out the disclosure that are obvious to those skilled in the art are intended to be within the scope of the disclosure. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known customary practice within the art to which the disclosure pertains and as may be applied to the essential features hereinbefore set forth.

EXEMPLARY SEQUENCES

Table 1.

Table 1 (continued).

Claims

A guide nucleic acid comprising a guide sequence capable of hybridizing to a target sequence on a target strand of a DMD gene; wherein the protospacer sequence on the nontarget strand of the DMD gene corresponding to the target sequence is located at or within an exon of the DMD gene, or at or within a splice donor or a splice acceptor of the exon; wherein the exon is selected from the group consisting of Exon 43, Exon 44, Exon 45, Exon 46, Exon 51, Exon 53, and Exon 55.
The guide nucleic acid of any preceding claim, wherein the protospacer sequence is immediately 3’ to a protospacer adjacent motif (PAM) of 5’-NTN-3’, wherein N is A, T, G, or C.
The guide nucleic acid of any preceding claim, wherein the guide sequence comprises (1) a sequence of any one of SEQ ID NOs: 36, 1-35, and 37-51 or a 5’ or 3’ end truncation thereof with 1, 2, 3, 4, 5, or 6 nucleotides truncated at the 5’ or 3’ end; or (2) a sequence having a sequence identity of at least about 70%, 75%, 80%, 85%, 90%, 95%, or 100% to any one of SEQ ID NOs: 36, 1-35, and 37-51 or a 5’ or 3’ end truncation thereof with 1, 2, 3, 4, 5, or 6 nucleotides truncated at the 5’ or 3’ end; or (3) a sequence having at most 1, 2, 3, 4, 5, or 6 nucleotide differences, whether consecutive or not, compared to any one of SEQ ID NOs: 36, 1-35, and 37-51.
The guide nucleic acid of any preceding claim, wherein the guide nucleic acid comprises a scaffold sequence capable of forming a complex with a nucleic acid programmable binding protein (napBP) , and wherein the hybridization of the guide sequence to the target sequence guides the complex to the DMD gene.
The guide nucleic acid of any preceding claim, wherein the scaffold sequence has substantially the same secondary structure as the secondary structure of the sequence of SEQ ID NO: 52 or 53; or wherein the scaffold sequence comprises (1) a sequence of SEQ ID NO: 52 or 53 or a 5’ or 3’ end truncation thereof with 1, 2, 3, 4, 5, or 6 nucleotides truncated at the 5’ or 3’ end; or (2) a sequence having a sequence identity of at least about 70%, 75%, 80%, 85%, 90%, 95%, or 100% to SEQ ID NO: 52 or 53 or a 5’ or 3’ end truncation thereof with 1, 2, 3, 4, 5, or 6 nucleotides truncated at the 5’ or 3’ end; or (3) a sequence having at most 1, 2, 3, 4, 5, or 6 nucleotide differences, whether consecutive or not, compared to SEQ ID NO: 52 or 53.
The guide nucleic acid of any preceding claim, wherein the guide nucleic acid comprises a sequence having a sequence identity of at least about 80% (e.g., at least about 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%) to the sequence of any one of SEQ ID NOs: 71-125; or a sequence having at most 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 nucleotide differences, whether consecutive or not, compared to the sequence of any one of SEQ ID NOs: 71-125.
A polynucleotide comprising or encoding one or more (e.g., two, three) copies of the guide nucleic acid of any preceding claim.
A system comprising:

(1) the guide nucleic acid of any preceding claim, or a polynucleotide encoding the guide nucleic acid; and

(2) a nucleic acid programmable binding protein (napBP) , or a polynucleotide encoding the napBP.
The system of any preceding claim, wherein the napBP is capable of recognizing a protospacer adjacent motif (PAM) of 5’-NTN-3’ immediately 5’ to a protospacer sequence on the nontarget strand of the DMD gene, wherein N is A, T, G, or C.
The system of any preceding claim, wherein the napBP is a nucleic acid programmable DNA endonuclease (napDNAn) .
The system of any preceding claim, wherein the napDNAn is a Cas9 endonuclease, a Cas12 endonuclease, or an IscB endonuclease.
The system of any preceding claim, wherein the napBP comprises a sequence having a sequence identity of at least about 80% (e.g., at least about 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%) to the sequence of SEQ ID NO: 55 or 57 or a N-terminal truncation thereof without the first N-terminal Methionine. In some embodiments, the napBP retains at least 80% (e.g., at least about 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%) of the guide sequence-specific DNA endonuclease activity of the sequence of SEQ ID NO: 55 or 57.
A vector comprising the polynucleotide of any preceding claim.
The vector of any preceding claim, wherein the vector is a plasmid, an adeno-associated virus (AAV) vector, a retroviral vector, an adenoviral vector, or a lentiviral vector.
A recombinant adeno-associated virus (rAAV) vector genome comprising:

(1) a first polynucleotide sequence comprising a sequence encoding a guide nucleic acid of any preceding claim; and

(2) a second polynucleotide sequence comprising a sequence encoding a nucleic acid programmable binding protein (napBP) ,

wherein the rAAV vector genome is adapted to be encapsulated into a rAAV particle.
The rAAV vector genome of any preceding claim, wherein the rAAV vector genome comprises a sequence having a sequence identity of at least about 80% to the sequence of SEQ ID NO: 69; or comprises the sequence of SEQ ID NO: 69.
The rAAV vector genome of any preceding claim, wherein the rAAV vector genome comprises, from the 5’ to 3’,

(1) the 5’ ITR of SEQ ID NO: 58;

(2) the promoter of SEQ ID NO: 59;

(3) the scaffold sequence of SEQ ID NO: 53;

(4) the guide sequence of SEQ ID NO: 36;

(5) the scaffold sequence of SEQ ID NO: 53;

(6) the guide sequence of SEQ ID NO: 36;

(7) the scaffold sequence of SEQ ID NO: 53;

(8) the guide sequence of SEQ ID NO: 36;

(9) the promoter of SEQ ID NO: 60;

(10) the Kozak sequence of gccacc;

(11) the in-frame start codon ATG;

(12) the first sequence encoding the first NLS of SEQ ID NO: 62;

(13) the sequence encoding the napBP of SEQ ID NO: 57;

(14) the second sequence encoding the second NLS of SEQ ID NO: 65;

(15) an in-frame stop codon,

(16) the WPRE sequence of SEQ ID NO: 66;

(17) the sequence encoding a polyA signal of SEQ ID NO: 67; and

(18) the 3’ ITR;

wherein the rAAV vector genome comprises a sequence having a sequence identity of at least about 80% to the sequence of SEQ ID NO: 69;

wherein the rAAV vector genome does not contain any nucleotide difference from the sequence of SEQ ID NO: 69 in any one of the components (1) to (18) ;

wherein the rAAV vector genome contains one or more nucleotide differences or does not contain any nucleotide difference from the sequence of SEQ ID NO: 69 between any two adjacent components of the components (1) to (18).
A recombinant AAV (rAAV) particle comprising the rAAV vector genome of any preceding claim.
The rAAV particle of any preceding claim, wherein the rAAV particle comprises a capsid with serotype of wild type AAV9.
A method for production of the rAAV particle of any preceding claim, comprising culturing in a host cell a transgene plasmid comprising the rAAV vector genome of any preceding claim.
A cell comprising a transgene plasmid comprising the rAAV vector genome of any preceding claim for the production of a rAAV particle.
A pharmaceutical composition comprising (1) the system of any preceding claim or the rAAV particle of any preceding claim, and (2) a pharmaceutically acceptable excipient.
A method of modifying expression of a DMD gene, comprising contacting the DMD gene with the system of any preceding claim, the rAAV particle of any preceding claim, or the pharmaceutical composition of any preceding claim.
A cell or a progeny thereof comprising the guide nucleic acid of any preceding claim, the system of any preceding claim, or the rAAV particle of any preceding claim.
A cell or a progeny thereof comprising DMD gene modified by the system of any preceding claim, the rAAV particle of any preceding claim, or the method of any preceding claim.
A method for preventing, diagnosing, or treating a DMD associated disease in a subject in need thereof, comprising administering to the subject the system of any preceding claim, the rAAV particle of any preceding claim, or the pharmaceutical composition of any preceding claim, wherein the napBP modifies DMD gene, and wherein the modification of the DMD gene treats the disease.
The method of any preceding claim, wherein the DMD associated disease is Duchenne muscular dystrophy.