WO2024231820A1

WO2024231820A1 - Treatment of Pompe disease

Info

Publication number: WO2024231820A1
Application number: PCT/IB2024/054397
Authority: WO
Inventors: Simon Moore; Ruby BOYANAPALLI
Original assignee: Takeda Pharmaceutical Co Ltd
Current assignee: Takeda Pharmaceutical Co Ltd
Priority date: 2023-05-05
Filing date: 2024-05-06
Publication date: 2024-11-14
Anticipated expiration: 2025-11-05

Abstract

Variant acid alpha-glucosidase (GAA) polypeptides, codon-optimized polynucleotides encoding GAA, and GAA gene therapy methods and constructs are provided.

Description

ACID ALPHA-GLUCOSIDASE GENE THERAPY

CROSS REFERENCE TO RELATED APPLICATIONS

[0001] This application claims priority to United States Provisional Patent Application No. 63/500,524, filed May 5, 2023, the content of which is hereby incorporated by reference in its entirety for all purposes.

FIELD OF THE DISCLOSURE

[0002] The disclosure relates to acid alpha-glucosidase (GAA) gene therapy.

BACKGROUND

[0003] Acid alpha-glucosidase (GAA) is an enzyme responsible for the degradation of glycogen in the lysosomes of cells. Loss of its activity leads to progressive intralysosomal accumulation of undegraded glycogen and lysosomal distention. Pompe disease (PD) is caused by mutations in the GAA gene (gaa) that reduce GAA activity. PD can be broadly classified into infantile-onset (IOPD) or late-onset (LOPD) PD. IOPD patients have under 1% GAA activity; develop cardiomegaly, muscle weakness with hypotonia, hepatomegaly, and breathing problems; and die within the first year of life if left untreated. LOPD patients have at least 1% GAA activity and manifest a less severe phenotype but present with progressive limb muscle weakness and respiratory insufficiency.

[0004] Despite the availability of recombinant enzyme replacement therapy (ERT), such as Lumizyme® (marketed as Myozyme® outside of the United States; Sanofi Genzyme), Pompe disease remains a devastating illness. For IOPD, although ERT has extended the lives of patients, many are wheelchair bound and require ventilator assistance, especially at night. Life expectancy in patients with IOPD is still severely shortened and the median age of survival for patients on ERT has not been determined. Although LOPD is heterogeneous and the severity falls along a spectrum, many patients still lose independent mobility and/or require ventilator support as their symptoms progress. Patients reach a clinical plateau within 2-3 years of treatment and some show a decline over time (Harfouche (2020) J. Patient Rep. Outcomes 4(1): 83). The primary deficiencies of ERT are: (1) detrimental immune responses, including neutralizing antibodies against recombinant GAA enzyme, especially in cross-reactive immune-material (CRIM) negative patients; (2) poor uptake of the GAA enzyme by muscle cells from circulation; (3) limited availability of the GAA enzyme in circulation (85% taken up by the liver); (4) reduced stability of the GAA enzyme at neutral pH; and (5) progressive endosomal dysfunction reducing efficacy of the endogenous enzyme delivery to lysosomes. Other complications associated with ERT include infusion site reactions and the requirement of biweekly or even weekly (in severe cases) infusions. These deficiencies of ERT translate to a poor quality of life, indicating a sustained unmet need for these patients.

[0005] Next-generation ERTs are being developed to address some of these problems and to improve the current standard of care (SOC). Such strategies are focused on improving uptake and bioavailability of GAA into muscles and include: (1) development of a GAA with high mannose 6-phosphate (M6P) content to improve uptake from circulation; (2) chimeric GAA variants with synthetic uptake domains; (3) administering beta-2 agonists to upregulate the expression of the cation-independent M6P receptor (CI-MPR) to improve cellular uptake (Farah et al. (2014) FASEB J. 28(5):2272-2280); and (4) combining ERT with pharmacological chaperones to improve GAA enzyme stability in plasma (Okumiya et al. (2007) Mol. Genet. Metab. 90:49-57). Studies in gaa-/- knockout (KO) mice showed that these next-generation ERTs are more efficient than the current ERT treatment in clearing lysosomal glycogen accumulation in muscles (Xu et al. (2019) JCI Insight 4(5):e125358), which may translate into better short-term efficacy in patients. However, the key challenges with ERT are also expected to persist for the next-generation ERTs, as evidenced by the demonstration of only an incremental benefit over current SOC in Pompe patients (see generally, Diaz-Manera, J., et al. (2021). The Lancet. Neurology, 20(12), 1012-1026; Schoser, B., et al. (2021). The Lancet. Neurology, 20(12), 1027-1037; and Kishnani, P. S., et al. (2023). Genetics in Medicine, 25(2), 100328). It is also not clear if any of these next-generation approaches will be able to address the loss of the ERT benefit over time.

[0006] While glycogen accumulates in virtually all tissues of PD patients, the clinical manifestations are predominantly observed in the skeletal, cardiac, and respiratory muscles. The major unmet needs in PD are due to limited availability of GAA enzyme in the respiratory and deep skeletal muscles that is a consequence of not only low levels of circulating GAA enzyme and poor GAA enzyme uptake into muscle cells, but also of reduced enzyme stability and exaggerated immune responses.

[0007] GAA gene therapy (GT) has the potential to have a lasting therapeutic effect on patients suffering from PD by delivering continuous, high exposure of the missing enzyme to the affected tissues. Currently, there are three companies testing Pompe GT candidates in Phase I/II clinical trials and several additional companies with candidates in the preclinical stage. Two programs currently in clinical trials, (1) ACT-CS101 (CT Identifier: NCT03533673) sponsored by Actus Therapeutics and Asklepios Biopharmaceutical, Inc. and (2) RESOLUTE℠ (CT Identifier: NCT03893240) sponsored by Spark™ Therapeutics, are testing adeno-associated virus (AAV)-delivered liver-directed therapies that rely on cross-correction from circulation and therefore run the same risks as current ERT of low serum stability, poor uptake and inefficient deep tissue distribution. In addition, neither the Actus nor the Spark candidate has accounted for the reduced transduction efficiency of liver seen in primates vs. that seen in mouse models of PD and, given also the high turnover rate of liver cells, it is predicted that poor efficacy will be observed at their selected clinical doses. Another program in clinical trials, Audentes Therapeutics, Inc.'s FORTIS study (CT Identifier: NCT04174105) sponsored by Audentes’ acquirer Astellas Gene Therapies, is in phase 1/2 for LOPD. While the Audentes GT candidate, AT845, utilizes a muscle-directed promoter, the AAV8 capsid being used does not have high muscle tropism in primates, and a high dose of virus of 1x10¹⁴ vg/kg was required for efficient transduction and skeletal muscle glycogen normalization in Pompe mouse models. Also, in trials for a different rare muscle disease, the Audentes AAV8-delivered AT132 candidate has seen three clinical fatalities using doses greater than 1x10¹⁴ vg/kg. Similarly, with regard to their Pompe gene therapy, a high dose of vector (> 2x10¹⁴ vg/kg) was required to demonstrate increased GAA levels in muscles, casting serious doubt on the potential of the Audentes candidate to achieve therapeutic levels of muscle GAA expression in patients at a safe dose. This clinical trial is currently on hold due to serious adverse events (SAEs) of peripheral sensory neuropathy in one of the trial participants.

[0008] A need therefore exists for a therapy that effectively and safely delivers GAA to skeletal, cardiac, and respiratory muscles for the treatment of Pompe disease and other GAA-related disorders.

BRIEF SUMMARY OF THE DISCLOSURE

[0009] The disclosure relates to a gene therapy that addresses the shortcomings of ERTs and competitor gene therapy (GT) candidates, thereby providing a transformative therapy for Pompe disease (PD) patients. The present therapy targets hard-to-treat muscle tissues by combining (1) an engineered AAV capsid that efficiently delivers the genetic payload to muscles and can be dosed into environmentally seropositive patients, (2) transcriptional elements that drive maximal expression of GAA in muscles, and (3) an engineered GAA protein variant that has enhanced stability, uptake, catalytic activity, and a reduced immunogenic profile. The present GAA GT efficiently delivers the missing GAA enzyme to GAA-deficient muscle tissues, and provides high, continuous exposure to it, by optimizing tissue-specific delivery, expression and function of the GAA enzyme and as a result will significantly improve patient quality of life and extend the lifespan of Pompe disease patients.

[0010] The present disclosure concerns methods and compositions to alleviate the primary unmet need of Pompe patients that leads to morbidity and mortality by achieving therapeutic GAA protein levels in the lysosomes of diaphragm, cardiac, skeletal, and smooth muscle cells. The muscle-tropic AAV9-based or chimeric capsid influences delivery (i.e., transduction) of the recombinant genome comprising the engineered GAA transgene to the targeted muscle tissue. The muscle-specific promoter and enhancer elements drive strong expression of the GAA transgene in muscle cells, as well as non-muscle cells. Upon expression, the engineered GAA protein is processed and trafficked to cellular lysosomes where it can act to break down glycogen. A portion of the engineered GAA that is expressed in transduced cells is secreted and taken up by surrounding non-transduced cells. This local cross-correction within muscle allows for better treatment of the difficult-to-reach deep skeletal muscle tissues. Importantly, the therapeutic mechanism of action does not require transport into the serum and passive uptake by muscle cells, a major limitation of liver-directed GT approaches. Further, as the average lifespan of skeletal myocytes is 15 years (Spalding (2005) Cell 122(1): 133-43) and >50% cardiomyocytes persist throughout an adult’s lifetime (Lazar (2017) Eur. Heart J. 38(30): 2333-2342) compared to the high turnover of liver cells, by targeting muscle tissue the GAA GT of the disclosure is more durable than the liver-directed GTs in the art.

[0011] In one aspect, the disclosure provides a nucleic acid encoding an acid alpha-glucosidase (GAA) protein, the nucleic acid comprising a first polynucleotide sequence that is at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identical to CO3-MP-6-dNA (SEQ ID NO:36).

[0012] In one aspect, the disclosure provides a nucleic acid encoding an acid alpha-glucosidase (GAA) protein, the nucleic acid comprising a first polynucleotide sequence that is at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identical to CO3-MP-WT-NA (SEQ ID NO:34).
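The percent-identity thresholds recited in the two preceding paragraphs can be illustrated with a minimal sketch. It assumes the two sequences have already been aligned to equal length (with "-" marking gaps) and that identity is counted as matching columns divided by aligned columns; the disclosure as excerpted here does not fix a particular alignment algorithm or denominator, so both choices are assumptions. The function names and the toy sequences are hypothetical and are not taken from any SEQ ID NO.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences ('-' marks a gap).

    Assumption: identity = matching columns / aligned columns, ignoring
    columns where both sequences carry a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_identity_threshold(aligned_a: str, aligned_b: str, threshold: float) -> bool:
    """True if the two aligned sequences are at least `threshold` percent identical."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Toy 10-nucleotide example with one mismatch (illustrative only).
reference = "ATGGCCATGG"
candidate = "ATGGCTATGG"
print(percent_identity(reference, candidate))
print(meets_identity_threshold(reference, candidate, 95.0))
```

In practice the alignment step itself (global or local, with a chosen scoring scheme) determines the result, so a comparison against a real sequence would first need that step from an established alignment tool.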

[0013] In one aspect, the disclosure provides an expression cassette comprising a GAA nucleic acid disclosed herein and at least one regulatory nucleic acid sequence operably linked to the sequence encoding the GAA protein.

[0014] In one aspect, the disclosure provides a mammalian expression vector comprising an expression cassette described herein.

[0015] In one aspect, the disclosure provides a recombinant acid alpha-glucosidase (GAA) variant protein, wherein the GAA variant protein comprises an amino acid substitution selected from the group consisting of T151I, L650G, L650S, L650T, L650E, L650Y, L650F, S676D, L678H, and L868F, numbered relative to the full-length wild type GAA protein sequence of FL-WT-AA (SEQ ID NO:2). In some embodiments, the recombinant GAA variant protein comprises an amino acid substitution selected from the group consisting of T151I, L650G, S676D, and L678H, numbered relative to the full-length wild type GAA protein sequence of FL-WT-AA (SEQ ID NO:2).

[0016] In one aspect, the disclosure provides a recombinant acid alpha-glucosidase (GAA) variant protein, wherein the GAA variant protein comprises a first polypeptide sequence that is at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identical to MP-6-AA (SEQ ID NO:37) and wherein the GAA variant comprises one or more variant amino acids selected from the group consisting of T151I, L650G, S676D, and L678H.

[0017] In one aspect, the disclosure provides a method for treating Pompe disease in a subject in need thereof, the method comprising administering to the subject a therapeutically effective amount of a composition described herein.

BRIEF DESCRIPTION OF THE DRAWINGS

[0018] Figures 1A, 1B, 1C, 1D, 1E, 1F, 1G, and 1H collectively illustrate (A) a vector comprising a representative GAA transgene driven by a muscle specific promoter and enhancer packaged within an AAV9 capsid. The following naming convention will be used to describe each vector: [Capsid.Enhancer.Promoter.Transgene]. For instance, a candidate with an AAV9 capsid, the Dph-CRE04 enhancer, the SPc512 promoter and a human wild-type GAA will be abbreviated as “AAV9.Dph-CRE04.SPc512.WT hGAA”; GAA protein expression measured using GAA activity in the (B) heart, (C) diaphragm, (D) quadriceps and (E) triceps of GAA-/- mice dosed with 3x10¹³ vg/kg AAV9 capsid vectors comprising constructs containing various enhancer elements (Sk-SH4, CSK-SH5 and/or Dph-CRE04) and/or promoters (CBA or SPc512 or Desmin) and codon optimized human GAA (CO3 GAA) or WT GAA using a 4-MUG assay. gaa+/+ = wild type (WT) vehicle-treated mouse; gaa-/- = gaa knock-out (KO) vehicle-treated mouse. (F) GAA activity in heart tissue lysates, quadriceps tissue lysates, and diaphragm tissue lysates of GAA KO mice that received either vehicle or a vector expressing WT hGAA. (G) PAS staining of heart tissue from GAA KO mice after dosing with a 3x10¹³ vg/kg rAAV9.GAA construct containing Sk-SH4.Desmin.WT human GAA compared to control gaa+/+ and gaa-/- mice and (H) western blot analysis of heart tissue protein from GAA KO mice that received 3x10¹³ vg/kg AAV9 capsid vector comprising constructs containing WT hGAA, with or without an enhancer, or a buffer group showing trafficking and lysosomal processing of human GAA (lane 1: WT hGAA; lanes 2 and 10: molecular weight markers; lanes 3 and 4: heart protein from GAA KO (gaa-/-) mice treated with vectors comprising AAV9.desmin.CO3 GAA; lanes 5 and 6: heart protein from GAA KO mice treated with vectors comprising AAV9.Sk-SH4.desmin.CO3 GAA; lanes 7 and 8: heart protein from GAA KO mice treated with vehicle-only; and lane 9: no transgene control).
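The [Capsid.Enhancer.Promoter.Transgene] naming convention described above can be sketched as a small parser. This is only an illustration: the `VectorName` type and `parse_vector_name` function are hypothetical names, and treating a three-part name as one without an enhancer is an assumption based on abbreviations such as "AAV9.desmin.CO3 GAA" that appear in the figure descriptions.

```python
from typing import NamedTuple, Optional


class VectorName(NamedTuple):
    """Components of a vector abbreviation (hypothetical helper type)."""
    capsid: str
    enhancer: Optional[str]  # None when the construct name lists no enhancer
    promoter: str
    transgene: str


def parse_vector_name(name: str) -> VectorName:
    """Split a dot-separated vector abbreviation into its components.

    Assumption: four parts mean Capsid.Enhancer.Promoter.Transgene and
    three parts mean Capsid.Promoter.Transgene (no enhancer).
    """
    parts = [part.strip() for part in name.split(".")]
    if len(parts) == 4:
        return VectorName(parts[0], parts[1], parts[2], parts[3])
    if len(parts) == 3:
        return VectorName(parts[0], None, parts[1], parts[2])
    raise ValueError(f"unexpected number of components in {name!r}")


print(parse_vector_name("AAV9.Dph-CRE04.SPc512.WT hGAA"))
print(parse_vector_name("AAV9.desmin.CO3 GAA"))
```

Stripping whitespace around each component also tolerates abbreviations printed with stray spaces after the dots.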

[0019] Figures 2A, 2B, and 2C collectively illustrate (A) GAA activity and (B) reduction in glycogen levels in heart, quadriceps, triceps, and diaphragm of GAA KO mice after dosing with 3x10¹³ vg/kg AAV9 capsid vectors comprising constructs with or without enhancer elements using a 4-MUG assay and (C) tissue section images of (i) GAA protein by immunohistochemical staining (IHC); (ii) glycogen by Periodic Acid Schiff (PAS) staining; (iii) lysosomal-associated membrane protein 1 (LAMP-1) levels by IHC; and (iv) tissue morphology by haematoxylin and eosin (H&E) staining, in representative tissue sections of quadriceps of control and vector-treated mice dosed with 3x10¹³ vg/kg AAV9 capsid vectors comprising constructs with or without an enhancer element.

[0020] Figure 3 illustrates codon optimized GAA constructs tested, containing 5’ and 3’ inverted terminal repeats (ITRs) from AAV2, the Sk-SH4 enhancer, the human desmin promoter, the Minute Virus of Mice (MVM) intron, an SV40 polyadenylation (poly A) signal (SV40pA), and a DNA ID tag, for expressing WT human GAA (AAV9.Sk-SH4.desmin.WT GAA), CO1 GAA (AAV9.Sk-SH4.desmin.CO1 GAA), CO2 GAA (AAV9.Sk-SH4.desmin.CO2 GAA), and CO3 GAA (AAV9.Sk-SH4.desmin.CO3 GAA).

[0021] Figures 4A and 4B collectively illustrate (A) GAA activity levels or (B) reduction in glycogen levels in heart, quadriceps, and diaphragm in GAA KO mice after dosing with 3x10¹³ vg/kg AAV9 capsid vectors comprising constructs containing a codon optimized human GAA (CO1, CO2, or CO3) using a 4-MUG assay. gaa+/+ = WT vehicle-treated mouse; gaa-/- = gaa KO vehicle-treated mouse; wt GAA = AAV9.Sk-SH4.desmin.WT GAA; CO1 = AAV9.Sk-SH4.desmin.CO1 GAA; CO2 = AAV9.Sk-SH4.desmin.CO2 GAA; and CO3 = AAV9.Sk-SH4.desmin.CO3 GAA.

[0022] Figure 5 illustrates the activity of GAA variants (Var 6, Var 7, Var 8, Var 9, Var 10, Var 11, Var 12, and Var 13) as compared to controls (WT hGAA, control plasmid, and no plasmid) in C2C12 gaa-/- mouse muscle cells following transfection of the cells with plasmids, with activity measured in cell extracts using the 4-MUG assay.

[0023] Figures 6A and 6B collectively illustrate (A) kinetic activity using 4-MUG as the substrate and (B) activity on glycogen of GAA variant 6 (Var 6) compared to WT hGAA.

[0024] Figures 11A, 11B, 11C, 11D, 11E, 11F, 11G, 11H, and 11I collectively illustrate SEQ ID NOs: 1, 2, and 13-30, representing numerous human GAA polynucleotides and protein variants.

[0025] Figures 12A, 12B, and 12C collectively illustrate SEQ ID NOs: 31, 60, and 61, representing codon-altered human GAA polynucleotides and protein variants.

[0026] Figures 13A and 13B collectively illustrate SEQ ID NOs: 34-45, representing polynucleotides and polypeptides associated with human GAA protein variants.

[0027] Figure 14 illustrates SEQ ID NOs: 46-49, representing polynucleotides and polypeptides of promoters, enhancers, and mammalian expression vectors.

[0028] Figures 15A, 15B, and 15C illustrate SEQ ID NO: 60, representing the polynucleotide associated with a human GAA protein variant (CO3-FL-6-dNA).

[0029] Figure 16 illustrates the list of IUPAC degenerate nucleotide codes.

[0030] Figures 17A, 17B, and 17C illustrate SEQ ID NO: 36, representing the polynucleotide associated with a human GAA protein variant (CO3-MP-6-dNA).

DETAILED DESCRIPTION OF THE DISCLOSURE

I. INTRODUCTION

[0031] As stated above, there are numerous drawbacks associated with currently available ERT regimens, including: (1) detrimental immune responses, including neutralizing antibodies against recombinant GAA enzyme, especially in cross-reactive immune-material (CRIM) negative patients; (2) poor uptake of the GAA enzyme by muscle cells from circulation; (3) limited availability of the GAA enzyme in circulation (85% taken up by the liver); (4) reduced stability of the GAA enzyme at neutral pH; and (5) progressive endosomal dysfunction reducing efficacy of the endogenous enzyme delivery to lysosomes. AAV therapies for Pompe disease, while promising, have to this point encountered similar difficulties in the case of liver-directed therapies (e.g., low serum stability, poor uptake and inefficient deep tissue distribution). In addition, AAV8 therapies with muscle-specific expression have been forced to rely upon dangerously high doses.

II. DEFINITIONS

[0032] Unless otherwise defined herein, scientific and technical terms used herein have the meanings that are commonly understood by those of ordinary skill in the art. In the event of any latent ambiguity, definitions provided herein take precedence over any dictionary or extrinsic definition. Unless otherwise required by context, singular terms shall include pluralities and plural terms shall include the singular.

[0033] As used herein, the terms “Alpha-glucosidase” and “GAA” are used interchangeably and refer to a protein with glucosidase activity for hydrolyzing terminal, non-reducing (1→4)-linked α-D-glucose residues in polysaccharides with release of D-glucose (e.g., active GAA, also referred to herein as “GAA mature polypeptide,” “GAA MP,” or simply “MP”) or a protein precursor thereof (e.g., a pro-protein or a pre-pro-protein, often referred to as pGAA and ppGAA), e.g., as measured by quantification of glucose release from glycogen following incubation with the GAA polypeptide.

[0034] GAA is translated as an inactive, single-chain polypeptide that includes a signal peptide and a propeptide, often referred to as a GAA pre-pro-protein. The GAA pre-pro-protein undergoes post-translational processing to form an active GAA protein. This processing includes removal (e.g., by cleavage) of the signal peptide, followed by removal (e.g., by cleavage) of the propeptide, to form a mature GAA polypeptide.

[0035] Generally, polynucleotides encoding the wild-type human GAA encode an inactive single-chain polypeptide (e.g., a pre-pro-protein; amino acids 1-952 of GAA-FL-WT-AA (SEQ ID NO:2)) that undergoes post-translational processing to form an active GAA protein. For example, the GAA pre-pro-protein is first cleaved with a signal peptidase to release the encoded signal peptide (amino acids 1-27 of GAA-FL-WT-AA (SEQ ID NO:2)), forming a GAA precursor (amino acids 28-952 of SEQ ID NO:2; 110 kDa). The GAA precursor is cleaved by additional proteases to release a first associated polypeptide of 19.4 kDa (amino acids 792-952 of SEQ ID NO:2), a second associated polypeptide of 3.9 kDa (amino acids 78-113 of SEQ ID NO:2), and a third associated polypeptide of 10.4 kDa (amino acids 122-200 of SEQ ID NO:2), forming a mature GAA (amino acids 203-782 of SEQ ID NO:2; 70 kDa). As used herein, the “MP” designation refers to a precursor polypeptide that includes the first associated polypeptide, the second associated polypeptide, the third associated polypeptide, and the mature GAA polypeptide. For additional information on the structure, function, and activation of GAA see, e.g., Roig-Zamboni V, et al., Structure of human lysosomal acid α-glucosidase-a guide for the treatment of Pompe disease, Nat Commun., 8(1): 1111 (2017) and Moreland et al., Species-specific differences in the processing of acid α-glucosidase are due to the amino acid identity at position 201, Gene, 491:25-30 (2012), the contents of which are hereby incorporated by reference herein in their entirety.

[0036] As used herein, the term “GAA polypeptide” refers to a polypeptide having GAA glucosidase activity under particular conditions, e.g., as measured by quantification of glucose release from glycogen following incubation with the GAA polypeptide. GAA polypeptides include precursor polypeptides (e.g., GAA pre-pro-polypeptides and pro-polypeptides) which, when activated by the post-translational processing described above, become active GAA polypeptides with GAA glucosidase activity, as well as the active GAA polypeptides (e.g., GAA-MP) themselves. In an exemplary embodiment, a human GAA polypeptide refers to a polypeptide that includes an amino acid sequence with high sequence identity (e.g., at least 85%, 90%, 95%, 99%, or more) to the portion of the wild-type human GAA polypeptide that includes the mature GAA polypeptide, GAA-MP-AA (SEQ ID NO:35), or to the portions of the disclosed variant GAA polypeptides (variants 6-13, shown in Figures 11B-11I). Specifically included in the definition of GAA polypeptides are GAA polypeptides with one or more of the amino acid substitutions T151I, L650G, L650S, L650T, L650E, L650Y, L650F, S676D, and L678H, relative to the wild-type human GAA polypeptide, present in variants 6-13 described herein.
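The residue ranges quoted in paragraph [0035] can be collected into a small lookup for illustration. The coordinates below are copied from that paragraph and are 1-based and inclusive, numbered against the full-length pre-pro-protein (SEQ ID NO:2); the dictionary keys and function names are hypothetical, and the placeholder string in the usage lines stands in for the real sequence, which is not reproduced here.

```python
# Residue ranges (1-based, inclusive) as stated in paragraph [0035],
# numbered relative to full-length wild-type human GAA (SEQ ID NO:2).
GAA_REGIONS = {
    "signal_peptide": (1, 27),
    "precursor": (28, 952),
    "second_associated_polypeptide": (78, 113),
    "third_associated_polypeptide": (122, 200),
    "mature_gaa": (203, 782),
    "first_associated_polypeptide": (792, 952),
}


def region_length(name: str) -> int:
    """Number of residues in a named region."""
    start, end = GAA_REGIONS[name]
    return end - start + 1


def extract_region(full_length_seq: str, name: str) -> str:
    """Slice a named region out of a full-length (952-residue) sequence."""
    start, end = GAA_REGIONS[name]
    if len(full_length_seq) < end:
        raise ValueError("sequence is shorter than the requested region")
    # Convert 1-based inclusive coordinates to a 0-based Python slice.
    return full_length_seq[start - 1:end]


# Placeholder sequence of the right length, NOT the real SEQ ID NO:2.
placeholder = "A" * 952
print(region_length("mature_gaa"))
print(len(extract_region(placeholder, "precursor")))
```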

[0037] Non-limiting examples of wild type GAA polypeptides include human GAA polypeptides (e.g., GenBank accession nos. NP_000143.2 (GAA-FL-WT-AA (SEQ ID NO:2)) and UniProt accession no. P10253), and natural variants thereof; bovine GAA (e.g., UniProt accession no. Q9MYM4); murine GAA (e.g., UniProt accession no. P70699); rat GAA (e.g., UniProt accession no. Q6P7A9), and natural variants thereof; and other mammalian GAA homologues (e.g., chimpanzee, ape, hamster, guinea pig, etc.).

[0038] As used herein, a GAA polypeptide includes natural variants and artificial constructs. As used in the present disclosure, GAA encompasses any natural variants, alternative sequences, isoforms, or mutant proteins that retain some basal GAA glucosidase activity (e.g., at least 5%, 10%, 25%, 50%, 75%, or more of the corresponding wild type activity as assayed), including one or more variant amino acids found in the human population, such as S46P, C103G, C103R, C127F, R190H, Y191C, L208P, P217L, G219R, R224P, R224Q, R224W, T234K, T234R, A237, S251L, S254L, E262K, P266S, P285R, P285S, L291F, L291P, Y292C, G293R, L299R, H308L, H308P, G309R, L312R, N316I, M318K, M318T, P324L, W330G, G335E, G335R, P347R, L355P, P361L, C374R, R375L, G377R, P397L, Q401R, W402R, D404N, L405P, M408V, D419V, R437H, A445P, Y455F, P457H, P457L, G478R, W481R, P482R, G483V, A486P, D489N, M519T, M519V, E521K, E521Q, P522A, P522S, S523Y, F525Y, S529V, P545L, G549R, L552P, I557F, C558S, S566P, H568L, N570K, H572Q, Y575C, Y575S, G576R, E579K, R585M, R594H, R594P, S599Y, R600C, R600H, S601L, T602A, G607D, A610V, H612Q, H612Y, T614K, G615R, S619R, S627P, N635K, G638V, G638W, L641V, G643R, D645E, D645H, D645N, C647W, G648D, G648S, R660H, R672Q, R672T, R672W, R702C, R702L, L705P, R725W, T737N, Q743K, W746C, W746G, W746S, Y766C, P768R, R819P, A880D, L901Q, P913R, V916F, L935P, or V949D. As discussed more fully below, this numbering is relative to the wild type human GAA. Other amino acid variations identified in the human population are known and can be found, for example, using the National Center for Biotechnology Information's ("NCBI") variation viewer, accession number GCF_000001405.40.

[0039] As described herein, GAA protein (i.e., as translated with a signal peptide and propeptide) can include one or more variants, with Variant 6 finding particular use in some embodiments. This is referred to as “GAA-FL-Var6-AA” (SEQ ID NO: 14), with the nucleic acid sequence being referred to herein as “GAA-FL-Var6-NA.” It should be noted that codon-optimized sequences CO1-FL-WT-AA, CO2-FL-WT-AA, and CO3-FL-WT-AA, exemplified herein, also encode the full-length GAA protein. Thus, specifically included in the definition of GAA are all such variants exemplified herein.

[0040] Unless otherwise specified herein, the numbering of GAA amino acids refers to the corresponding amino acid in the full-length, wild-type human GAA pre-pro-polypeptide sequence (GAA-FL-WT-AA), presented as SEQ ID NO: 2 in Figure 11A. As such, when referring to an amino acid substitution in a GAA polypeptide disclosed herein, the recited amino acid number refers to the analogous (e.g., structurally or functionally equivalent) and/or homologous (e.g., evolutionarily conserved in the primary amino acid sequence) amino acid in the full-length, wild-type GAA pre-pro-polypeptide sequence. For example, a T151I amino acid substitution refers to a threonine to isoleucine substitution at position 151 of the full-length, wild-type human GAA pre-pro-peptide sequence (GAA-FL-WT-AA (SEQ ID NO:2)), as well as a T to I substitution at position 151 of the mature, wild-type GAA single-chain polypeptide (GAA-MP-WT-AA (SEQ ID NO:35)). Both of these nomenclatures describe the same T to I amino acid substitution, in different GAA polypeptides.
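The substitution notation used above (wild-type residue, position in the full-length sequence, variant residue) can be sketched in code. This is a hedged illustration: `parse_substitution`, `apply_substitution`, and the `first_position` parameter are hypothetical names, and the five-residue sequence in the usage lines is a toy string, not GAA. The `first_position` argument expresses the idea that a shorter polypeptide keeps the full-length numbering of its residues.

```python
import re

# One-letter wild-type residue, 1-based position, one-letter variant residue.
_SUBSTITUTION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(label: str) -> tuple[str, int, str]:
    """Parse a label such as 'T151I' into ('T', 151, 'I')."""
    match = _SUBSTITUTION_RE.match(label)
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    wild_type, position, variant = match.groups()
    return wild_type, int(position), variant


def apply_substitution(seq: str, label: str, first_position: int = 1) -> str:
    """Apply a substitution to `seq`, keeping full-length numbering.

    `first_position` is the full-length position of the first residue of
    `seq` (1 for the full-length sequence itself).
    """
    wild_type, position, variant = parse_substitution(label)
    index = position - first_position
    if not 0 <= index < len(seq):
        raise ValueError(f"position {position} is outside this sequence")
    if seq[index] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {position}, found {seq[index]}"
        )
    return seq[:index] + variant + seq[index + 1:]


print(parse_substitution("T151I"))
# Toy sequence (not GAA): T at position 3 becomes I.
print(apply_substitution("MGTAL", "T3I"))
# The same substitution in a fragment that starts at full-length position 3.
print(apply_substitution("TAL", "T3I", first_position=3))
```

Checking the wild-type residue before substituting catches numbering mistakes, such as applying a full-length position to a fragment without the matching offset.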

[0041] As used herein, the term “GAA polynucleotide” refers to a polynucleotide encoding a GAA polypeptide having GAA glucosidase activity under particular conditions, e.g., as measured by quantification of glucose release from glycogen following incubation with the GAA polypeptide. GAA polynucleotides include polynucleotides encoding GAA precursor polypeptides, including GAA pre-pro-polypeptides, GAA pro-polypeptides, and mature, single-chain GAA polypeptides. Specifically included in the definition of GAA polynucleotides are polynucleotides encoding a GAA polypeptide that includes one or more of the amino acid substitutions T151I, L650G, L650S, L650T, L650E, L650Y, L650F, S676D, and L678H, relative to the wild-type human GAA polypeptide. In an exemplary embodiment, a human GAA polynucleotide refers to a polynucleotide that encodes a polypeptide that includes an amino acid sequence with high sequence identity (e.g., at least 85%, 90%, 95%, 99%, or more) to the portion of the wild-type human GAA polypeptide that includes the mature GAA polypeptide, GAA-MP-AA (SEQ ID NO:35), or to the portions of the disclosed variant GAA polypeptides (variants 6-13), shown in Figures 11B-11I.

[0042] As described herein, GAA polynucleotides can include regulatory elements, such as promoters, enhancers, terminators, polyadenylation sequences, and introns, as well as viral packaging elements, such as inverted terminal repeats (“ITRs”), and/or other elements that support replication of the polynucleotide in a non-viral host cell, e.g., a replicon supporting propagation of the polynucleotide, e.g., in a bacterial, yeast, or mammalian host cell.

[0043] Of particular use in the present disclosure are codon-altered GAA polynucleotides. As described herein, the codon-altered GAA polynucleotides provide increased expression of transgenic GAA in vivo, as compared to the level of GAA expression provided by a natively-coded GAA construct (e.g., a polynucleotide encoding the same GAA amino acid sequence using the wild-type human codons). As used herein, the term “increased expression” refers to an increased level of transgenic GAA protein in a tissue (e.g., a muscular tissue) of an animal administered the codon-altered polynucleotide encoding GAA, as compared to the level of transgenic GAA protein in the same tissue of an animal administered a natively-coded GAA construct. Increased expression of the protein leads to an increase in GAA activity; thus, increased expression leads to increased activity.

[0044] In some embodiments, increased expression refers to at least 25% greater transgenic GAA polypeptide in a tissue of an animal administered the codon-altered GAA polynucleotide, as compared to the level of transgenic GAA polypeptide in the same tissue of an animal administered a natively-coded GAA polynucleotide. For the purpose of the present disclosure, increased expression refers to an effect generated by the alteration of the codon sequence, rather than hyperactivity caused by an underlying amino acid substitution. That is, the expression level obtained from a codon-optimized sequence encoding a GAA variant described herein is compared relative to the expression level obtained from a natively-coded GAA variant protein. In some embodiments, increased expression refers to at least 50% greater, at least 75% greater, at least 100% greater, at least 3-fold greater, at least 4-fold greater, at least 5-fold greater, at least 6-fold greater, at least 7-fold greater, at least 8-fold greater, at least 9-fold greater, at least 10-fold greater, at least 15-fold greater, at least 20-fold greater, at least 25-fold greater, at least 30-fold greater, at least 40-fold greater, at least 50-fold greater, at least 60-fold greater, at least 70-fold greater, at least 80-fold greater, at least 90-fold greater, at least 100-fold greater, at least 125-fold greater, at least 150-fold greater, at least 175-fold greater, at least 200-fold greater, at least 225-fold greater, or at least 250-fold greater transgenic GAA polypeptide in a tissue of an animal administered the codon-altered GAA polynucleotide, as compared to the level of transgenic GAA polypeptide in the same tissue of an animal administered a natively-coded GAA polynucleotide. GAA polypeptide levels in a tissue of an animal can be measured, for example, using an ELISA assay specific for GAA polypeptide.
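The percent and fold thresholds listed above can be made concrete with a short sketch. It assumes that "X% greater" means the codon-altered level exceeds the natively-coded level by X% of the natively-coded level, and that "N-fold greater" means a ratio of at least N; the text does not spell out either convention, so both are assumptions. The function names and the numeric tissue levels are hypothetical and are not measured data.

```python
def percent_greater(codon_altered_level: float, native_level: float) -> float:
    """Percent by which the codon-altered level exceeds the native level."""
    if native_level <= 0:
        raise ValueError("native level must be positive")
    return 100.0 * (codon_altered_level - native_level) / native_level


def fold_change(codon_altered_level: float, native_level: float) -> float:
    """Ratio of codon-altered level to native level (assumed 'N-fold' meaning)."""
    if native_level <= 0:
        raise ValueError("native level must be positive")
    return codon_altered_level / native_level


# Hypothetical ELISA readouts in arbitrary units (illustrative only).
native = 100.0
codon_altered = 300.0
print(percent_greater(codon_altered, native) >= 25.0)  # meets the 25% threshold
print(fold_change(codon_altered, native) >= 3.0)       # meets the 3-fold threshold
```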

[0045] By “GAA activity” or “GAA glucosidase activity” herein is meant the ability to hydrolyze terminal, non-reducing (1→4)-linked α-D-glucose residues in polysaccharides with release of D-glucose. The activity levels can be measured using any GAA activity assay known in the art. An exemplary assay for determining GAA activity is quantification of glucose release from glycogen following incubation with the GAA polypeptide.

[0046] Because certain GAA variants have enhanced specific activities as compared to wild type GAA in vivo, in some embodiments, the therapeutic potential of a GAA polynucleotide composition is evaluated by the increase in GAA activity in a tissue of an animal administered a GAA polynucleotide, e.g., instead of or in addition to increased GAA expression in the tissue. In some embodiments, as used herein, increased GAA activity refers to a greater increase in GAA activity in a tissue of an animal administered a codon-altered GAA polynucleotide, relative to a baseline GAA activity in the tissue of the animal prior to administration of the codon-altered GAA polynucleotide, as compared to the increase in GAA activity in the same tissue of an animal administered a natively-coded GAA polynucleotide, relative to a baseline GAA activity in the tissue of the animal prior to administration of the natively-coded GAA polynucleotide. In some embodiments, increased GAA activity refers to at least a 25% greater increase in GAA activity in a tissue of an animal administered the codon-altered GAA polynucleotide, relative to a baseline level of GAA activity in the tissue of the animal prior to administration of the codon-altered GAA polynucleotide, as compared to the increase in the level of GAA activity in the blood of an animal administered a natively-coded GAA polynucleotide, relative to the baseline level of GAA activity in the animal prior to administration of the natively-coded GAA polynucleotide. In some embodiments, increased GAA activity refers to at least 50% greater, at least 75% greater, at least 100% greater, at least 3-fold greater, at least 4-fold greater, at least 5-fold greater, at least 6-fold greater, at least 7-fold greater, at least 8-fold greater, at least 9-fold greater, at least 10-fold greater, at least 15-fold greater, at least 20-fold greater, at least 25-fold greater, at least 30-fold greater, at least 40-fold greater, at least 50-fold greater, at least 60-fold greater, at least 70-fold greater, at least 80-fold greater, at least 90-fold greater, at least 100-fold greater, at least 125-fold greater, at least 150-fold greater, at least 175-fold greater, at least 200-fold greater, at least 225-fold greater, or at least 250-fold greater increase in GAA activity in a tissue of an animal administered the codon-altered GAA polynucleotide, relative to a baseline level of GAA activity in the tissue of the animal prior to administration of the codon-altered GAA polynucleotide, as compared to the increase in the level of GAA activity in the same tissue of an animal administered a natively-coded GAA polynucleotide, relative to the baseline level of GAA activity in the animal prior to administration of the natively-coded GAA polynucleotide. Activity is measured by quantification of glucose release from glycogen following incubation with the GAA polypeptide, as described herein.
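By way of non-limiting illustration, the baseline-relative comparison of paragraph [0046] can be sketched as follows. The function names, the tuple layout, and the example activity values are hypothetical and are not taken from the disclosure. The sketch also assumes that the "increase" is a simple difference from baseline and that "X% greater" is read as a fractional excess over the natively-coded increase; other readings are possible.

```python
from typing import Tuple

def activity_increase(baseline: float, post: float) -> float:
    """Increase in GAA activity over the pre-administration baseline (assumed: a simple difference)."""
    return post - baseline

def relative_gain(codon_altered: Tuple[float, float], native: Tuple[float, float]) -> float:
    """Fraction by which the codon-altered increase exceeds the natively-coded increase.

    Each argument is a hypothetical (baseline, post-administration) activity pair
    measured in the same tissue and in the same units.
    """
    inc_altered = activity_increase(*codon_altered)
    inc_native = activity_increase(*native)
    if inc_native <= 0:
        raise ValueError("natively-coded increase must be positive to form a ratio")
    return (inc_altered - inc_native) / inc_native

def meets_threshold(gain: float, threshold: float = 0.25) -> bool:
    """True if the gain is at least the threshold (0.25 corresponds to 'at least 25% greater')."""
    return gain >= threshold

if __name__ == "__main__":
    # Made-up values: increase of 10 units (codon-altered) versus 4 units (natively-coded).
    gain = relative_gain((2.0, 12.0), (2.0, 6.0))
    print(gain, meets_threshold(gain))
```

In this made-up example the codon-altered increase exceeds the natively-coded increase by a factor of 1.5, i.e., it is 150% greater under the assumed reading.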

[0047] As described herein, the GAA amino acid numbering system is dependent on whether the GAA pre-pro-peptide (e.g., amino acids 1-69 of the full-length, wild-type human GAA sequence, inclusive of the signal peptide and pro-peptide) is included. Where the pre-pro-peptide is included, the numbering is referred to as “pre-pro-peptide inclusive” or “PPI.” Where the pre-pro-peptide is not included, the numbering is referred to as “pre-pro-peptide exclusive” or “PPE.” For example, L117D is PPI numbering for the same amino acid substitution as L48D, in PPE numbering. Similarly, the GAA amino acid numbering is also dependent upon the size of the signal peptide and/or propeptide in the particular GAA polypeptide.

[0048] As used herein, the term “GAA gene therapy” includes any therapeutic approach of providing an exogenous nucleic acid encoding GAA to a patient to relieve, diminish, or prevent the reoccurrence of one or more symptoms (e.g., clinical factors) associated with a GAA deficiency (e.g., Pompe disease). The term encompasses administering any compound, drug, procedure, or regimen comprising a nucleic acid encoding a GAA molecule, including any modified form of GAA (e.g., a GAA variant 6), for maintaining or improving the health of an individual with a GAA deficiency (e.g., Pompe disease). One skilled in the art will appreciate that either the course of GAA gene therapy or the dose of a GAA gene therapy therapeutic agent can be changed, e.g., based upon the results obtained in accordance with the present disclosure.
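By way of non-limiting illustration, the relationship between the PPI and PPE numbering conventions defined above for the wild-type human pre-pro-peptide (amino acids 1-69) can be sketched as a fixed offset. The function names and the substitution-string format are hypothetical, and the 69-residue offset applies only under the assumption that the wild-type human signal peptide and pro-peptide are present; a different pre-pro-peptide length would require a different offset.

```python
import re

# Length of the wild-type human GAA pre-pro-peptide (amino acids 1-69).
PRE_PRO_LENGTH = 69

# Hypothetical notation: wild-type residue, position, substituted residue (e.g., "L117D").
_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")

def _parse(substitution: str):
    match = _SUBSTITUTION.match(substitution)
    if match is None:
        raise ValueError(f"unrecognized substitution: {substitution!r}")
    return match.group(1), int(match.group(2)), match.group(3)

def ppi_to_ppe(substitution: str, offset: int = PRE_PRO_LENGTH) -> str:
    """Convert pre-pro-peptide inclusive (PPI) numbering to exclusive (PPE) numbering."""
    wild_type, position, variant = _parse(substitution)
    if position <= offset:
        raise ValueError("position lies within the pre-pro-peptide; no PPE number exists")
    return f"{wild_type}{position - offset}{variant}"

def ppe_to_ppi(substitution: str, offset: int = PRE_PRO_LENGTH) -> str:
    """Convert pre-pro-peptide exclusive (PPE) numbering to inclusive (PPI) numbering."""
    wild_type, position, variant = _parse(substitution)
    return f"{wild_type}{position + offset}{variant}"

if __name__ == "__main__":
    print(ppi_to_ppe("L117D"))  # prints L48D
    print(ppe_to_ppi("L48D"))   # prints L117D
```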

[0049] The terms “therapeutically effective amount or dose” or “therapeutically sufficient amount or dose” or “effective or sufficient amount or dose” refer to a dose that produces therapeutic effects for which it is administered. For example, a therapeutically effective amount of a drug useful for treating Pompe disease can be the amount that is capable of preventing or relieving one or more symptoms associated with Pompe disease. In some embodiments, a therapeutically effective treatment results in a decrease in the severity of musculoskeletal ailments (e.g., limb-girdle muscle weakness (LGMW)) in a subject.

[0050] As used herein, the term “gene” refers to the segment of a DNA molecule that codes for a polypeptide chain (e.g., the coding region). In some embodiments, a gene is positioned by regions immediately preceding, following, and/or intervening the coding region that are involved in producing the polypeptide chain (e.g., regulatory elements such as a promoter, enhancer, polyadenylation sequence, 5'-untranslated region, 3'-untranslated region, or intron).

[0051] As used herein, the term “regulatory elements” refers to nucleotide sequences, such as promoters, enhancers, terminators, polyadenylation sequences, introns, etc., that provide for the expression of a coding sequence in a cell.

[0052] As used herein, the term “promoter element” refers to a nucleotide sequence that assists with controlling expression of a coding sequence. Generally, promoter elements are located 5' of the translation start site of a gene. However, in certain embodiments, a promoter element may be located within an intron sequence, or 3' of the coding sequence. In some embodiments, a promoter useful for a gene therapy vector is derived from the native gene of the target protein (e.g., a GAA promoter). In some embodiments, a promoter useful for a gene therapy vector is specific for expression in a particular cell or tissue of the target organism (e.g., a muscle-specific promoter). In yet other embodiments, one of a plurality of well-characterized promoter elements is used in a gene therapy vector described herein. Non-limiting examples of well-characterized promoter elements include the CMV early promoter, the β-actin promoter, and the methyl CpG binding protein 2 (MeCP2) promoter. In some embodiments, the promoter is a constitutive promoter, which drives substantially constant expression of the target protein. In other embodiments, the promoter is an inducible promoter, which drives expression of the target protein in response to a particular stimulus (e.g., exposure to a particular treatment or agent). For a review of designing promoters for AAV-mediated gene therapy, see Gray et al. (Human Gene Therapy 22: 1143-53 (2011)), the contents of which are expressly incorporated by reference in their entirety for all purposes.

[0053] As used herein, an “MVM intron” refers to an intron sequence derived from minute virus of mice having high sequence identity to SEQ ID NO: 50. For further information on the MVM intron itself, see Haut and Pintel, J Virol. 72(3): 1834-43 (1998), and for use of the MVM intron in AAV gene therapy vectors, see Wu Z et al., Mol Ther., 16(2):280-9 (2008), both of which are hereby incorporated by reference.

[0054] As used herein, the term “operably linked” refers to the relationship between a first reference nucleotide sequence (e.g., a gene) and a second nucleotide sequence (e.g., a regulatory control element) that allows the second nucleotide sequence to affect one or more properties associated with the first reference nucleotide sequence (e.g., a transcription rate). In the context of the present disclosure, a regulatory control element is operably linked to a GAA transgene when the regulatory element is positioned within a gene therapy vector such that it exerts an effect (e.g., a promotive or tissue-selective effect) on transcription of the GAA transgene.

[0055] As used herein, the term “vector” refers to any nucleic acid construct used to transfer a GAA nucleic acid into a host cell. In some embodiments, a vector includes a replicon, which functions to replicate the nucleic acid construct. Non-limiting examples of vectors useful for gene therapy include plasmids, phages, cosmids, artificial chromosomes, and viruses, which function as autonomous units of replication in vivo. In some embodiments, a vector is a viral vector for introducing a GAA nucleic acid into the host cell. Many modified eukaryotic viruses useful for gene therapy are known in the art. For example, adeno-associated viruses (AAVs) are particularly well suited for use in human gene therapy because humans are a natural host for the virus, the native viruses are not known to contribute to any diseases, and the viruses elicit a mild immune response.

[0056] As used herein, the term “GAA viral vector” refers to a recombinant virus comprising a GAA polynucleotide, encoding a GAA polypeptide, which is sufficient for expression of the GAA polypeptide when introduced into a suitable animal host (e.g., a human). Specifically included within the definition of GAA viral vector are recombinant viruses in which a codon-altered GAA polynucleotide, which encodes a GAA polypeptide, has been inserted into the genome of the virus. Also specifically included within the definition of GAA viral vectors are recombinant viruses in which the native genome of the virus has been replaced with a GAA polynucleotide, which encodes a GAA polypeptide. Included within the definition of GAA viral vectors are recombinant viruses comprising a GAA polynucleotide which encodes a GAA polypeptide with one or more of the amino acid substitutions T151I, L650G, L650S, L650T, L650E, L650Y, L650F, S676D, and L678H, relative to the wild-type human GAA polypeptide.

[0057] As used herein, the term “GAA viral particle” refers to a viral particle encapsidating a GAA polynucleotide, encoding a GAA polypeptide, which is specific for expression of the GAA polypeptide when introduced into a suitable animal host (e.g., a human). Specifically included within the definition of GAA viral particles are recombinant viral particles encapsidating a genome in which a codon-altered GAA polynucleotide, which encodes a GAA polypeptide, has been inserted. Also specifically included within the definition of GAA viral particles are recombinant viral particles encapsidating a GAA polynucleotide, which encodes a GAA polypeptide, which replaces the native genome of the virus. Included within the definition of GAA viral particles are recombinant viral particles encapsidating a GAA polynucleotide which encodes a GAA polypeptide with one or more of the amino acid substitutions T151I, L650G, L650S, L650T, L650E, L650Y, L650F, S676D, and L678H, relative to the wild-type human GAA polypeptide, present in variant 6 described herein.

[0058] The term “adeno-associated virus” or “AAV” refers to a small (20 nm) replication-defective, nonenveloped virus that has a linear single-stranded DNA (ssDNA) genome of approximately 4.8 kilobases (kb) that infects humans and non-human species. AAVs belong to the genus Dependoparvovirus, which in turn belongs to the family Parvoviridae. AAVs are not currently known to cause disease and cause only a very mild immune response. Gene therapy vectors using AAVs can “infect” or transduce both dividing and quiescent cells and persist in an extrachromosomal state without integrating into the genome of the host cell or integrating the genome at a low frequency. The wt genome comprises inverted terminal repeats (ITRs) at both ends of the DNA strand, and two open reading frames (ORFs): rep and cap. The former is composed of four overlapping genes encoding Rep proteins required for the AAV life cycle, and the latter contains overlapping nucleotide sequences of capsid proteins: VP1, VP2 and VP3, which interact to form a capsid with icosahedral symmetry. With regard to gene therapy and transgene delivery, ITRs seem to be the only sequences required in cis next to the therapeutic gene: structural (cap) and packaging (rep) proteins can be delivered in trans. With this assumption many methods were established for efficient production of recombinant AAV (rAAV), or engineered AAV, vectors containing a heterologous sequence, e.g., a reporter or nucleic acid encoding a therapeutic gene product.

[0059] Included within the definition of AAV are AAV type 1 (AAV1), AAV type 2 (AAV2), AAV type 3 (AAV3), AAV type 4 (AAV4), AAV type 5 (AAV5), AAV type 6 (AAV6), AAV type 7 (AAV7), AAV type 8 (AAV8), and AAV type 9 (AAV9) viruses, e.g., encapsidating a GAA polynucleotide, and viruses formed by one or more variant AAV capsid proteins, e.g., encapsidating a GAA polynucleotide. Also included within the definition of AAV are recombinant viruses and viral particles formed using a non-naturally occurring, engineered capsid protein.

[0060] The terms “capsid protein,” “capsid polypeptide,” “cap protein,” or “cap polypeptide” refer to an expression product of a cap nucleic acid from an AAV serotype that forms a protein shell for an AAV virus, such as a wt capsid protein from serotypes 1, 6, 8, or 9; or a protein that shares at least 50% (alternatively at least 75, 80, 85, 90, 95, 96, 97, 98, 99%, or 99.5%) amino acid sequence identity with a wt capsid protein and displays a functional activity of a wt capsid protein. The capsid homology of commonly used AAV serotypes is described in, e.g., Daya and Berns (2008) Clin. Microbiol. Rev. 21 (4): 583-593, which is incorporated herein by reference in its entirety. A “functional activity” of a protein is any activity associated with the physiological function of the protein, whether in vitro, ex vivo, or in vivo. For example, functional activities of an AAV capsid protein may include its ability to form a capsid, evade host antibodies, recognize, and enter a cell, deliver DNA genome to the nucleus and transcription of its DNA genome. In some embodiments, the capsid protein is a variant of the wt capsid protein with an altered functional activity such as tissue transduction or tissue tropism, e.g., into or to muscle, respectively. The term “encapsidates” or “packaged” means encloses or surrounds a gene or virus in a protein shell or capsid. Unless otherwise indicated, the capsid polypeptides described herein refer to the VP1 form of the capsid polypeptide. However, it will be appreciated that these VP1 sequences also define the sequences of the VP2 form and the VP3 form.

[0061] As used herein, the term “CpG” refers to a cytosine-guanine dinucleotide along a single strand of DNA, with the “p” representing the phosphate linkage between the two.

[0062] As used herein, the term “CpG island” refers to a region within a polynucleotide having a statistically elevated density of CpG dinucleotides. As used herein, a region of a polynucleotide (e.g., a polynucleotide encoding a codon-altered GAA protein) is a CpG island if, over a 200-base pair window: (i) the region has GC content of greater than 50%, and (ii) the ratio of observed CpG dinucleotides per expected CpG dinucleotides is at least 0.6, as defined by the relationship:

(observed CpG / expected CpG) = (number of CpG dinucleotides × N) / (number of C × number of G), where N is the number of nucleotides in the window.

For additional information on methods for identifying CpG islands, see Gardiner-Garden M. et al., J Mol Biol., 196(2):261-82 (1987), the content of which is expressly incorporated herein by reference, in its entirety, for all purposes.
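By way of non-limiting illustration, the two CpG island criteria recited above (GC content greater than 50% and an observed-to-expected CpG ratio of at least 0.6 over a 200-base pair window) can be sketched as follows. The sketch assumes the observed/expected ratio of Gardiner-Garden et al., i.e., (number of CpG × N) / (number of C × number of G) with N the window length. The function names and the one-nucleotide sliding step are hypothetical choices and are not taken from the disclosure.

```python
from typing import List, Tuple

def cpg_stats(window: str) -> Tuple[float, float]:
    """Return (GC content, observed/expected CpG ratio) for one window of sequence."""
    seq = window.upper()
    n = len(seq)
    if n == 0:
        return 0.0, 0.0
    c_count = seq.count("C")
    g_count = seq.count("G")
    cpg_count = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_content = (c_count + g_count) / n
    if c_count == 0 or g_count == 0:
        return gc_content, 0.0
    observed_over_expected = (cpg_count * n) / (c_count * g_count)
    return gc_content, observed_over_expected

def is_cpg_island(window: str, min_gc: float = 0.5, min_ratio: float = 0.6) -> bool:
    """Apply criteria (i) GC content > 50% and (ii) observed/expected CpG >= 0.6."""
    gc_content, ratio = cpg_stats(window)
    return gc_content > min_gc and ratio >= min_ratio

def cpg_island_window_starts(sequence: str, window: int = 200) -> List[int]:
    """Start indices (0-based) of every window that satisfies both criteria."""
    return [
        start
        for start in range(len(sequence) - window + 1)
        if is_cpg_island(sequence[start:start + window])
    ]

if __name__ == "__main__":
    # Artificial test sequences, not sequences of the disclosure.
    print(is_cpg_island("CG" * 100))  # prints True
    print(is_cpg_island("AT" * 100))  # prints False
```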

[0063] As used herein, the term "nucleic acid" refers to deoxyribonucleotides or ribonucleotides and polymers thereof in either single- or double-stranded form and complements thereof. The term encompasses nucleic acids containing known nucleotide analogs or modified backbone residues or linkages, which are synthetic, naturally occurring, and non-naturally occurring, which have similar binding properties as the reference nucleic acid, and which are metabolized in a manner similar to the reference nucleotides. Examples of such analogs include, without limitation, phosphorothioates, phosphoramidates, methyl phosphonates, chiral-methyl phosphonates, 2-O-methyl ribonucleotides, and peptide-nucleic acids (PNAs). However, particularly useful embodiments herein, for use in gene therapy in patients, use phosphodiester bonds.

[0064] By “nucleic acid compositions” herein is meant any molecule or formulation of a molecule that includes a GAA polynucleotide, encoding a GAA polypeptide. Included within the definition of nucleic acid compositions are GAA polynucleotides, aqueous solutions of GAA polynucleotides, viral particles encapsidating a GAA polynucleotide, and aqueous formulations of viral particles encapsidating a GAA polynucleotide. A nucleic acid composition, as disclosed herein, includes a codon-altered GAA gene that encodes a GAA polypeptide.

[0065] The term “amino acid” refers to naturally occurring and non-natural amino acids, including amino acid analogs and amino acid mimetics that function in a manner similar to the naturally occurring amino acids. Naturally occurring amino acids include those encoded by the genetic code, as well as those amino acids that are later modified, e.g., hydroxyproline, γ-carboxyglutamate, and O-phosphoserine. Naturally occurring amino acids can include, e.g., D- and L-amino acids. As to amino acid sequences, one of ordinary skill in the art will recognize that individual substitutions, deletions or additions to a nucleic acid or peptide sequence that alter, add or delete a single amino acid or a small percentage of amino acids in the encoded sequence result in a “conservatively modified variant” where the alteration results in the substitution of an amino acid with a chemically similar amino acid. Conservative substitution tables providing functionally similar amino acids are well known in the art. Such conservatively modified variants are in addition to and do not exclude polymorphic variants, interspecies homologs, and alleles of the disclosure.

[0066] The terms “identical” or percent “identity,” in the context of two or more nucleic acids or peptide sequences, refer to two or more sequences or subsequences that are the same or have a specified percentage of amino acid residues or nucleotides that are the same (i.e., about 60% identity, preferably 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or higher identity over a specified region, when compared and aligned for maximum correspondence over a comparison window or designated region) as measured using the BLAST or BLAST 2.0 sequence comparison algorithms with default parameters described below, or by manual alignment and visual inspection.

[0067] As is known in the art, a number of different programs may be used to identify whether a protein (or nucleic acid as discussed below) has sequence identity or similarity to a known sequence. Sequence identity and/or similarity is determined using standard techniques known in the art, including, but not limited to, the local sequence identity algorithm of Smith & Waterman, Adv. Appl. Math., 2:482 (1981), by the sequence identity alignment algorithm of Needleman & Wunsch, J. Mol. Biol., 48:443 (1970), by the search for similarity method of Pearson & Lipman, Proc. Natl. Acad. Sci. U.S.A., 85:2444 (1988), by computerized implementations of these algorithms (GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group, 575 Science Drive, Madison, WI), the Best Fit sequence program described by Devereux et al., Nucl. Acid Res., 12:387-395 (1984), preferably using the default settings, or by inspection. Preferably, percent identity is calculated by FastDB based upon the following parameters: mismatch penalty of 1; gap penalty of 1; gap size penalty of 0.33; and joining penalty of 30, “Current Methods in Sequence Comparison and Analysis,” Macromolecule Sequencing and Synthesis, Selected Methods and Applications, pp 127-149 (1988), Alan R. Liss, Inc, all of which are incorporated by reference.

[0068] An example of a useful algorithm is PILEUP. PILEUP creates a multiple sequence alignment from a group of related sequences using progressive, pairwise alignments. It may also plot a tree showing the clustering relationships used to create the alignment. PILEUP uses a simplification of the progressive alignment method of Feng & Doolittle, J. Mol. Evol. 35:351-360 (1987); the method is similar to that described by Higgins & Sharp CABIOS 5:151-153 (1989), both incorporated by reference. Useful PILEUP parameters include a default gap weight of 3.00, a default gap length weight of 0.10, and weighted end gaps.

[0069] Another example of a useful algorithm is the BLAST algorithm, described in: Altschul et al., J. Mol. Biol. 215, 403-410, (1990); Altschul et al., Nucleic Acids Res. 25:3389-3402 (1997); and Karlin et al., Proc. Natl. Acad. Sci. U.S.A. 90:5873-5877 (1993), all incorporated by reference. A particularly useful BLAST program is the WU-BLAST-2 program, which was obtained from Altschul et al., Methods in Enzymology, 266:460-480 (1996) (http://blast.wustl.edu/blast/README.html). WU-BLAST-2 uses several search parameters, most of which are set to the default values. The adjustable parameters are set with the following values: overlap span = 1, overlap fraction = 0.125, word threshold (T) = 11. The HSP S and HSP S2 parameters are dynamic values and are established by the program itself depending upon the composition of the particular sequence and composition of the particular database against which the sequence of interest is being searched; however, the values may be adjusted to increase sensitivity.

[0070] An additional useful algorithm is gapped BLAST, as reported by Altschul et al., Nucl. Acids Res., 25:3389-3402, incorporated by reference. Gapped BLAST uses BLOSUM-62 substitution scores; threshold T parameter set to 9; the two-hit method to trigger ungapped extensions; charges gap lengths of k a cost of 10+k; Xu set to 16, and Xg set to 40 for database search stage and to 67 for the output stage of the algorithms. Gapped alignments are triggered by a score corresponding to ~22 bits.

[0071] A % amino acid sequence identity value is determined by the number of matching identical residues divided by the total number of residues of the “longer” sequence in the aligned region. The “longer” sequence is the one having the most actual residues in the aligned region (gaps introduced by WU-BLAST-2 to maximize the alignment score are ignored). In a similar manner, “percent (%) nucleic acid sequence identity” with respect to the coding sequence of the polypeptides identified is defined as the percentage of nucleotide residues in a candidate sequence that are identical with the nucleotide residues in the coding sequence of the GAA polypeptide. A preferred method utilizes the BLASTN module of WU-BLAST-2 set to the default parameters, with overlap span and overlap fraction set to 1 and 0.125, respectively.

[0072] The alignment may include the introduction of gaps in the sequences to be aligned. In addition, for sequences which contain either more or fewer amino acids than the protein encoded by the wild-type GAA sequence of (SEQ ID NO:2), it is understood that in one embodiment, the percentage of sequence identity will be determined based on the number of identical amino acids or nucleotides in relation to the total number of amino acids or nucleotides. Thus, for example, sequence identity of sequences shorter than SEQ ID NO:2, as discussed below, will be determined using the number of nucleotides in the shorter sequence, in one embodiment. In percent identity calculations relative weight is not assigned to various manifestations of sequence variation, such as, insertions, deletions, substitutions, etc.

[0073] In one embodiment, only identities are scored positively (+1) and all forms of sequence variation including gaps are assigned a value of “0”, which obviates the need for a weighted scale or parameters as described below for sequence similarity calculations. Percent sequence identity may be calculated, for example, by dividing the number of matching identical residues by the total number of residues of the “shorter” sequence in the aligned region and multiplying by 100. The “longer” sequence is the one having the most actual residues in the aligned region.
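By way of non-limiting illustration, the unweighted scoring of paragraph [0073] (identities scored +1, every other position including gaps scored 0, divided by the residue count of the shorter sequence in the aligned region and multiplied by 100) can be sketched as follows. The function name, the use of "-" as the gap character, and the example peptides are hypothetical; the sketch assumes the two sequences have already been aligned by one of the programs described above.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal aligned length.

    Identical, non-gap positions score +1; mismatches and gaps score 0.
    The denominator is the number of actual residues in the shorter sequence.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap
    )
    residues_a = sum(1 for a in aligned_a if a != gap)
    residues_b = sum(1 for b in aligned_b if b != gap)
    shorter = min(residues_a, residues_b)
    if shorter == 0:
        raise ValueError("at least one sequence contains no residues")
    return 100.0 * matches / shorter

if __name__ == "__main__":
    # Made-up peptides: 4 identities, shorter sequence has 5 residues.
    print(percent_identity("MKTALV", "MRTA-V"))  # prints 80.0
```

Dividing by the residue count of the longer sequence instead, as in paragraph [0071], would only require replacing `min` with `max` in the sketch.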

[0074] The term “allelic variants” refers to polymorphic forms of a gene at a particular genetic locus, as well as cDNAs derived from mRNA transcripts of the genes, and the polypeptides encoded by them. The term “preferred mammalian codon” refers to a subset of codons from among the set of codons encoding an amino acid that are most frequently used in proteins expressed in mammalian cells as chosen from the following list: Gly (GGC, GGG); Glu (GAG); Asp (GAC); Val (GTG, GTC); Ala (GCC, GCT); Ser (AGC, TCC); Lys (AAG); Asn (AAC); Met (ATG); Ile (ATC); Thr (ACC); Trp (TGG); Cys (TGC); Tyr (TAT, TAC); Leu (CTG); Phe (TTC); Arg (CGC, AGG, AGA); Gln (CAG); His (CAC); and Pro (CCC).
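By way of non-limiting illustration, the list of preferred mammalian codons recited above can be used to estimate what fraction of the codons in a coding sequence are preferred. The function name and the example sequences are hypothetical, and the fraction itself is an illustrative metric that is not defined in the disclosure. Stop codons do not appear in the list and are therefore counted as non-preferred in this sketch.

```python
# Preferred mammalian codons as recited in the list above, keyed by amino acid.
PREFERRED_CODONS = {
    "Gly": {"GGC", "GGG"}, "Glu": {"GAG"}, "Asp": {"GAC"},
    "Val": {"GTG", "GTC"}, "Ala": {"GCC", "GCT"}, "Ser": {"AGC", "TCC"},
    "Lys": {"AAG"}, "Asn": {"AAC"}, "Met": {"ATG"}, "Ile": {"ATC"},
    "Thr": {"ACC"}, "Trp": {"TGG"}, "Cys": {"TGC"}, "Tyr": {"TAT", "TAC"},
    "Leu": {"CTG"}, "Phe": {"TTC"}, "Arg": {"CGC", "AGG", "AGA"},
    "Gln": {"CAG"}, "His": {"CAC"}, "Pro": {"CCC"},
}

# Flattened set for fast membership tests.
PREFERRED_SET = frozenset(
    codon for codons in PREFERRED_CODONS.values() for codon in codons
)

def preferred_codon_fraction(coding_sequence: str) -> float:
    """Fraction of in-frame codons that appear in the preferred mammalian codon list."""
    seq = coding_sequence.upper()
    if not seq or len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a non-zero multiple of 3")
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    preferred = sum(1 for codon in codons if codon in PREFERRED_SET)
    return preferred / len(codons)

if __name__ == "__main__":
    # Made-up sequences encoding Met-Gly-Leu-Lys with different codon choices.
    print(preferred_codon_fraction("ATGGGCCTGAAG"))  # prints 1.0
    print(preferred_codon_fraction("ATGGGTCTAAAA"))  # prints 0.25
```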

[0075] As used herein, the term “codon-altered” or “codon-optimized” refers to a polynucleotide sequence encoding a polypeptide (e.g., a GAA protein), where at least one codon of the native polynucleotide encoding the polypeptide has been changed to improve a property of the polynucleotide sequence. In some embodiments, the improved property promotes increased transcription of mRNA coding for the polypeptide, increased stability of the mRNA (e.g., improved mRNA half-life), increased translation of the polypeptide, and/or increased packaging of the polynucleotide within the vector. Non-limiting examples of alterations that can be used to achieve the improved properties include changing the usage and/or distribution of codons for particular amino acids, adjusting global and/or local GC content, removing AT-rich sequences, removing repeated sequence elements, adjusting global and/or local CpG dinucleotide content, removing cryptic regulatory elements (e.g., TATA box and CCAAT box elements), removing intron/exon splice sites, improving regulatory sequences (e.g., introduction of a Kozak consensus sequence), and removing sequence elements capable of forming secondary structure (e.g., stem-loops) in the transcribed mRNA.

[0076] As discussed herein, there are various nomenclatures to refer to components of the disclosure herein. “CO-number” (e.g., “CO1,” “CO2,” “CO3”) refers to codon-altered polynucleotides encoding GAA polypeptides and/or the encoded polypeptides, including variants. For example, CO3-FL refers to the Full Length codon-altered CO3 polynucleotide sequence or amino acid sequence (sometimes referred to herein as “CO3-WT-FL-AA” for the Amino Acid sequence and “CO3-FL-NA” for the Nucleic Acid sequence) encoded by the CO3 polynucleotide sequence. As will be appreciated by those in the art, for constructs such as CO1, CO2, CO3, etc., that are only codon-altered (e.g., they do not contain additional amino acid substitutions as compared to the GAA variant), the amino acid sequences will be identical, as the amino acid sequences are not altered by the codon optimization. Thus, sequence constructs of the disclosure include, but are not limited to, CO1-FL-WT-NA, CO1-FL-6-NA, CO1-FL-6-dNA, CO2-FL-6-NA, CO2-FL-6-dNA, CO3-FL-6-NA, CO3-FL-6-dNA, CO1-FL-MP-NA, CO1-MP-6-NA, CO1-MP-6-dNA, CO2-MP-6-NA, CO2-MP-6-dNA, CO3-MP-6-NA, and CO3-MP-6-dNA. It should be noted that all “CO” constructs herein encode or contain the GAA amino acid sequence, although included within the definition of CO constructs are those that encode or contain the human wild type GAA amino acid sequence.

[0077] As used herein, the term “muscle-specific expression” refers to the preferential or predominant in vivo expression of a particular gene (e.g., a codon-altered, transgenic GAA gene) in musculoskeletal tissue, as compared to in other tissues. In some embodiments, muscle-specific expression means that at least 50% of all expression of the particular gene occurs within musculoskeletal tissues of a subject. In other embodiments, muscle-specific expression means that at least 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 99%, or 100% of all expression of the particular gene occurs within musculoskeletal tissues of a subject. Accordingly, a muscle-specific regulatory element is a regulatory element that drives muscle-specific expression of a gene in musculoskeletal tissue.

[0078] The terms “about” and “approximately” include being within a statistically meaningful range of a value. Such a range can be within an order of magnitude, e.g., within 50%, within 20%, within 10%, and within 5% of a given value or range. The allowable variation encompassed by the term “about” or “approximately” depends on the particular system under study, and can be readily appreciated by one of ordinary skill in the art.

[0079] The use of "or" means "and/or" unless stated otherwise. The terms “comprising,” “having,” “including” (as well as other forms such as “includes” or “included”), and “containing” are to be construed as open-ended terms (i.e., meaning “including, but not limited to”) unless otherwise noted. Terms such as "element" or "component" encompass both elements and components comprising one unit and elements and components that comprise more than one subunit unless specifically stated otherwise.

[0080] Generally, nomenclatures used in connection with genetics and protein and nucleic acid chemistry, amplification, and hybridization, cell and tissue culture, molecular biology, virology, immunology, and microbiology described herein are those well-known and commonly used in the art. The methods and techniques provided herein are generally performed according to conventional methods well known in the art and as described in various general and more specific references that are cited and discussed throughout the present specification unless otherwise indicated. Enzymatic reactions and purification techniques are performed according to manufacturer's specifications, as commonly accomplished in the art, or as described herein. The nomenclatures used in connection with, and the laboratory procedures and techniques of, analytical chemistry, synthetic organic chemistry, and medicinal and pharmaceutical chemistry described herein are those well-known and commonly used in the art. Standard techniques are used for chemical syntheses, chemical analyses, pharmaceutical preparation, formulation, and delivery, and treatment of patients.

[0081] Recitation of ranges of values herein are merely intended to serve as a shorthand method of referring individually to each separate value falling within the range and each endpoint, unless otherwise indicated herein, and each separate value and endpoint is incorporated into the specification as if it were individually recited herein.

[0082] The technology illustratively described herein suitably may be practiced in the absence of any element(s) not specifically disclosed herein.

[0083] The terms and expressions which have been employed are used as terms of description and not of limitation, and use of such terms and expressions does not exclude any equivalents of the features shown and described or portions thereof, and various modifications are possible within the scope of the technology claimed.

III. Codon-Altered Polynucleotides encoding GAA

[0084] In some embodiments, the present disclosure provides codon-altered polynucleotides encoding a GAA polypeptide, e.g., a wild-type or variant GAA polypeptide. These codon-altered polynucleotides provide markedly improved expression of GAA glucosidase activity in vivo, as demonstrated in Example 3. Specifically, Applicants have achieved these advantages through the discovery of several codon-altered polynucleotide schemas, referred to herein as CO1, CO2, and CO3, for encoding a GAA polypeptide. Accordingly, in some embodiments, a codon-altered polynucleotide provided herein has a nucleotide sequence with high sequence identity to CO1-FL-WT-NA (SEQ ID NO: 60), CO2-FL-WT-NA (SEQ ID NO: 62), or CO3-FL-WT-NA (SEQ ID NO:31) encoding a human GAA pre-pro-polypeptide.

[0085] The wild-type human GAA gene encodes a pre-pro-polypeptide having a 27 amino acid signal peptide (aa 1-27 of SEQ ID NO:2) and a 42 amino acid pro-peptide (aa 28-69 of SEQ ID NO:2), which are cleaved from the encoded polypeptide prior to activation of GAA. As appreciated by those in the art, signal peptides and/or pro-peptides may be mutated, replaced by signal peptides and/or pro-peptides from other genes or other organisms, or completely removed, without affecting the sequence of the mature polypeptide left after the signal and pro-peptide are removed by cellular processing. Accordingly, in some embodiments, a codon-altered polynucleotide provided herein has a nucleotide sequence with high sequence identity to CO1-FL-MP-NA (SEQ ID NO:60), CO2-FL-MP-NA (SEQ ID NO:62), or CO3-FL-MP-NA (SEQ ID NO:31), e.g., where the wild-type human GAA signal peptide and/or pro-peptide has been modified or replaced with an alternative signal peptide and/or pro-peptide.
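By way of illustration only, the residue ranges recited above (signal peptide at aa 1-27, pro-peptide at aa 28-69, mature polypeptide thereafter) can be expressed as a short Python sketch. The function name `split_pre_pro` and the toy input are hypothetical and are not part of the disclosure; the toy string is a placeholder and is not SEQ ID NO:2.

```python
# Residue ranges on the pre-pro-polypeptide (1-based, inclusive), as recited
# in the paragraph above. Everything else in this sketch is an assumption.
SIGNAL_PEPTIDE = (1, 27)
PRO_PEPTIDE = (28, 69)


def split_pre_pro(full_length: str) -> dict:
    """Split a pre-pro-polypeptide into signal peptide, pro-peptide, and mature parts."""
    sp_end = SIGNAL_PEPTIDE[1]
    pp_end = PRO_PEPTIDE[1]
    if len(full_length) <= pp_end:
        raise ValueError("sequence too short to contain a mature region")
    return {
        "signal_peptide": full_length[:sp_end],    # aa 1-27
        "pro_peptide": full_length[sp_end:pp_end],  # aa 28-69
        "mature": full_length[pp_end:],             # aa 70 onward
    }


# Hypothetical placeholder sequence: 27 'S', 42 'P', then 10 'M' characters.
toy = "S" * 27 + "P" * 42 + "M" * 10
parts = split_pre_pro(toy)
print({name: len(seq) for name, seq in parts.items()})
```

Replacing or removing the first two parts leaves the `mature` slice unchanged, which mirrors the point made above about alternative signal peptides and pro-peptides.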

[0086] As further demonstrated herein, the improved expression of GAA glucosidase activity provided by the CO1, CO2, and CO3 polynucleotide sequences is further improved when placed in operable communication with a muscle-specific regulatory control element, such as Dph-CRE04_NA (SEQ ID NO: 48) or sk-SH4-NA (SEQ ID NO: 49). Accordingly, in some embodiments, the disclosure provides polynucleotides having a codon-altered GAA polynucleotide that is operably linked to a muscle-specific regulatory control element.

[0087] Furthermore, many GAA amino acid substitutions that improve the properties of GAA are known in the art. See, for example, U.S. Patent Application Publication No. 2021/0189365, the content of which is incorporated herein by reference in its entirety. Accordingly, in some embodiments, the disclosure provides codon-altered polynucleotides encoding a GAA polypeptide containing one or more known amino acid substitutions.

[0088] Additionally, as described herein, several new GAA variant polypeptides having advantageous properties, e.g., improved specific activity, improved thermostability, and/or reduced immunogenicity, have been discovered, e.g., GAA variants 6-13. Accordingly, in some embodiments, the disclosure provides codon-altered polynucleotides encoding a GAA polypeptide containing one or more amino acid substitutions present in any one of GAA variants 6-13: T151I, L650G, L650S, L650T, L650E, L650Y, L650F, S676D, L678H, L678T, T700G, A719H, A758P, A820E, Q838K, L868F, L879E, R891H, Q902G, V921R, and S940A. In a specific embodiment, the codon-altered polynucleotide encodes a GAA polypeptide having any combination of the amino acid substitutions present in the GAA variant 6: T151I, L650G, S676D, and L678H.
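For illustration, the substitution notation used above (e.g., T151I, meaning threonine at position 151 replaced by isoleucine) can be applied to a sequence programmatically. The sketch below is a minimal example under stated assumptions: positions are taken as 1-based on whatever reference sequence is supplied, the helper name `apply_substitutions` is hypothetical, and the five-residue toy sequence is not a GAA sequence.

```python
import re

# One-letter original residue, 1-based position, one-letter replacement residue.
_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitutions(seq: str, subs: list) -> str:
    """Return seq with each substitution (e.g. 'T151I') applied.

    Raises ValueError if the notation is malformed, the position is out of
    range, or the reference residue does not match the sequence.
    """
    residues = list(seq)
    for sub in subs:
        match = _SUBSTITUTION.match(sub)
        if match is None:
            raise ValueError(f"unrecognized substitution: {sub!r}")
        old, pos, new = match.group(1), int(match.group(2)), match.group(3)
        if pos < 1 or pos > len(residues):
            raise ValueError(f"position out of range in {sub!r}")
        if residues[pos - 1] != old:
            raise ValueError(f"{sub!r} expects {old} at position {pos}")
        residues[pos - 1] = new
    return "".join(residues)


# Hypothetical toy sequence and substitutions, for demonstration only.
toy = "MATLK"
variant = apply_substitutions(toy, ["T3I", "L4G"])
print(variant)  # MAIGK
```

Checking the reference residue before substituting guards against applying a substitution to the wrong numbering scheme, e.g., mature-polypeptide numbering versus pre-pro-polypeptide numbering.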

[0089] The GC content of human genes varies widely, from less than 25% to greater than 90%. However, in general, human genes with higher GC contents are expressed at higher levels. For example, Kudla et al. (PLoS Biol., 4(6):80 (2006)) demonstrate that increasing a gene’s GC content increases expression of the encoded polypeptide, primarily by increasing transcription and effecting a higher steady state level of the mRNA transcript. Generally, the desired GC content of a codon-optimized gene construct is thought to be equal to or greater than 60%. However, native AAV genomes have GC contents of around 56%.

[0090] Accordingly, in some embodiments, the codon-altered polynucleotides provided herein have a GC content that more closely matches the GC content of native AAV virions (e.g., around 56% GC), which is lower than the preferred GC contents of polynucleotides that are conventionally codon-optimized for expression in mammalian cells (e.g., at or above 60% GC). For example, CO1-FL-WT-NA (SEQ ID NO:60) has a GC content of about 63.7%, CO2-FL-WT-NA (SEQ ID NO:62) has a GC content of about 59.1%, and CO3-FL-WT-NA (SEQ ID NO:31) has a GC content of about 57.5%. These constructs should provide improved virion packaging as compared to similarly codon-altered sequences with higher GC content.
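As a non-limiting illustration, overall GC content as discussed above is the percentage of bases in a sequence that are G or C. A minimal Python sketch follows; the function name `gc_content` and the toy sequences are assumptions for demonstration and are not taken from the sequence listing.

```python
def gc_content(seq: str) -> float:
    """Return the percentage of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    gc = sum(1 for base in seq if base in "GC")
    return 100.0 * gc / len(seq)


# Hypothetical toy sequences, for demonstration only.
print(gc_content("ATGC"))  # 50.0
print(gc_content("GGCC"))  # 100.0

# A threshold such as "no more than 60%" then reduces to a simple comparison.
meets_threshold = gc_content("ATGC") <= 60.0
```

Applied to a full coding sequence, the same function yields the single percentage that the thresholds and ranges of the surrounding paragraphs refer to.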

[0091] Thus, in some embodiments, the overall GC content of a codon-altered polynucleotide encoding a GAA polypeptide (e.g., a polynucleotide having high sequence identity to one of the CO1, CO2, or CO3 GAA coding sequences) is no more than 60%. In some embodiments, the overall GC content of a codon-altered polynucleotide encoding a GAA polypeptide is no more than 59%. In some embodiments, the overall GC content of a codon-altered polynucleotide encoding a GAA polypeptide is no more than 58%. In some embodiments, the overall GC content of a codon-altered polynucleotide encoding a GAA polypeptide is no more than 57%. In some embodiments, the overall GC content of a codon-altered polynucleotide encoding a GAA polypeptide is no more than 56%. In some embodiments, the overall GC content of a codon-altered polynucleotide encoding a GAA polypeptide is no more than 55%.

[0092] In some embodiments, the overall GC content of a codon-altered polynucleotide encoding a GAA polypeptide is from 55% to 60%. In some embodiments, the overall GC content of a codon-altered polynucleotide encoding a GAA polypeptide is from 56% to 60%. In some embodiments, the overall GC content of a codon-altered polynucleotide encoding a GAA polypeptide is from 57% to 60%. In some embodiments, the overall GC content of a codon-altered polynucleotide encoding a GAA polypeptide is from 58% to 60%. In some embodiments, the overall GC content of a codon-altered polynucleotide encoding a GAA polypeptide is from 59% to 60%. In some embodiments, the overall GC content of a codon-altered polynucleotide encoding a GAA polypeptide is from 55% to 59%. In some embodiments, the overall GC content of a codon-altered polynucleotide encoding a GAA polypeptide is from 56% to 59%. In some embodiments, the overall GC content of a codon-altered polynucleotide encoding a GAA polypeptide is from 57% to 59%. In some embodiments, the overall GC content of a codon-altered polynucleotide encoding a GAA polypeptide is from 58% to 59%. In some embodiments, the overall GC content of a codon-altered polynucleotide encoding a GAA polypeptide is from 55% to 58%. In some embodiments, the overall GC content of a codon-altered polynucleotide encoding a GAA polypeptide is from 56% to 58%. In some embodiments, the overall GC content of a codon-altered polynucleotide encoding a GAA polypeptide is from 57% to 58%. In some embodiments, the overall GC content of a codon-altered polynucleotide encoding a GAA polypeptide is from 55% to 57%. In some embodiments, the overall GC content of a codon-altered polynucleotide encoding a GAA polypeptide is from 56% to 57%.

[0093] In some embodiments, the overall GC content of a codon-altered polynucleotide encoding a GAA polypeptide is 57.5±0.5%. In some embodiments, the overall GC content of a codon-altered polynucleotide encoding a GAA polypeptide is 57.5±0.4%. In some embodiments, the overall GC content of a codon-altered polynucleotide encoding a GAA polypeptide is 57.5±0.3%. In some embodiments, the overall GC content of a codon-altered polynucleotide encoding a GAA polypeptide is 57.5±0.2%. In some embodiments, the overall GC content of a codon-altered polynucleotide encoding a GAA polypeptide is 57.5±0.1%. In some embodiments, the overall GC content of a codon-altered polynucleotide encoding a GAA polypeptide is 57.5%.

[0094] It has been theorized that CpG dinucleotides (i.e., a cytosine nucleotide followed by a guanine nucleotide) induce immune responses via toll-like receptors, in vivo. Some evidence suggests that CpG-depleted AAV vectors evade immune detection in mice, under certain circumstances (Faust et al., J. Clin. Invest. 2013; 123, 2994-3001). The wild type GAA coding sequence (SEQ ID NO: 1) contains over 120 CpG dinucleotides.

[0095] Accordingly, in some embodiments, the codon-altered polynucleotides provided herein are codon-altered to reduce the number of CpG dinucleotides in the GAA coding sequence. For example, CO3-FL-WT-NA (SEQ ID NO:31) has no CpG dinucleotides, CO1-FL-WT-NA (SEQ ID NO:60) has no CpG dinucleotides, and CO2-FL-WT-NA (SEQ ID NO:62) has no CpG dinucleotides. These constructs should elicit lower immunogenic responses than the wild type GAA coding sequence and similarly codon-altered sequences with higher numbers of CpG dinucleotides.
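By way of example only, the number of CpG dinucleotides in a coding sequence can be counted as the number of occurrences of a C immediately followed by a G when the strand is read 5' to 3'. The sketch below is illustrative; the function name `count_cpg` and the toy sequences are assumptions and are not part of the sequence listing.

```python
def count_cpg(seq: str) -> int:
    """Count CpG dinucleotides (C immediately followed by G) on one strand.

    Occurrences of "CG" cannot overlap one another, so str.count is exact.
    Because the reverse complement of "CG" is also "CG", the complementary
    strand carries the same number of CpG dinucleotides.
    """
    return seq.upper().count("CG")


# Hypothetical toy sequences, for demonstration only.
print(count_cpg("ACGTCGGC"))  # 2
print(count_cpg("ATGC"))      # 0
```

A sequence described as having no CpG dinucleotides is one for which such a count returns zero.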

[0096] Thus, in some embodiments, a sequence of a codon-altered polynucleotide encoding a GAA polypeptide (e.g., a polynucleotide having high sequence identity to one of the CO1, CO2, or CO3 GAA coding sequences) has less than 20 CpG dinucleotides. In some embodiments, a sequence of a codon-altered polynucleotide encoding a GAA polypeptide has less than 15 CpG dinucleotides. In some embodiments, a sequence of a codon-altered polynucleotide encoding a GAA polypeptide has less than 12 CpG dinucleotides. In some embodiments, a sequence of a codon-altered polynucleotide encoding a GAA polypeptide has less than 10 CpG dinucleotides. In some embodiments, a sequence of a codon-altered polynucleotide encoding a GAA polypeptide has less than 5 CpG dinucleotides. In some embodiments, a sequence of a codon-altered polynucleotide encoding a GAA polypeptide has less than 3 CpG dinucleotides. In some embodiments, a sequence of a codon-altered polynucleotide encoding a GAA polypeptide has no CpG dinucleotides. In some embodiments, a sequence of a codon-altered polynucleotide encoding a GAA polypeptide has no more than 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, or no CpG dinucleotides.

CO1 Codon-Altered polynucleotides

[0097] In one embodiment, a nucleic acid composition provided herein includes a GAA polynucleotide (e.g., a codon-altered polynucleotide) encoding a GAA polypeptide, where the GAA polynucleotide includes a nucleotide sequence having high sequence identity to all or a portion of the CO1 codon-optimized sequence.

[0098] In some embodiments, the GAA polynucleotide includes a sequence having high sequence identity to the portion of the CO1 codon-optimized sequence that encodes for the mature GAA polypeptide. Accordingly, in some embodiments, the sequence of the codon-altered polynucleotide has at least 95% identity to CO1-MP-WT-NA (SEQ ID NO:63). In a specific embodiment, the sequence of the codon-altered polynucleotide has at least 96% identity to CO1-MP-WT-NA (SEQ ID NO:63). In a specific embodiment, the sequence of the codon-altered polynucleotide has at least 97% identity to CO1-MP-WT-NA (SEQ ID NO:63). In a specific embodiment, the sequence of the codon-altered polynucleotide has at least 98% identity to CO1-MP-WT-NA (SEQ ID NO:63). In a specific embodiment, the sequence of the codon-altered polynucleotide has at least 99% identity to CO1-MP-WT-NA (SEQ ID NO:63). In a specific embodiment, the sequence of the codon-altered polynucleotide has at least 99.5% identity to CO1-MP-WT-NA (SEQ ID NO:63). In a specific embodiment, the sequence of the codon-altered polynucleotide has at least 99.9% identity to CO1-MP-WT-NA (SEQ ID NO:63). In another specific embodiment, the sequence of the codon-altered polynucleotide is CO1-MP-WT-NA (SEQ ID NO:63). When determining the sequence identity between a GAA polynucleotide and the portion of the CO1 codon-optimized sequence that encodes for the mature GAA polypeptide, only the portions of the sequence encoding the mature polypeptide should be considered. That is, the GAA polynucleotide may also encode for a signal peptide, a pro-peptide, and/or a purification/detection tag, but the sequence comparison should not include these sequences.
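The comparison rule stated above, in which only the portion encoding the mature polypeptide is compared, can be sketched as follows. This is a simplified illustration and not the identity algorithm of the disclosure: it assumes the two sequences are already aligned, of equal length, and free of gaps, and it removes only a leading signal peptide/pro-peptide coding region (any trailing tag would need to be trimmed in the same way). The function names and toy sequences are hypothetical.

```python
def mature_region(cds: str, leader_codons: int) -> str:
    """Drop the leading codons (e.g. signal peptide plus pro-peptide) from a coding sequence."""
    return cds[3 * leader_codons:]


def percent_identity(a: str, b: str) -> float:
    """Ungapped percent identity of two pre-aligned, equal-length sequences."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


# Hypothetical toy data: a 2-codon leader followed by a 3-codon "mature" region.
query_cds = "ATGGCC" + "AAAGGGTTT"
reference_mature = "AAAGGGTTC"

query_mature = mature_region(query_cds, leader_codons=2)
identity = percent_identity(query_mature, reference_mature)
print(round(identity, 1))  # 88.9
```

For the wild-type human GAA pre-pro-polypeptide described above, the leader would span 27 + 42 = 69 codons, so `leader_codons=69` would be the corresponding setting under these assumptions.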

[0099] In some embodiments, a GAA polynucleotide having high sequence identity to CO1-MP-WT-NA (SEQ ID NO:63) further includes a polynucleotide sequence encoding a GAA signal peptide having the amino acid sequence of SP-WT-AA (SEQ ID NO:43). In some embodiments, the GAA signal peptide polynucleotide has a nucleic acid sequence that is at least 90%, 95%, 96%, 97%, 98%, or 100% identical to CO1-SP-WT-NA (SEQ ID NO: 70).

[00100] In some embodiments, a GAA polynucleotide having high sequence identity to CO1-MP-WT-NA (SEQ ID NO:63) further includes a polynucleotide sequence encoding a GAA pro-peptide having the amino acid sequence of PP-WT-AA (SEQ ID NO:39). In some embodiments, the GAA pro-peptide polynucleotide has a nucleic acid sequence that is at least 90%, 95%, 96%, 97%, 98%, 99%, or 100% identical to CO1-PP-WT-NA (SEQ ID NO: 71).

[00101] In some embodiments, the GAA polynucleotide includes a sequence having high sequence identity to the entirety of the CO1 codon-optimized sequence, encoding for the GAA pre-pro-polypeptide. Accordingly, in some embodiments, the sequence of the codon-altered polynucleotide has at least 95% identity to CO1-FL-WT-NA (SEQ ID NO:60). In a specific embodiment, the sequence of the codon-altered polynucleotide has at least 96% identity to CO1-FL-WT-NA (SEQ ID NO:60). In a specific embodiment, the sequence of the codon-altered polynucleotide has at least 97% identity to CO1-FL-WT-NA (SEQ ID NO:60). In a specific embodiment, the sequence of the codon-altered polynucleotide has at least 98% identity to CO1-FL-WT-NA (SEQ ID NO:60). In a specific embodiment, the sequence of the codon-altered polynucleotide has at least 99% identity to CO1-FL-WT-NA (SEQ ID NO:60). In a specific embodiment, the sequence of the codon-altered polynucleotide has at least 99.5% identity to CO1-FL-WT-NA (SEQ ID NO:60). In a specific embodiment, the sequence of the codon-altered polynucleotide has at least 99.9% identity to CO1-FL-WT-NA (SEQ ID NO:60). In another specific embodiment, the sequence of the codon-altered polynucleotide is CO1-FL-WT-NA (SEQ ID NO:60).

[00102] In some embodiments, the GAA polypeptide encoded by a GAA polynucleotide having high sequence identity to the CO1 codon-optimized sequence includes an amino acid sequence having high sequence identity to the human wild-type mature GAA polypeptide (MP-WT-AA; SEQ ID NO:35). Accordingly, in some embodiments, the encoded GAA polypeptide has a sequence that is at least 90% identical to MP-WT-AA (SEQ ID NO:35). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 95% identical to MP-WT-AA (SEQ ID NO:35). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 96% identical to MP-WT-AA (SEQ ID NO:35). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 97% identical to MP-WT-AA (SEQ ID NO:35). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 98% identical to MP-WT-AA (SEQ ID NO:35). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 99% identical to MP-WT-AA (SEQ ID NO:35). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 99.5% identical to MP-WT-AA (SEQ ID NO:35). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 99.8% identical to MP-WT-AA (SEQ ID NO:35). In some embodiments, the encoded GAA polypeptide has a sequence identical to MP-WT-AA (SEQ ID NO:35). When determining the sequence identity between an encoded GAA polypeptide and the mature GAA polypeptide, only the portions of the sequence corresponding to the mature polypeptide should be considered. That is, the GAA polypeptide may also include a signal peptide, a pro-peptide, and/or a purification/detection tag, but the sequence comparison should not include these sequences.

[00103] In some embodiments, the GAA polypeptide encoded by a GAA polynucleotide having high sequence identity to the CO1 codon-optimized sequence includes an amino acid sequence having high sequence identity to the human wild-type GAA pre-pro-polypeptide (FL-WT-AA; SEQ ID NO:2). Accordingly, in some embodiments, the encoded GAA polypeptide has a sequence that is at least 90% identical to FL-WT-AA (SEQ ID NO:2). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 95% identical to FL-WT-AA (SEQ ID NO:2). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 96% identical to FL-WT-AA (SEQ ID NO:2). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 97% identical to FL-WT-AA (SEQ ID NO:2). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 98% identical to FL-WT-AA (SEQ ID NO:2). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 99% identical to FL-WT-AA (SEQ ID NO:2). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 99.5% identical to FL-WT-AA (SEQ ID NO:2). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 99.8% identical to FL-WT-AA (SEQ ID NO:2). In some embodiments, the encoded GAA polypeptide has a sequence identical to FL-WT-AA (SEQ ID NO:2). When determining the sequence identity between an encoded GAA polypeptide and the GAA pre-pro-polypeptide, only the portions of the sequence corresponding to the pre-pro-polypeptide should be considered. That is, the GAA polypeptide may also include a purification/detection tag, but the sequence comparison should not include these sequences.

[00104] In some embodiments, the GAA polypeptide encoded by a GAA polynucleotide having high sequence identity to the CO1 codon-optimized sequence includes one or more known amino acid substitutions, e.g., one or more amino acid substitutions described in U.S. Patent Application Publication No. 2021/0189365, the content of which is incorporated herein by reference in its entirety. In some embodiments, the GAA polypeptide encoded by a GAA polynucleotide having high sequence identity to the CO1 codon-optimized sequence includes one or more amino acid substitutions present in one of GAA variants 6-13 described herein.

[00105] In some embodiments, the GAA polypeptide encoded by a GAA polynucleotide having high sequence identity to the CO1 codon-optimized sequence includes one or more amino acid substitutions present in GAA variant 6: T151I, L650G, S676D, and L678H. In some embodiments, the GAA polypeptide encoded by a GAA polynucleotide having high sequence identity to the CO1 codon-optimized sequence includes all of the amino acid substitutions present in GAA variant 6 described herein.

[00106] In some embodiments, the GAA polypeptide encoded by a GAA polynucleotide having high sequence identity to the CO1 codon-optimized sequence includes an amino acid sequence having high sequence identity to the human variant 6 mature GAA polypeptide (MP-6-AA; SEQ ID NO:37). Accordingly, in some embodiments, the encoded GAA polypeptide has a sequence that is at least 90% identical to MP-6-AA (SEQ ID NO:37). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 95% identical to MP-6-AA (SEQ ID NO:37). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 96% identical to MP-6-AA (SEQ ID NO:37). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 97% identical to MP-6-AA (SEQ ID NO:37). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 98% identical to MP-6-AA (SEQ ID NO:37). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 99% identical to MP-6-AA (SEQ ID NO:37). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 99.5% identical to MP-6-AA (SEQ ID NO:37). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 99.8% identical to MP-6-AA (SEQ ID NO:37). In some embodiments, the encoded GAA polypeptide has a sequence identical to MP-6-AA (SEQ ID NO:37). When determining the sequence identity between an encoded GAA polypeptide and the mature GAA polypeptide, only the portions of the sequence corresponding to the mature polypeptide should be considered. That is, the GAA polypeptide may also include a signal peptide, a pro-peptide, and/or a purification/detection tag, but the sequence comparison should not include these sequences.

[00107] In some embodiments, the GAA polypeptide encoded by a GAA polynucleotide having high sequence identity to the CO1 codon-optimized sequence includes an amino acid sequence having high sequence identity to the human variant 6 GAA pre-pro-polypeptide (FL-6-AA; SEQ ID NO: 14). Accordingly, in some embodiments, the encoded GAA polypeptide has a sequence that is at least 90% identical to FL-6-AA (SEQ ID NO: 14). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 95% identical to FL-6-AA (SEQ ID NO: 14). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 96% identical to FL-6-AA (SEQ ID NO: 14). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 97% identical to FL-6-AA (SEQ ID NO: 14). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 98% identical to FL-6-AA (SEQ ID NO: 14). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 99% identical to FL-6-AA (SEQ ID NO: 14). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 99.5% identical to FL-6-AA (SEQ ID NO: 14). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 99.8% identical to FL-6-AA (SEQ ID NO: 14). In some embodiments, the encoded GAA polypeptide has a sequence identical to FL-6-AA (SEQ ID NO: 14). When determining the sequence identity between an encoded GAA polypeptide and the GAA pre-pro-polypeptide, only the portions of the sequence corresponding to the pre-pro-polypeptide should be considered. That is, the GAA polypeptide may also include a purification/detection tag, but the sequence comparison should not include these sequences.

[00108] In some embodiments, the nucleotide sequence of the GAA polynucleotide having high sequence identity to a CO1 codon-optimized sequence (e.g., SEQ ID NO:60 or 63) has a reduced GC content, as compared to the wild-type GAA coding sequence SEQ ID NO: 1, as described above. Accordingly, in some embodiments, the sequence of the codon-altered polynucleotide having high sequence identity to a CO1 codon-optimized sequence has a GC content of no more than 66%. In some embodiments, the sequence of the codon-altered polynucleotide having high sequence identity to a CO1 codon-optimized sequence has a GC content of no more than 63.5%. In some embodiments, the sequence of the codon-altered polynucleotide having high sequence identity to a CO1 codon-optimized sequence has a GC content of no more than 65%, no more than 64%, no more than 63%, no more than 62%, or no more than 61%.

[00109] In some embodiments, the sequence of the codon-altered polynucleotide having high sequence identity to a CO1 codon-optimized sequence has a GC content of from 61% to 66%. In some embodiments, the sequence of the codon-altered polynucleotide having high sequence identity to a CO1 codon-optimized sequence has a GC content of from 62% to 66%, from 63% to 66%, from 64% to 66%, from 65% to 66%, from 61% to 65%, from 62% to 65%, from 63% to 65%, from 64% to 65%, from 61% to 64%, from 62% to 64%, from 63% to 64%, from 61% to 63%, from 62% to 63%, or from 61% to 62%.

[00110] In some embodiments, the sequence of the codon-altered polynucleotide having high sequence identity to a CO1 codon-optimized sequence has a GC content of 63.5%±1.0. In some embodiments, the sequence of the codon-altered polynucleotide having high sequence identity to a CO1 codon-optimized sequence has a GC content of 63.5%±0.8. In some embodiments, the sequence of the codon-altered polynucleotide having high sequence identity to a CO1 codon-optimized sequence has a GC content of 63.5%±0.6. In some embodiments, the sequence of the codon-altered polynucleotide having high sequence identity to a CO1 codon-optimized sequence has a GC content of 63.5%±0.5. In some embodiments, the sequence of the codon-altered polynucleotide having high sequence identity to a CO1 codon-optimized sequence has a GC content of 63.5%±0.4. In some embodiments, the sequence of the codon-altered polynucleotide having high sequence identity to a CO1 codon-optimized sequence has a GC content of 63.5%±0.3. In some embodiments, the sequence of the codon-altered polynucleotide having high sequence identity to a CO1 codon-optimized sequence has a GC content of 63.5%±0.2. In some embodiments, the sequence of the codon-altered polynucleotide having high sequence identity to a CO1 codon-optimized sequence has a GC content of 63.5%±0.1. In some embodiments, the sequence of the codon-altered polynucleotide having high sequence identity to a CO1 codon-optimized sequence has a GC content of 63.5%.

[00111] In some embodiments, the nucleotide sequence of the GAA polynucleotide having high sequence identity to a CO1 codon-optimized sequence (e.g., SEQ ID NO:60 or 63) has a reduced number of CpG dinucleotides, as compared to the wild-type GAA coding sequence SEQ ID NO: 1, as described above. Accordingly, in some embodiments, the sequence of the codon-altered polynucleotide having high sequence identity to a CO1 codon-optimized sequence has no more than 15 CpG dinucleotides. In some embodiments, the sequence of the codon-altered polynucleotide having high sequence identity to a CO1 codon-optimized sequence has no more than 10 CpG dinucleotides. In some embodiments, the sequence of the codon-altered polynucleotide having high sequence identity to a CO1 codon-optimized sequence has no more than 5 CpG dinucleotides. In some embodiments, the sequence of the codon-altered polynucleotide having high sequence identity to a CO1 codon-optimized sequence has no more than 4 CpG dinucleotides. In some embodiments, the sequence of the codon-altered polynucleotide having high sequence identity to a CO1 codon-optimized sequence has no more than 3 CpG dinucleotides. In some embodiments, the sequence of the codon-altered polynucleotide having high sequence identity to a CO1 codon-optimized sequence has no more than 2 CpG dinucleotides. In some embodiments, the sequence of the codon-altered polynucleotide having high sequence identity to a CO1 codon-optimized sequence has no more than 1 CpG dinucleotide. In some embodiments, the sequence of the codon-altered polynucleotide having high sequence identity to a CO1 codon-optimized sequence has no CpG dinucleotides.

CO2 Codon-Altered polynucleotides

[00112] In one embodiment, a nucleic acid composition provided herein includes a GAA polynucleotide (e.g., a codon-altered polynucleotide) encoding a GAA polypeptide, where the GAA polynucleotide includes a nucleotide sequence having high sequence identity to all or a portion of the CO2 codon-optimized sequence.

[00113] In some embodiments, the GAA polynucleotide includes a sequence having high sequence identity to the portion of the CO2 codon-optimized sequence that encodes for the mature GAA polypeptide. Accordingly, in some embodiments, the sequence of the codon-altered polynucleotide has at least 95% identity to CO2-MP-WT-NA (SEQ ID NO:64). In a specific embodiment, the sequence of the codon-altered polynucleotide has at least 96% identity to CO2-MP-WT-NA (SEQ ID NO:64). In a specific embodiment, the sequence of the codon-altered polynucleotide has at least 97% identity to CO2-MP-WT-NA (SEQ ID NO:64). In a specific embodiment, the sequence of the codon-altered polynucleotide has at least 98% identity to CO2-MP-WT-NA (SEQ ID NO:64). In a specific embodiment, the sequence of the codon-altered polynucleotide has at least 99% identity to CO2-MP-WT-NA (SEQ ID NO:64). In a specific embodiment, the sequence of the codon-altered polynucleotide has at least 99.5% identity to CO2-MP-WT-NA (SEQ ID NO:64). In a specific embodiment, the sequence of the codon-altered polynucleotide has at least 99.9% identity to CO2-MP-WT-NA (SEQ ID NO:64). In another specific embodiment, the sequence of the codon-altered polynucleotide is CO2-MP-WT-NA (SEQ ID NO:64). When determining the sequence identity between a GAA polynucleotide and the portion of the CO2 codon-optimized sequence that encodes for the mature GAA polypeptide, only the portions of the sequence encoding the mature polypeptide should be considered. That is, the GAA polynucleotide may also encode for a signal peptide, a pro-peptide, and/or a purification/detection tag, but the sequence comparison should not include these sequences.

[00114] In some embodiments, a GAA polynucleotide having high sequence identity to CO2-MP-WT-NA (SEQ ID NO:64) further includes a polynucleotide sequence encoding a GAA signal peptide having the amino acid sequence of SP-WT-AA (SEQ ID NO:43). In some embodiments, the GAA signal peptide polynucleotide has a nucleic acid sequence that is at least 90%, 95%, 96%, 97%, 98%, or 100% identical to CO2-SP-WT-NA (SEQ ID NO: 73).

[00115] In some embodiments, a GAA polynucleotide having high sequence identity to CO2-MP-WT-NA (SEQ ID NO:64) further includes a polynucleotide sequence encoding a GAA pro-peptide having the amino acid sequence of PP-WT-AA (SEQ ID NO:39). In some embodiments, the GAA pro-peptide polynucleotide has a nucleic acid sequence that is at least 90%, 95%, 96%, 97%, 98%, 99%, or 100% identical to CO2-PP-WT-NA (SEQ ID NO:74). In some embodiments, the GAA pro-peptide polynucleotide has a nucleic acid sequence that is at least 90%, 95%, 96%, 97%, 98%, 99%, or 100% identical to CO2-PP-46-NA (SEQ ID NO:75).

[00116] In some embodiments, the GAA polynucleotide includes a sequence having high sequence identity to the entirety of the CO2 codon-optimized sequence, encoding for the GAA pre-pro-polypeptide. Accordingly, in some embodiments, the sequence of the codon-altered polynucleotide has at least 95% identity to CO2-FL-WT-NA (SEQ ID NO:62). In a specific embodiment, the sequence of the codon-altered polynucleotide has at least 96% identity to CO2-FL-WT-NA (SEQ ID NO:62). In a specific embodiment, the sequence of the codon-altered polynucleotide has at least 97% identity to CO2-FL-WT-NA (SEQ ID NO:62). In a specific embodiment, the sequence of the codon-altered polynucleotide has at least 98% identity to CO2-FL-WT-NA (SEQ ID NO:62). In a specific embodiment, the sequence of the codon-altered polynucleotide has at least 99% identity to CO2-FL-WT-NA (SEQ ID NO:62). In a specific embodiment, the sequence of the codon-altered polynucleotide has at least 99.5% identity to CO2-FL-WT-NA (SEQ ID NO:62). In a specific embodiment, the sequence of the codon-altered polynucleotide has at least 99.9% identity to CO2-FL-WT-NA (SEQ ID NO:62). In another specific embodiment, the sequence of the codon-altered polynucleotide is CO2-FL-WT-NA (SEQ ID NO:62).

[00117] In some embodiments, the GAA polypeptide encoded by a GAA polynucleotide having high sequence identity to the CO2 codon-optimized sequence includes an amino acid sequence having high sequence identity to the human wild-type mature GAA polypeptide (MP-WT-AA; SEQ ID NO:35). Accordingly, in some embodiments, the encoded GAA polypeptide has a sequence that is at least 90% identical to MP-WT-AA (SEQ ID NO:35). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 95% identical to MP-WT-AA (SEQ ID NO:35). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 96% identical to MP-WT-AA (SEQ ID NO:35). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 97% identical to MP-WT-AA (SEQ ID NO:35). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 98% identical to MP-WT-AA (SEQ ID NO:35). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 99% identical to MP-WT-AA (SEQ ID NO:35). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 99.5% identical to MP-WT-AA (SEQ ID NO:35). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 99.8% identical to MP-WT-AA (SEQ ID NO:35). In some embodiments, the encoded GAA polypeptide has a sequence identical to MP-WT-AA (SEQ ID NO:35). When determining the sequence identity between an encoded GAA polypeptide and the mature GAA polypeptide, only the portions of the sequence corresponding to the mature polypeptide should be considered. That is, the GAA polypeptide may also include a signal peptide, a pro-peptide, and/or a purification/detection tag, but the sequence comparison should not include these sequences.

[00118] In some embodiments, the GAA polypeptide encoded by a GAA polynucleotide having high sequence identity to the CO2 codon-optimized sequence includes an amino acid sequence having high sequence identity to the human wild-type GAA pre-pro-polypeptide (FL-WT-AA; SEQ ID NO:2). Accordingly, in some embodiments, the encoded GAA polypeptide has a sequence that is at least 90% identical to FL-WT-AA (SEQ ID NO:2). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 95% identical to FL-WT-AA (SEQ ID NO:2). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 96% identical to FL-WT-AA (SEQ ID NO:2). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 97% identical to FL-WT-AA (SEQ ID NO:2). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 98% identical to FL-WT-AA (SEQ ID NO:2). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 99% identical to FL-WT-AA (SEQ ID NO:2). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 99.5% identical to FL-WT-AA (SEQ ID NO:2). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 99.8% identical to FL-WT-AA (SEQ ID NO:2). In some embodiments, the encoded GAA polypeptide has a sequence identical to FL-WT-AA (SEQ ID NO:2). When determining the sequence identity between an encoded GAA polypeptide and the GAA pre-pro-polypeptide, only the portions of the sequence corresponding to the pre-pro-polypeptide should be considered. That is, the GAA polypeptide may also include a purification/detection tag, but the sequence comparison should not include these sequences.

[00119] In some embodiments, the GAA polypeptide encoded by a GAA polynucleotide having high sequence identity to the CO2 codon-optimized sequence includes one or more known amino acid substitutions, e.g., one or more amino acid substitutions described in U.S. Patent Application Publication No. 2021/0189365, the content of which is incorporated herein by reference in its entirety. In some embodiments, the GAA polypeptide encoded by a GAA polynucleotide having high sequence identity to the CO2 codon-optimized sequence includes one or more amino acid substitutions present in one of GAA variants 1-5 described herein. In some embodiments, the GAA polypeptide encoded by a GAA polynucleotide having high sequence identity to the CO2 codon-optimized sequence includes one or more amino acid substitutions present in one of GAA variants 6-13 described herein.

[00120] In some embodiments, the GAA polypeptide encoded by a GAA polynucleotide having high sequence identity to the CO2 codon-optimized sequence includes one or more amino acid substitutions present in GAA variant 6: T151I, L650G, S676D, and L678H. In some embodiments, the GAA polypeptide encoded by a GAA polynucleotide having high sequence identity to the CO2 codon-optimized sequence includes all of the amino acid substitutions present in GAA variant 6 described herein.

[00121] In some embodiments, a GAA polynucleotide having high sequence identity to CO2-MP-46-NA (SEQ ID NO:68) further includes a polynucleotide sequence encoding a GAA pro-peptide having the amino acid sequence of PP-WT-AA (SEQ ID NO:39). In some embodiments, the GAA pro-peptide polynucleotide has a nucleic acid sequence that is at least 90%, 95%, 96%, 97%, 98%, 99%, or 100% identical to CO2-PP-WT-NA (SEQ ID NO:74).

[00122] In some embodiments, the GAA polypeptide encoded by a GAA polynucleotide having high sequence identity to the CO2 codon-optimized sequence includes an amino acid sequence having high sequence identity to the human variant 6 mature GAA polypeptide (MP-6-AA; SEQ ID NO:37). Accordingly, in some embodiments, the encoded GAA polypeptide has a sequence that is at least 90% identical to MP-6-AA (SEQ ID NO:37). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 95% identical to MP-6-AA (SEQ ID NO:37). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 96% identical to MP-6-AA (SEQ ID NO:37). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 97% identical to MP-6-AA (SEQ ID NO:37). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 98% identical to MP-6-AA (SEQ ID NO:37). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 99% identical to MP-6-AA (SEQ ID NO:37). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 99.5% identical to MP-6-AA (SEQ ID NO:37). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 99.8% identical to MP-6-AA (SEQ ID NO:37). In some embodiments, the encoded GAA polypeptide has a sequence identical to MP-6-AA (SEQ ID NO:37). When determining the sequence identity between an encoded GAA polypeptide and the mature GAA polypeptide, only the portions of the sequence corresponding to the mature polypeptide should be considered. That is, the GAA polypeptide may also include a signal peptide, a pro-peptide, and/or a purification/detection tag, but the sequence comparison should not include these sequences.

[00123] In some embodiments, the GAA polypeptide encoded by a GAA polynucleotide having high sequence identity to the CO2 codon-optimized sequence includes an amino acid sequence having high sequence identity to the human variant 6 GAA pre-pro-polypeptide (FL-6-AA; SEQ ID NO:14). Accordingly, in some embodiments, the encoded GAA polypeptide has a sequence that is at least 90% identical to FL-6-AA (SEQ ID NO:14). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 95% identical to FL-6-AA (SEQ ID NO:14). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 96% identical to FL-6-AA (SEQ ID NO:14). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 97% identical to FL-6-AA (SEQ ID NO:14). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 98% identical to FL-6-AA (SEQ ID NO:14). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 99% identical to FL-6-AA (SEQ ID NO:14). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 99.5% identical to FL-6-AA (SEQ ID NO:14). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 99.8% identical to FL-6-AA (SEQ ID NO:14). In some embodiments, the encoded GAA polypeptide has a sequence identical to FL-6-AA (SEQ ID NO:14). When determining the sequence identity between an encoded GAA polypeptide and the GAA pre-pro-polypeptide, only the portions of the sequence corresponding to the pre-pro-polypeptide should be considered. That is, the GAA polypeptide may also include a purification/detection tag, but the sequence comparison should not include these sequences.

[00124] In some embodiments, the nucleotide sequence of the GAA polynucleotide having high sequence identity to a CO2 codon-optimized sequence (e.g., SEQ ID NO:62 or 64) has a reduced GC content, as compared to the wild-type GAA coding sequence SEQ ID NO:1, as described above. Accordingly, in some embodiments, the sequence of the codon-altered polynucleotide having high sequence identity to a CO2 codon-optimized sequence has a GC content of no more than 61.5%. In some embodiments, the sequence of the codon-altered polynucleotide having high sequence identity to a CO2 codon-optimized sequence has a GC content of no more than 59%. In some embodiments, the sequence of the codon-altered polynucleotide having high sequence identity to a CO2 codon-optimized sequence has a GC content of no more than 60.5%, no more than 59.5%, no more than 58.5%, no more than 57.5%, or no more than 56.5%.

[00125] In some embodiments, the sequence of the codon-altered polynucleotide having high sequence identity to a CO2 codon-optimized sequence has a GC content of from 56.5% to 61.5%. In some embodiments, the sequence of the codon-altered polynucleotide having high sequence identity to a CO2 codon-optimized sequence has a GC content of from 57.5% to 61.5%, from 58.5% to 61.5%, from 59.5% to 61.5%, from 60.5% to 61.5%, from 56.5% to 60.5%, from 57.5% to 60.5%, from 58.5% to 60.5%, from 59.5% to 60.5%, from 56.5% to 59.5%, from 57.5% to 59.5%, from 58.5% to 59.5%, from 56.5% to 58.5%, from 57.5% to 58.5%, or from 56.5% to 57.5%.

[00126] In some embodiments, the sequence of the codon-altered polynucleotide having high sequence identity to a CO2 codon-optimized sequence has a GC content of 59%±1.0. In some embodiments, the sequence of the codon-altered polynucleotide having high sequence identity to a CO2 codon-optimized sequence has a GC content of 59%±0.8. In some embodiments, the sequence of the codon-altered polynucleotide having high sequence identity to a CO2 codon-optimized sequence has a GC content of 59%±0.6. In some embodiments, the sequence of the codon-altered polynucleotide having high sequence identity to a CO2 codon-optimized sequence has a GC content of 59%±0.5. In some embodiments, the sequence of the codon-altered polynucleotide having high sequence identity to a CO2 codon-optimized sequence has a GC content of 59%±0.4. In some embodiments, the sequence of the codon-altered polynucleotide having high sequence identity to a CO2 codon-optimized sequence has a GC content of 59%±0.3. In some embodiments, the sequence of the codon-altered polynucleotide having high sequence identity to a CO2 codon-optimized sequence has a GC content of 59%±0.2. In some embodiments, the sequence of the codon-altered polynucleotide having high sequence identity to a CO2 codon-optimized sequence has a GC content of 59%±0.1. In some embodiments, the sequence of the codon-altered polynucleotide having high sequence identity to a CO2 codon-optimized sequence has a GC content of 59%.
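
By way of non-limiting illustration, GC content limits and windows of the kind recited above can be checked with a short Python sketch. The sketch assumes that GC content is the percentage of G and C bases over the full length of the sequence and that a value such as 59%±1.0 denotes a window of plus or minus 1.0 percentage points; the function names and the toy sequence are hypothetical and do not correspond to any SEQ ID NO of this disclosure.

```python
def gc_content(seq: str) -> float:
    """Return the GC content of a nucleotide sequence as a percentage."""
    bases = seq.upper()
    if not bases:
        raise ValueError("sequence must not be empty")
    gc = sum(1 for base in bases if base in "GC")
    return 100.0 * gc / len(bases)


def gc_in_window(seq: str, target: float, tolerance: float) -> bool:
    """True if GC content lies within target +/- tolerance (percentage points)."""
    return abs(gc_content(seq) - target) <= tolerance


toy = "ATGCATGC"  # hypothetical 8-nt sequence with four G/C bases
print(gc_content(toy))               # percentage of G and C
print(gc_content(toy) <= 61.5)       # "no more than 61.5%" style limit
print(gc_in_window(toy, 59.0, 1.0))  # "59% +/- 1.0" style window
```

The same two helpers cover the upper limits, the ranges (by testing a lower and an upper bound), and the target-with-tolerance embodiments.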

[00127] In some embodiments, the nucleotide sequence of the GAA polynucleotide having high sequence identity to a CO2 codon-optimized sequence (e.g., SEQ ID NO:62 or 64) has a reduced number of CpG dinucleotides, as compared to the wild-type GAA coding sequence SEQ ID NO:1, as described above. Accordingly, in some embodiments, the sequence of the codon-altered polynucleotide having high sequence identity to a CO2 codon-optimized sequence has no more than 15 CpG dinucleotides. In some embodiments, the sequence of the codon-altered polynucleotide having high sequence identity to a CO2 codon-optimized sequence has no more than 10 CpG dinucleotides. In some embodiments, the sequence of the codon-altered polynucleotide having high sequence identity to a CO2 codon-optimized sequence has no more than 5 CpG dinucleotides. In some embodiments, the sequence of the codon-altered polynucleotide having high sequence identity to a CO2 codon-optimized sequence has no more than 4 CpG dinucleotides. In some embodiments, the sequence of the codon-altered polynucleotide having high sequence identity to a CO2 codon-optimized sequence has no more than 3 CpG dinucleotides. In some embodiments, the sequence of the codon-altered polynucleotide having high sequence identity to a CO2 codon-optimized sequence has no more than 2 CpG dinucleotides. In some embodiments, the sequence of the codon-altered polynucleotide having high sequence identity to a CO2 codon-optimized sequence has no more than 1 CpG dinucleotide. In some embodiments, the sequence of the codon-altered polynucleotide having high sequence identity to a CO2 codon-optimized sequence has no CpG dinucleotides.
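
By way of non-limiting illustration, the CpG dinucleotide count of a coding sequence can be obtained with the following Python sketch. It assumes that a CpG dinucleotide is a cytosine immediately followed by a guanine on the given strand, read 5' to 3'; the function name `count_cpg` and the toy sequences are hypothetical.

```python
def count_cpg(seq: str) -> int:
    """Count CpG dinucleotides (C immediately followed by G, 5'->3')."""
    # "CG" cannot overlap with itself, so a non-overlapping count is exact.
    return seq.upper().count("CG")


toy = "ACGTCGCG"  # hypothetical sequence with CG at three positions
print(count_cpg(toy))
print(count_cpg(toy) <= 15)     # "no more than 15 CpG dinucleotides"
print(count_cpg("ATGCATGC"))    # GC is not CG, so no CpG is counted
```

Note that the dinucleotide GC (guanine followed by cytosine) is not a CpG and does not contribute to the count.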

CO3 Codon-Altered polynucleotides

[00128] In one embodiment, a nucleic acid composition provided herein includes a GAA polynucleotide (e.g., a codon-altered polynucleotide) encoding a GAA polypeptide, where the GAA polynucleotide includes a nucleotide sequence having high sequence identity to all or a portion of the CO3 codon-optimized sequence.

[00129] In some embodiments, the GAA polynucleotide includes a sequence having high sequence identity to the portion of the CO3 codon-optimized sequence that encodes for the mature GAA polypeptide. Accordingly, in some embodiments, the sequence of the codon-altered polynucleotide has at least 95% identity to CO3-MP-WT-NA (SEQ ID NO:34). In a specific embodiment, the sequence of the codon-altered polynucleotide has at least 96% identity to CO3-MP-WT-NA (SEQ ID NO:34). In a specific embodiment, the sequence of the codon-altered polynucleotide has at least 97% identity to CO3-MP-WT-NA (SEQ ID NO:34). In a specific embodiment, the sequence of the codon-altered polynucleotide has at least 98% identity to CO3-MP-WT-NA (SEQ ID NO:34). In a specific embodiment, the sequence of the codon-altered polynucleotide has at least 99% identity to CO3-MP-WT-NA (SEQ ID NO:34). In a specific embodiment, the sequence of the codon-altered polynucleotide has at least 99.5% identity to CO3-MP-WT-NA (SEQ ID NO:34). In a specific embodiment, the sequence of the codon-altered polynucleotide has at least 99.9% identity to CO3-MP-WT-NA (SEQ ID NO:34). In another specific embodiment, the sequence of the codon-altered polynucleotide is CO3-MP-WT-NA (SEQ ID NO:34). When determining the sequence identity between a GAA polynucleotide and the portion of the CO3 codon-optimized sequence that encodes for the mature GAA polypeptide, only the portions of the sequence encoding the mature polypeptide should be considered. That is, the GAA polynucleotide may also encode for a signal peptide, a pro-peptide, and/or a purification/detection tag, but the sequence comparison should not include these sequences.

[00130] In some embodiments, a GAA polynucleotide having high sequence identity to CO3-MP-WT-NA (SEQ ID NO:34) further includes a polynucleotide sequence encoding a GAA signal peptide having the amino acid sequence of SP-WT-AA (SEQ ID NO:43). In some embodiments, the GAA signal peptide polynucleotide has a nucleic acid sequence that is at least 90%, 95%, 96%, 97%, 98%, or 100% identical to CO3-SP-WT-NA (SEQ ID NO:42).

[00131] In some embodiments, a GAA polynucleotide having high sequence identity to CO3-MP-WT-NA (SEQ ID NO:34) further includes a polynucleotide sequence encoding a GAA pro-peptide having the amino acid sequence of PP-WT-AA (SEQ ID NO:39). In some embodiments, the GAA pro-peptide polynucleotide has a nucleic acid sequence that is at least 90%, 95%, 96%, 97%, 98%, 99%, or 100% identical to CO3-PP-WT-NA (SEQ ID NO:38). In some embodiments, the GAA pro-peptide polynucleotide has a nucleic acid sequence that is at least 90%, 95%, 96%, 97%, 98%, 99%, or 100% identical to CO3-PP-46-NA (SEQ ID NO:40).

[00132] In some embodiments, the GAA polynucleotide includes a sequence having high sequence identity to the entirety of the CO3 codon-optimized sequence, encoding for the GAA pre-pro-polypeptide. Accordingly, in some embodiments, the sequence of the codon-altered polynucleotide has at least 95% identity to CO3-FL-WT-NA (SEQ ID NO:31). In a specific embodiment, the sequence of the codon-altered polynucleotide has at least 96% identity to CO3-FL-WT-NA (SEQ ID NO:31). In a specific embodiment, the sequence of the codon-altered polynucleotide has at least 97% identity to CO3-FL-WT-NA (SEQ ID NO:31). In a specific embodiment, the sequence of the codon-altered polynucleotide has at least 98% identity to CO3-FL-WT-NA (SEQ ID NO:31). In a specific embodiment, the sequence of the codon-altered polynucleotide has at least 99% identity to CO3-FL-WT-NA (SEQ ID NO:31). In a specific embodiment, the sequence of the codon-altered polynucleotide has at least 99.5% identity to CO3-FL-WT-NA (SEQ ID NO:31). In a specific embodiment, the sequence of the codon-altered polynucleotide has at least 99.9% identity to CO3-FL-WT-NA (SEQ ID NO:31). In another specific embodiment, the sequence of the codon-altered polynucleotide is CO3-FL-WT-NA (SEQ ID NO:31).

[00133] In some embodiments, the GAA polypeptide encoded by a GAA polynucleotide having high sequence identity to the CO3 codon-optimized sequence includes an amino acid sequence having high sequence identity to the human wild-type mature GAA polypeptide (MP-WT-AA; SEQ ID NO:35). Accordingly, in some embodiments, the encoded GAA polypeptide has a sequence that is at least 90% identical to MP-WT-AA (SEQ ID NO:35). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 95% identical to MP-WT-AA (SEQ ID NO:35). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 96% identical to MP-WT-AA (SEQ ID NO:35). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 97% identical to MP-WT-AA (SEQ ID NO:35). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 98% identical to MP-WT-AA (SEQ ID NO:35). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 99% identical to MP-WT-AA (SEQ ID NO:35). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 99.5% identical to MP-WT-AA (SEQ ID NO:35). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 99.8% identical to MP-WT-AA (SEQ ID NO:35). In some embodiments, the encoded GAA polypeptide has a sequence identical to MP-WT-AA (SEQ ID NO:35). When determining the sequence identity between an encoded GAA polypeptide and the mature GAA polypeptide, only the portions of the sequence corresponding to the mature polypeptide should be considered. That is, the GAA polypeptide may also include a signal peptide, a pro-peptide, and/or a purification/detection tag, but the sequence comparison should not include these sequences.

[00134] In some embodiments, the GAA polypeptide encoded by a GAA polynucleotide having high sequence identity to the CO3 codon-optimized sequence includes an amino acid sequence having high sequence identity to the human wild-type GAA pre-pro-polypeptide (FL-WT-AA; SEQ ID NO:2). Accordingly, in some embodiments, the encoded GAA polypeptide has a sequence that is at least 90% identical to FL-WT-AA (SEQ ID NO:2). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 95% identical to FL-WT-AA (SEQ ID NO:2). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 96% identical to FL-WT-AA (SEQ ID NO:2). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 97% identical to FL-WT-AA (SEQ ID NO:2). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 98% identical to FL-WT-AA (SEQ ID NO:2). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 99% identical to FL-WT-AA (SEQ ID NO:2). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 99.5% identical to FL-WT-AA (SEQ ID NO:2). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 99.8% identical to FL-WT-AA (SEQ ID NO:2). In some embodiments, the encoded GAA polypeptide has a sequence identical to FL-WT-AA (SEQ ID NO:2). When determining the sequence identity between an encoded GAA polypeptide and the GAA pre-pro-polypeptide, only the portions of the sequence corresponding to the pre-pro-polypeptide should be considered. That is, the GAA polypeptide may also include a purification/detection tag, but the sequence comparison should not include these sequences.

[00135] In some embodiments, the GAA polypeptide encoded by a GAA polynucleotide having high sequence identity to the CO3 codon-optimized sequence includes one or more known amino acid substitutions, e.g., one or more amino acid substitutions described in U.S. Patent Application Publication No. 2021/0189365, the content of which is incorporated herein by reference in its entirety. In some embodiments, the GAA polypeptide encoded by a GAA polynucleotide having high sequence identity to the CO3 codon-optimized sequence includes one or more amino acid substitutions present in one of GAA variants 1-5 described herein. In some embodiments, the GAA polypeptide encoded by a GAA polynucleotide having high sequence identity to the CO3 codon-optimized sequence includes one or more amino acid substitutions present in one of GAA variants 6-13 described herein.

[00136] In some embodiments, the GAA polypeptide encoded by a GAA polynucleotide having high sequence identity to the CO3 codon-optimized sequence includes one or more amino acid substitutions present in GAA variant 6: T151I, L650G, S676D, and L678H. In some embodiments, the GAA polypeptide encoded by a GAA polynucleotide having high sequence identity to the CO3 codon-optimized sequence includes all of the amino acid substitutions present in GAA variant 6 described herein.

[00137] Accordingly, in some embodiments, the sequence of the codon-altered polynucleotide has at least 95% identity to the portion of a codon-optimized GAA polynucleotide encoding the mature polypeptide of the GAA variant 6 (CO3-MP-6-dNA; SEQ ID NO:36). In a specific embodiment, the sequence of the codon-altered polynucleotide has at least 96% identity to CO3-MP-6-dNA (SEQ ID NO:36). In a specific embodiment, the sequence of the codon-altered polynucleotide has at least 97% identity to CO3-MP-6-dNA (SEQ ID NO:36). In a specific embodiment, the sequence of the codon-altered polynucleotide has at least 98% identity to CO3-MP-6-dNA (SEQ ID NO:36). In a specific embodiment, the sequence of the codon-altered polynucleotide has at least 99% identity to CO3-MP-6-dNA (SEQ ID NO:36). In a specific embodiment, the sequence of the codon-altered polynucleotide has at least 99.5% identity to CO3-MP-6-dNA (SEQ ID NO:36). In a specific embodiment, the sequence of the codon-altered polynucleotide has at least 99.9% identity to CO3-MP-6-dNA (SEQ ID NO:36). In another specific embodiment, the sequence of the codon-altered polynucleotide is CO3-MP-6-dNA (SEQ ID NO:36). When determining the sequence identity between a GAA polynucleotide and the portion of the CO3 codon-optimized sequence that encodes for the mature GAA polypeptide, only the portions of the sequence encoding the mature polypeptide should be considered. That is, the GAA polynucleotide may also encode for a signal peptide, a pro-peptide, and/or a purification/detection tag, but the sequence comparison should not include these sequences.

[00138] In some embodiments, a GAA polynucleotide having high sequence identity to CO3-MP-6-dNA (SEQ ID NO:36) further includes a polynucleotide sequence encoding a GAA signal peptide having the amino acid sequence of SP-WT-AA (SEQ ID NO:43). In some embodiments, the GAA signal polynucleotide has a nucleic acid sequence that is at least 90%, 95%, 96%, 97%, 98%, or 100% identical to CO3-SP-WT-NA (SEQ ID NO:42).

[00139] In some embodiments, a GAA polynucleotide having high sequence identity to CO3-MP-6-dNA (SEQ ID NO:36) further includes a polynucleotide sequence encoding a GAA pro-peptide having the amino acid sequence of PP-WT-AA (SEQ ID NO:39). In some embodiments, the GAA pro-peptide polynucleotide has a nucleic acid sequence that is at least 90%, 95%, 96%, 97%, 98%, 99%, or 100% identical to CO3-PP-WT-NA (SEQ ID NO:38).

[00140] In some embodiments, the GAA polynucleotide includes a sequence having high sequence identity to the entirety of a codon-optimized GAA polynucleotide encoding the variant 6 GAA pre-pro-polypeptide. Accordingly, in some embodiments, the sequence of the codon-altered polynucleotide has at least 95% identity to CO3-FL-6-dNA (SEQ ID NO:60). In a specific embodiment, the sequence of the codon-altered polynucleotide has at least 96% identity to CO3-FL-6-dNA (SEQ ID NO:60). In a specific embodiment, the sequence of the codon-altered polynucleotide has at least 97% identity to CO3-FL-6-dNA (SEQ ID NO:60). In a specific embodiment, the sequence of the codon-altered polynucleotide has at least 98% identity to CO3-FL-6-dNA (SEQ ID NO:60). In a specific embodiment, the sequence of the codon-altered polynucleotide has at least 99% identity to CO3-FL-6-dNA (SEQ ID NO:60). In a specific embodiment, the sequence of the codon-altered polynucleotide has at least 99.5% identity to CO3-FL-6-dNA (SEQ ID NO:60). In a specific embodiment, the sequence of the codon-altered polynucleotide has at least 99.9% identity to CO3-FL-6-dNA (SEQ ID NO:60). In another specific embodiment, the sequence of the codon-altered polynucleotide is CO3-FL-6-dNA (SEQ ID NO:60).

[00141] In some embodiments, the GAA polypeptide encoded by a GAA polynucleotide having high sequence identity to the CO3 codon-optimized sequence includes an amino acid sequence having high sequence identity to the human variant 6 mature GAA polypeptide (MP-6-AA; SEQ ID NO:37). Accordingly, in some embodiments, the encoded GAA polypeptide has a sequence that is at least 90% identical to MP-6-AA (SEQ ID NO:37). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 95% identical to MP-6-AA (SEQ ID NO:37). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 96% identical to MP-6-AA (SEQ ID NO:37). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 97% identical to MP-6-AA (SEQ ID NO:37). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 98% identical to MP-6-AA (SEQ ID NO:37). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 99% identical to MP-6-AA (SEQ ID NO:37). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 99.5% identical to MP-6-AA (SEQ ID NO:37). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 99.8% identical to MP-6-AA (SEQ ID NO:37). In some embodiments, the encoded GAA polypeptide has a sequence identical to MP-6-AA (SEQ ID NO:37). When determining the sequence identity between an encoded GAA polypeptide and the mature GAA polypeptide, only the portions of the sequence corresponding to the mature polypeptide should be considered. That is, the GAA polypeptide may also include a signal peptide, a pro-peptide, and/or a purification/detection tag, but the sequence comparison should not include these sequences.

[00142] In some embodiments, the GAA polypeptide encoded by a GAA polynucleotide having high sequence identity to the CO3 codon-optimized sequence includes an amino acid sequence having high sequence identity to the human variant 6 GAA pre-pro-polypeptide (FL-6-AA; SEQ ID NO:14). Accordingly, in some embodiments, the encoded GAA polypeptide has a sequence that is at least 90% identical to FL-6-AA (SEQ ID NO:14). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 95% identical to FL-6-AA (SEQ ID NO:14). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 96% identical to FL-6-AA (SEQ ID NO:14). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 97% identical to FL-6-AA (SEQ ID NO:14). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 98% identical to FL-6-AA (SEQ ID NO:14). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 99% identical to FL-6-AA (SEQ ID NO:14). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 99.5% identical to FL-6-AA (SEQ ID NO:14). In some embodiments, the encoded GAA polypeptide has a sequence that is at least 99.8% identical to FL-6-AA (SEQ ID NO:14). In some embodiments, the encoded GAA polypeptide has a sequence identical to FL-6-AA (SEQ ID NO:14). When determining the sequence identity between an encoded GAA polypeptide and the GAA pre-pro-polypeptide, only the portions of the sequence corresponding to the pre-pro-polypeptide should be considered. That is, the GAA polypeptide may also include a purification/detection tag, but the sequence comparison should not include these sequences.

[00143] In some embodiments, the nucleotide sequence of the GAA polynucleotide having high sequence identity to a CO3 codon-optimized sequence (e.g., SEQ ID NO:31 or 34) has a reduced GC content, as compared to the wild-type GAA coding sequence SEQ ID NO:1, as described above. Accordingly, in some embodiments, the sequence of the codon-altered polynucleotide having high sequence identity to a CO3 codon-optimized sequence has a GC content of no more than 60%. In some embodiments, the sequence of the codon-altered polynucleotide having high sequence identity to a CO3 codon-optimized sequence has a GC content of no more than 57.5%. In some embodiments, the sequence of the codon-altered polynucleotide having high sequence identity to a CO3 codon-optimized sequence has a GC content of no more than 59%, no more than 58%, no more than 57%, no more than 56%, or no more than 55%.

[00144] In some embodiments, the sequence of the codon-altered polynucleotide having high sequence identity to a CO3 codon-optimized sequence has a GC content of from 55% to 60%. In some embodiments, the sequence of the codon-altered polynucleotide having high sequence identity to a CO3 codon-optimized sequence has a GC content of from 56% to 60%, from 57% to 60%, from 58% to 60%, from 59% to 60%, from 55% to 59%, from 56% to 59%, from 57% to 59%, from 58% to 59%, from 55% to 58%, from 56% to 58%, from 57% to 58%, from 55% to 57%, from 56% to 57%, or from 55% to 56%.

[00145] In some embodiments, the sequence of the codon-altered polynucleotide having high sequence identity to a CO3 codon-optimized sequence has a GC content of 57.5%±1.0. In some embodiments, the sequence of the codon-altered polynucleotide having high sequence identity to a CO3 codon-optimized sequence has a GC content of 57.5%±0.8. In some embodiments, the sequence of the codon-altered polynucleotide having high sequence identity to a CO3 codon-optimized sequence has a GC content of 57.5%±0.6. In some embodiments, the sequence of the codon-altered polynucleotide having high sequence identity to a CO3 codon-optimized sequence has a GC content of 57.5%±0.5. In some embodiments, the sequence of the codon-altered polynucleotide having high sequence identity to a CO3 codon-optimized sequence has a GC content of 57.5%±0.4. In some embodiments, the sequence of the codon-altered polynucleotide having high sequence identity to a CO3 codon-optimized sequence has a GC content of 57.5%±0.3. In some embodiments, the sequence of the codon-altered polynucleotide having high sequence identity to a CO3 codon-optimized sequence has a GC content of 57.5%±0.2. In some embodiments, the sequence of the codon-altered polynucleotide having high sequence identity to a CO3 codon-optimized sequence has a GC content of 57.5%±0.1. In some embodiments, the sequence of the codon-altered polynucleotide having high sequence identity to a CO3 codon-optimized sequence has a GC content of 57.5%.

[00146] In some embodiments, the nucleotide sequence of the GAA polynucleotide having high sequence identity to a CO3 codon-optimized sequence (e.g., SEQ ID NO:31 or 34) has a reduced number of CpG dinucleotides, as compared to the wild-type GAA coding sequence SEQ ID NO:1, as described above. Accordingly, in some embodiments, the sequence of the codon-altered polynucleotide having high sequence identity to a CO3 codon-optimized sequence has no more than 15 CpG dinucleotides. In some embodiments, the sequence of the codon-altered polynucleotide having high sequence identity to a CO3 codon-optimized sequence has no more than 10 CpG dinucleotides. In some embodiments, the sequence of the codon-altered polynucleotide having high sequence identity to a CO3 codon-optimized sequence has no more than 5 CpG dinucleotides. In some embodiments, the sequence of the codon-altered polynucleotide having high sequence identity to a CO3 codon-optimized sequence has no more than 4 CpG dinucleotides. In some embodiments, the sequence of the codon-altered polynucleotide having high sequence identity to a CO3 codon-optimized sequence has no more than 3 CpG dinucleotides. In some embodiments, the sequence of the codon-altered polynucleotide having high sequence identity to a CO3 codon-optimized sequence has no more than 2 CpG dinucleotides. In some embodiments, the sequence of the codon-altered polynucleotide having high sequence identity to a CO3 codon-optimized sequence has no more than 1 CpG dinucleotide. In some embodiments, the sequence of the codon-altered polynucleotide having high sequence identity to a CO3 codon-optimized sequence has no CpG dinucleotides.

V. GAA Expression Vectors

[00147] In one aspect, the disclosure provides expression cassettes for expressing a GAA polynucleotide as disclosed herein, e.g., a codon-altered GAA polynucleotide. In some embodiments, an expression cassette comprises one or more nucleic acids encoding a GAA protein and at least one regulatory nucleic acid sequence operably linked to the sequence encoding the GAA protein. In some embodiments, the at least one regulatory nucleic acid sequence is selected from the group consisting of a promoter, an enhancer, an intron, a post-transcriptional regulatory element, an inverted terminal repeat (ITR), a polyadenylation (poly A) sequence, and a combination thereof.

[00148] In some embodiments, the at least one regulatory nucleic acid sequence comprises a promoter. In some embodiments, the promoter is a muscle-specific promoter. In some embodiments, the muscle-specific promoter comprises a polynucleotide sequence that is at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical to SPc512_NA (SEQ ID NO:46). In some embodiments, the muscle-specific promoter comprises the polynucleotide sequence of SPc512_NA (SEQ ID NO:46). In some embodiments, the muscle-specific promoter comprises a polynucleotide sequence that is at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical to HsDesmin_NA (SEQ ID NO:47). In some embodiments, the muscle-specific promoter comprises the polynucleotide sequence of HsDesmin_NA (SEQ ID NO:47).

[00149] In some embodiments, the at least one regulatory nucleic acid sequence comprises an enhancer. In some embodiments, the enhancer is a muscle-specific enhancer. In some embodiments, the muscle-specific enhancer comprises a polynucleotide sequence that is at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical to Dph-CRE04_NA (SEQ ID NO:48). In some embodiments, the muscle-specific enhancer comprises the polynucleotide sequence of Dph-CRE04_NA (SEQ ID NO:48). In some embodiments, the muscle-specific enhancer comprises a polynucleotide sequence that is at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical to sk-SH4_NA (SEQ ID NO:49). In some embodiments, the muscle-specific enhancer comprises the polynucleotide sequence of sk-SH4_NA (SEQ ID NO:49).

[00150] In some embodiments, the at least one regulatory nucleic acid sequence comprises an intron. In some embodiments, the intron comprises a polynucleotide sequence that is at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical to MVM_NA (SEQ ID NO: 50). In some embodiments, the intron comprises the polynucleotide sequence of MVM_NA (SEQ ID NO: 50).

[00151] In some embodiments, the codon-altered polynucleotides and associated expression cassettes described herein are integrated into expression vectors. As will be appreciated by one of skill in the art, many forms of vectors can be used to effectuate GAA gene therapy using the codon-altered GAA polynucleotide sequences disclosed herein. Non-limiting examples of expression vectors include viral vectors (e.g., vectors suitable for gene therapy), plasmid vectors, bacteriophage vectors, cosmids, phagemids, artificial chromosomes, and the like.

[00152] In one embodiment, the gene therapy vector is an adeno-associated virus (AAV) based gene therapy vector. AAV systems have been described previously and are generally well known in the art (Kelleher and Vos, Biotechniques, 17(6): 1110-17 (1994); Cotten et al., P.N.A.S. U.S.A., 89(13):6094-98 (1992); Curiel, Nat Immun, 13(2-3): 141-64 (1994); Muzyczka, Curr Top Microbiol Immunol, 158:97-129 (1992); and Asokan A, et al., Mol. Ther., 20(4):699-708 (2012), each incorporated herein by reference in their entireties for all purposes). Details concerning the generation and use of rAAV vectors are described, for example, in U.S. Patent Nos. 5,139,941 and 4,797,368, each incorporated herein by reference in their entireties for all purposes.

[00153] In some embodiments, the expression cassette is a mammalian expression vector. In some embodiments, the mammalian expression vector comprises an adeno-associated virus (AAV) vector. In some embodiments, the AAV vector comprises an AAV8 or AAV9 capsid polypeptide encapsidating the expression cassette. In some embodiments, the AAV vector comprises an engineered capsid polypeptide encapsidating the expression cassette.

[00154] In some embodiments, the codon-altered polynucleotides described herein are integrated into a viral gene therapy vector. Non-limiting examples of viral vectors include: retroviruses, e.g., Moloney murine leukemia virus (MMLV), Harvey murine sarcoma virus, murine mammary tumor virus, and Rous sarcoma virus; adenoviruses; adeno-associated viruses; SV40-type viruses; polyomaviruses; Epstein-Barr viruses; papilloma viruses; herpes viruses; vaccinia viruses; and polio viruses.

[00155] In some embodiments, the gene therapy vector is a retrovirus, and particularly a replication-deficient retrovirus. Protocols for the production of replication-deficient retroviruses are known in the art. For review, see Kriegler, M., Gene Transfer and Expression, A Laboratory Manual, W.H. Freeman Co., New York (1990) and Murray, E. J., Methods in Molecular Biology, Vol. 7, Humana Press, Inc., Clifton, N.J. (1991).

[00156] In some embodiments, the codon-altered polynucleotides described herein are integrated into a retroviral expression vector. These systems have been described previously, and are generally well known in the art (Mann et al., Cell, 33: 153-159, 1983; Nicolas and Rubinstein, In: Vectors: A survey of molecular cloning vectors and their uses, Rodriguez and Denhardt, eds., Stoneham: Butterworth, pp. 494-513, 1988; Temin, In: Gene Transfer, Kucherlapati (ed.), New York: Plenum Press, pp. 149-188, 1986). In a specific embodiment, the retroviral vector is a lentiviral vector (see, for example, Naldini et al., Science, 272(5259): 263-267, 1996; Zufferey et al., Nat Biotechnol, 15(9):871-875, 1997; Blomer et al., J Virol., 71(9): 6641-6649, 1997; U.S. Pat. Nos. 6,013,516 and 5,994,136).

[00157] In some embodiments, the codon-altered polynucleotides described herein can be administered to a subject by a non-viral method. For example, naked DNA can be administered into a cell by electroporation, sonoporation, particle bombardment, or hydrodynamic delivery. DNA can also be encapsulated or coupled with polymers, e.g., liposomes, polysomes, polyplexes, dendrimers, and administered to the subject as a complex. Likewise, DNA can be coupled to inorganic nanoparticles, e.g., gold, silica, iron oxide, or calcium phosphate particles, or attached to cell-penetrating peptides for delivery to cells in vivo.

[00158] Codon-altered GAA coding polynucleotides can also be incorporated into artificial chromosomes, such as Artificial Chromosome Expression (ACEs) (see, e.g., Lindenbaum et al., Nucleic Acids Res., 32(21):e172 (2004)) and mammalian artificial chromosomes (MACs). For review see, e.g., Perez-Luz and Diaz-Nido, J Biomed Biotechnol. 2010: Article ID 642804 (2010).

[00159] A wide variety of vectors can be used for the expression of a GAA polypeptide from a codon-altered polynucleotide in cell culture, including eukaryotic and prokaryotic expression vectors. In certain embodiments, a plasmid vector is contemplated for use in expressing a GAA polypeptide in cell culture. In general, plasmid vectors containing replicon and control sequences which are derived from species compatible with the host cell are used in connection with these hosts. The vector can carry a replication site, as well as marking sequences which are capable of providing phenotypic selection in transformed cells. The plasmid will include the codon-altered polynucleotide encoding the GAA polypeptide, operably linked to one or more control sequences, for example, a promoter.

[00160] Non-limiting examples of vectors for prokaryotic expression include plasmids such as pRSET, pET, pBAD, etc., wherein the promoters used in prokaryotic expression vectors include lac, trc, trp, recA, araBAD, etc. Examples of vectors for eukaryotic expression include: (i) for expression in yeast, vectors such as pAO, pPIC, pYES, pMET, using promoters such as AOX1, GAP, GAL1, AUG1, etc.; (ii) for expression in insect cells, vectors such as pMT, pAc5, pIB, pMIB, pBAC, etc., using promoters such as PH, p10, MT, Ac5, OpIE2, gp64, polh, etc.; and (iii) for expression in mammalian cells, vectors such as pSVL, pCMV, pRc/RSV, pcDNA3, pBPV, etc., and vectors derived from viral systems such as vaccinia virus, adeno-associated viruses, herpes viruses, retroviruses, etc., using promoters such as CMV, SV40, EF-1, UbC, RSV, ADV, BPV, and β-actin.

[00161] In some embodiments, the disclosure provides an AAV gene therapy vector that includes a codon-altered GAA polynucleotide, as described herein, inverted terminal repeat (ITR) sequences on the 5’ and 3’ ends of the vector, one or more promoter and/or enhancer sequences operably linked to the GAA polynucleotide, and a poly-adenylation signal following the 3’ end of the GAA polynucleotide sequence. In some embodiments, the one or more promoter and/or enhancer sequences include one or more copies of a muscle-specific regulatory control element.

[00162] The codon-altered GAA polynucleotides and viral vectors described herein (e.g., the nucleic acid compositions) are produced according to conventional methods for nucleic acid amplification and vector production. Two predominant platforms have been developed for large-scale production of recombinant AAV vectors. The first platform is based on replication in mammalian cells, while the second is based on replication in invertebrate cells. For review, see Kotin R.M., Hum. Mol. Genet., 20(R1):R2-6 (2011), the content of which is expressly incorporated herein by reference, in its entirety, for all purposes.

[00163] Accordingly, the disclosure provides methods for producing an adeno-associated virus (AAV) particle. In some embodiments, the methods include introducing a codon-altered GAA polynucleotide construct having high nucleotide sequence identity (e.g., at least 95%, 96%, 97%, 98%, 99%, 99.5%, 99.9%, or 100%) to one of a CO1, CO2, or CO3 sequence, as described herein, into a host cell where the polynucleotide construct is competent for replication in the host cell.

[00164] In some embodiments, the host cell is a mammalian host cell, e.g., an HEK, CHO, or BHK cell. In a specific embodiment, the host cell is an HEK 293 cell. In some embodiments, the host cell is an invertebrate cell, e.g., an insect cell. In a specific embodiment, the host cell is an SF9 cell.

[00165] In some embodiments, the present disclosure provides expression constructs such as helper plasmids (e.g., non-AAV expression constructs) comprising a nucleic acid that encodes one or more of the AAV capsid polypeptides described herein. Such plasmids are useful as expression constructs for producing AAV capsid polypeptides or proteins or to transfect cells (e.g., as part of a triple transfection) in the preparation of engineered AAV vectors. Alternatively, AAV vectors could be produced using herpes virus, baculovirus, stable genetically engineered cell lines, or any other method known in the art (Dobrowsky et al. (2021) Curr. Opinion Biomed. Engin. 20: 100353, the disclosure of which is hereby incorporated herein by reference in its entirety).

[00166] In some embodiments, the capsid helper plasmid may comprise one or more nucleic acid sequences to regulate expression of the AAV capsid polypeptide. The sequences include but are not limited to, a promoter, an enhancer, an intron, a post-transcriptional regulatory sequence, a polyadenylation (poly A) signal, or any combination thereof, which are operably linked to the nucleic acid sequences that encode the AAV capsid polypeptide.

[00167] The promoter may be a heterologous promoter, a tissue-specific promoter, a cell-specific promoter, a constitutive promoter, an inducible promoter, a hybrid promoter, or any combination thereof. In an embodiment, the capsid helper plasmid of the present disclosure comprises at least one promoter capable of expressing, or directed to primarily express, the nucleic acid segment in a suitable host cell (e.g., a muscle cell) into which the engineered capsid helper plasmid can be transfected. Exemplary promoters include, but are not limited to, a ubiquitous promoter, a CMV promoter, a β-actin promoter, a muscle-specific promoter, a Desmin promoter, an SPc5-12 promoter, an MCK-based promoter, an insulin promoter, an enolase promoter, a BDNF promoter, an NGF promoter, an EGF promoter, a growth factor promoter, an axon-specific promoter, a dendrite-specific promoter, a brain-specific promoter, a hippocampal-specific promoter, a kidney-specific promoter, a retinal-specific promoter, an elafin promoter, a cytokine promoter, an interferon promoter, a growth factor promoter, an α1-antitrypsin promoter, a brain cell-specific promoter, a neural cell-specific promoter, a central nervous system cell-specific promoter, a peripheral nervous system cell-specific promoter, an interleukin promoter, a serpin promoter, a hybrid CMV promoter, a hybrid β-actin promoter, an EF1 promoter, a U1a promoter, a U1b promoter, a Tet-inducible promoter, a VP16-LexA promoter, or any combination thereof. In exemplary embodiments, the promoter may include a mammalian or avian β-actin promoter.

[00168] Exemplary enhancer sequences include, but are not limited to, one or more selected from the group consisting of a CMV enhancer, a muscle-specific enhancer, a synthetic enhancer, a liver-specific enhancer, a vascular-specific enhancer, a brain-specific enhancer, a neural cell-specific enhancer, a lung-specific enhancer, a kidney-specific enhancer, a pancreas-specific enhancer, a retinal-specific enhancer, and an islet cell-specific enhancer.

[00169] Exemplary post-transcriptional regulatory sequences include a woodchuck hepatitis post-transcriptional regulatory element (WPRE), one or more internal ribosome entry sites (IRES), one or more polyadenylation (poly A) signal sequences, or any combination thereof. A polyA signal may be an artificial polyA. Examples of other suitable polyA sequences include, e.g., bovine growth hormone, SV40, rabbit beta globin, and TK polyA, amongst others.

[00170] In some embodiments, the capsid helper plasmid described herein may contain other appropriate transcription initiation, termination, and efficient RNA processing signals. Such sequences include splicing, inducible expression control elements, regulatory elements that enhance expression, sequences that stabilize cytoplasmic mRNA, sequences that enhance translation efficiency (e.g., Kozak consensus sequence), sequences that enhance protein stability, and when desired, sequences that enhance secretion of the encoded product. In one embodiment, a Kozak sequence is included.

VI. Variant GAA Polypeptides

[00171] In one aspect, the present disclosure provides GAA polypeptide variants that have advantageous properties relative to wild type GAA polypeptides. For example, as described in Example 4, a series of variant GAA polypeptides (GAA variants 6-13) were identified, several of which demonstrated significantly increased catalytic activity relative to the human wild-type GAA polypeptide. In particular, GAA variant 6 demonstrated approximately 5.5-fold higher activity than the human wild-type GAA polypeptide, as shown in Figure 5, and improved kinetic parameters, as shown in Figure 6A. As further shown in Figure 5, GAA variants 9 and 11-13 demonstrated higher catalytic activity than the human wild-type GAA polypeptide. The amino acid sequences for GAA variants 6-13, as well as non-codon altered polynucleotides encoding the same, are provided in Figures 11B-11I.

[00172] Accordingly, in one aspect, the disclosure provides variant GAA polypeptides having high sequence identity to the variant 6 GAA pre-pro-polypeptide (GAA-FL-6-AA; SEQ ID NO: 14) and/or the variant 6 GAA mature polypeptide (GAA-MP-6-AA; SEQ ID NO:37).

[00173] In some embodiments, the GAA variant protein comprises a first polypeptide sequence that is at least 95%, at least 96%, at least 97% or at least 98% identical to MP-6-AA (SEQ ID NO:37). In some embodiments, the first polypeptide sequence is at least 99% identical to MP-6-AA (SEQ ID NO:37). In some embodiments, the first polypeptide sequence is at least 99.5% identical to MP-6-AA (SEQ ID NO:37). In some embodiments, the first polypeptide sequence is MP-6-AA (SEQ ID NO: 37).

[00174] In some embodiments, the GAA variant protein comprises a first polypeptide sequence that is at least 95%, at least 96%, at least 97% or at least 98% identical to MP-6-AA (SEQ ID NO: 37) and comprises one or more variant amino acids selected from the group consisting of T151I, L650G, S676D, and L678H. In some embodiments, the first polypeptide sequence is at least 99% identical to MP-6-AA (SEQ ID NO: 37). In some embodiments, the first polypeptide sequence is at least 99.5% identical to MP-6-AA (SEQ ID NO: 37).

[00175] In some embodiments, the GAA variant protein further comprises a second polypeptide sequence that is at least 95% identical to PP-WT-AA (SEQ ID NO:39). In some embodiments, the second polypeptide sequence is at least 97% identical to PP-WT-AA (SEQ ID NO:39). In some embodiments, the second polypeptide sequence is PP-WT-AA (SEQ ID NO:39).

[00176] In some embodiments, the recombinant GAA variant protein further comprises a third polypeptide sequence that is at least 95% identical to SP-WT-AA (SEQ ID NO:43). In some embodiments, the third polypeptide sequence is SP-WT-AA (SEQ ID NO:43). In some embodiments, the recombinant GAA variant protein comprises the polypeptide sequence of FL-6-AA (SEQ ID NO: 33).

[00177] In some embodiments, the recombinant GAA variant protein comprises a first polypeptide sequence that is at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.5%, or 100% identical to amino acid residues 70-952 of FL-6-AA (SEQ ID NO: 14).

[00178] In some embodiments, the recombinant GAA variant protein comprises a first polypeptide sequence that is at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.5%, or 100% identical to amino acid residues 70-952 of FL-7-AA (SEQ ID NO: 16).

[00179] In some embodiments, the recombinant GAA variant protein comprises a first polypeptide sequence that is at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.5%, or 100% identical to amino acid residues 70-952 of FL-8-AA (SEQ ID NO: 18).

[00180] In some embodiments, the recombinant GAA variant protein comprises a first polypeptide sequence that is at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.5%, or 100% identical to amino acid residues 70-952 of FL-9-AA (SEQ ID NO:20).

[00181] In some embodiments, the recombinant GAA variant protein comprises a first polypeptide sequence that is at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.5%, or 100% identical to amino acid residues 70-952 of FL-10-AA (SEQ ID NO:22).

[00182] In some embodiments, the recombinant GAA variant protein comprises a first polypeptide sequence that is at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.5%, or 100% identical to amino acid residues 70-952 of FL-11-AA (SEQ ID NO:24).

[00183] In some embodiments, the recombinant GAA variant protein comprises a first polypeptide sequence that is at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.5%, or 100% identical to amino acid residues 70-952 of FL-12-AA (SEQ ID NO:26).

[00184] In some embodiments, the recombinant GAA variant protein comprises a first polypeptide sequence that is at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.5%, or 100% identical to amino acid residues 70-952 of FL-13-AA (SEQ ID NO:28).

[00185] In some embodiments, the recombinant GAA variant protein further comprises a second polypeptide sequence that is at least 95%, at least 97%, or 100% identical to PP-WT-AA (SEQ ID NO: 39).

EXAMPLES

Example 1: Methods and Materials

Vectors and Components

Figure 14 provides the nucleotide sequences of various components of the vectors tested herein.

Naming of Vector Constructs

[00186] The following naming convention will be used to describe each vector: [Capsid.Enhancer.Promoter.Transgene]. For instance, a candidate with an AAV9 capsid, the Dph-CRE04 enhancer, the SPc512 muscle-specific promoter, and a human wild type GAA will be abbreviated as “AAV9.Dph-CRE04.SPc512.wtGAA”, with the components separated by periods.
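For record keeping, the naming convention above can be applied programmatically. The following is a minimal sketch; the `VectorDesign` class and its methods are hypothetical names introduced only for this illustration, and the sketch assumes that every construct has all four components and that no component name itself contains a period.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VectorDesign:
    """Hypothetical record for one construct: Capsid.Enhancer.Promoter.Transgene."""
    capsid: str
    enhancer: str
    promoter: str
    transgene: str

    def name(self) -> str:
        """Join the four components with periods, as in the convention above."""
        parts = (self.capsid, self.enhancer, self.promoter, self.transgene)
        for part in parts:
            # Assumption of this sketch: components are non-empty and period-free.
            if not part or "." in part:
                raise ValueError(f"invalid component: {part!r}")
        return ".".join(parts)

    @classmethod
    def from_name(cls, name: str) -> "VectorDesign":
        """Split a period-separated vector name back into its four components."""
        parts = name.split(".")
        if len(parts) != 4:
            raise ValueError("expected Capsid.Enhancer.Promoter.Transgene")
        return cls(*parts)


example = VectorDesign("AAV9", "Dph-CRE04", "SPc512", "wtGAA")
print(example.name())
```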

Animal Model of Pompe Disease

[00187] B6;129-Gaa^tm1Rabn/J mice, referred to herein as “GAA KO” or “gaa^-/-”, are a suitable animal model of Pompe disease. This model was generated by insertion of a neomycin cassette into exon 6 of the mouse gaa gene, thereby creating a functional knockout (KO) of the gaa gene. GAA KO mice recapitulate critical features of both the infantile and the adult forms of PD at a pace suitable for the evaluation of gene therapy (GT). Glycogen accumulation in cardiac and skeletal muscles can be detected as early as 3 weeks of age, resembling IOPD, whereas the reduction in the number of myofibrils, signs of damaged muscle structure, impaired autophagic flux in skeletal muscle, mild cardiac defects, and muscle weakness leading to locomotor defects, which develop by 8-9 months, resemble LOPD (reviewed in Geel et al. (2007) Mol. Genet. Metab. 92(4):299-307). As there is emerging evidence of some secondary pathologies in the smooth muscle and CNS in a subset of IOPD patients (McCall et al. (2018) J. Smooth Muscle Res. 54(0): 100-118), these secondary tissues will be characterized and monitored in the mouse experiments to determine if this disease phenotype is replicated in the mouse model.

Dosing of Animals and Tissue Harvesting

[00188] AAV vector preparations comprising test AAV transgene constructs or controls were prepared for injection by dilution in vehicle (1.5 mM KH2PO4, 2.7 mM KCl, 8.1 mM Na2HPO4, 136.9 mM NaCl, 0.001% Pluronic F-68), and doses were administered through a single intravenous administration of vector or buffer only into the tail vein at 3x10¹² vg/kg, 1x10¹³ vg/kg or 3x10¹³ vg/kg, as indicated. Clinical and mortality observations were conducted daily post dosing until the end of each study. At four or twelve weeks post dosing, mice were anesthetized with isoflurane, euthanized, and necropsied. All animals were perfused with 0.9% saline and total wet weight of whole heart, brain, quadriceps, triceps, and gastrocnemius was recorded. Each tissue was further dissected into 20-30 mg pieces. Samples were transferred to separate ceramic Precellys bead tubes (P000916-LYSK0-A; Bertin, Rockville, MD) and snap-frozen in liquid nitrogen. Samples were transferred to dry ice and stored at -80°C.
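The weight-based doses above translate into a number of vector genomes per animal and a volume of vector stock to dilute in vehicle. The following is a minimal sketch of that arithmetic; the 25 g body weight and the 1x10¹³ vg/mL stock titer used in the example are illustrative assumptions and are not values reported in this disclosure.

```python
def vector_genomes_per_animal(dose_vg_per_kg: float, body_weight_g: float) -> float:
    """Total vector genomes for one animal at a weight-based dose."""
    return dose_vg_per_kg * body_weight_g / 1000.0  # grams -> kilograms


def stock_volume_ul(total_vg: float, stock_titer_vg_per_ml: float) -> float:
    """Volume of vector stock (in microliters) that contains total_vg genomes."""
    return 1000.0 * total_vg / stock_titer_vg_per_ml  # milliliters -> microliters


# Illustrative values only: a 25 g mouse and a 1e13 vg/mL stock are assumptions.
for dose in (3e12, 1e13, 3e13):
    vg = vector_genomes_per_animal(dose, body_weight_g=25.0)
    vol = stock_volume_ul(vg, stock_titer_vg_per_ml=1e13)
    print(f"{dose:.0e} vg/kg -> {vg:.2e} vg per animal, {vol:.1f} uL of stock")
```

The computed stock volume would then be brought up to the injection volume with vehicle; the injection volume is not specified here and is left to the user.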

Vector Copy Number

[00189] DNA was extracted and purified from tissue homogenates using a MagMAX kit (Thermofisher, Los Angeles, California) according to manufacturer instructions. Vector copy number was quantified using a digital polymerase chain reaction (dPCR) quantification assay with primers and probes designed on a proprietary DNA sequence, using a linearized vector plasmid as the reference standard. Each 12 µl dPCR reaction contained 2-200 ng sample genomic DNA (gDNA) and was run using a Qiacuity 4 instrument (Qiagen, Germantown, MD). Vector genome copy numbers (VGCNs) were normalized to microgram of DNA used in the dPCR reaction.
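The normalization described above reduces to dividing the copies measured in a reaction by the mass of gDNA loaded. The following is a minimal sketch; the function name is hypothetical, and the range check simply reflects the 2-200 ng input stated above rather than any instrument requirement.

```python
def vgcn_per_ug_dna(copies_in_reaction: float, input_dna_ng: float) -> float:
    """Vector genome copies per microgram of gDNA loaded into one dPCR reaction."""
    if not 2.0 <= input_dna_ng <= 200.0:
        # The reactions described above contained 2-200 ng of sample gDNA.
        raise ValueError("input DNA outside the 2-200 ng range used in this assay")
    return copies_in_reaction * 1000.0 / input_dna_ng  # 1000 ng per microgram


# Illustrative numbers only: 5,000 copies measured from 50 ng of gDNA.
print(vgcn_per_ug_dna(5000, 50))
```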

GAA Activity Measurements Using 4-MUG Substrate

[00190] Snap-frozen tissues were homogenized in Lysis Buffer (0.2 M sodium acetate, 0.4 M potassium chloride and 0.5% Triton, pH 4.3), centrifuged for 10 minutes at 14,000 x g, and the supernatant collected. Enzymatic reactions were set up using 10 µL of sample (cell lysate or tissue homogenate) diluted appropriately and 75 µL of 4-methylumbelliferyl-α-D-glucopyranoside (4MU-α-Gal; also referred to herein as “4-MUG”; Burlington, MA) substrate, in black 96-well plates (PerkinElmer, Waltham, MA). The reaction mixture was incubated at 37°C for 1 hour and then stopped by adding 150 µL of stop buffer (133 mM Glycine, 83 mM Sodium Carbonate, pH 10). A standard curve (0-4.25 nmol/mL) was used to measure released fluorescent 4-methylumbelliferone (4-MU) from the individual reaction mixture using the Spectramax M3 reader (PerkinElmer, Waltham, MA) at 460 nm (emission) and 360 nm (excitation). The protein concentration of the clarified supernatant was quantified using a BradfordUltra assay (Abcam, Waltham, MA). To calculate the GAA activity in tissues, the released 4-MU concentration was divided by the sample protein concentration, and activity was reported as nanomoles per hour per milligram protein.
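The activity calculation described above can be sketched as follows: fit the 4-MU standard curve, convert each sample's fluorescence to a 4-MU concentration, and divide by protein concentration and incubation time. This is a minimal sketch under stated assumptions: a linear standard curve fitted by ordinary least squares, and a hypothetical `dilution` parameter for pre-diluted samples. The standard readings in the example are invented for illustration and are not data from this disclosure.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def gaa_activity(rfu, slope, intercept, protein_mg_per_ml,
                 hours=1.0, dilution=1.0):
    """GAA activity in nmol 4-MU released per hour per mg protein.

    rfu is the sample fluorescence; slope and intercept come from the 4-MU
    standard curve (RFU versus nmol/mL).
    """
    four_mu_nmol_per_ml = (rfu - intercept) / slope * dilution
    return four_mu_nmol_per_ml / protein_mg_per_ml / hours


# Invented standard curve spanning the 0-4.25 nmol/mL range described above.
standards_nmol_per_ml = [0.0, 1.0, 2.0, 4.25]
standards_rfu = [100.0, 1100.0, 2100.0, 4350.0]
slope, intercept = fit_line(standards_nmol_per_ml, standards_rfu)
print(gaa_activity(2100.0, slope, intercept, protein_mg_per_ml=0.5))
```

Samples whose fluorescence falls above the top standard would normally be diluted and re-assayed rather than extrapolated.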

GAA Activity Measurements using Glycogen Substrate

[00191] Supernatant from tissue homogenates was transferred to a 1.5 ml tube, placed in boiling water for 10 minutes, cooled and then centrifuged. The supernatant was transferred to prelabeled cryovials for glycogen analysis following enzyme hydrolysis using amyloglucosidase from Aspergillus niger (Sigma-Aldrich, Burlington, MA). Glucose, the glycogen cleavage product, was measured using an Amplex red kit (Invitrogen, Carlsbad, California).

Histology and Immunohistochemistry

[00192] Mouse tissues were collected and preserved in 10% neutral buffered formalin, embedded in paraffin, sectioned, mounted on glass slides, and stained with haematoxylin and eosin (H&E). Muscle cryosections were cut (4-5 µm), placed onto slides, and air-dried for 20 minutes. H&E and Periodic Acid Schiff (PAS) stains were performed according to established protocols using kits from Biovision (Waltham, MA). For GAA immunohistochemistry (IHC), samples were incubated overnight at 4°C with an anti-GAA rabbit antibody (Sigma/HPA029126, St. Louis, MO) diluted 1:100 in TBST (Tris: 20 mM, NaCl: 150 mM, Tween® 20 detergent: 0.1% (w/v)). For LAMP-1 staining, samples were incubated overnight at 4°C with an anti-rabbit antibody (Abcam/ab24170, Waltham, MA) diluted 1:3000 in TBST.

Major Histocompatibility Associated Peptide Proteomics (MAPPs) Analysis

[00193] Human peripheral blood mononuclear cells (PBMCs) were isolated from the buffy coat fraction of healthy volunteers. Cells were separated by Lymphoprep (Axis-Shield, Dundee, UK) density centrifugation and donors were characterized by identifying HLA-DR and HLA-DQ haplotypes to 4-digit resolution by HISTO Spot SSO HLA typing (MC Diagnostics, St. Asaph, UK). To prepare monocyte-derived dendritic cells (MoDCs), fresh PBMCs from 20 healthy donors were used and CD14+ cells (monocytes) were isolated using RoboSep™ negative human monocyte isolation kits and a RoboSep™ cell isolation instrument (StemCell Technologies, Cambridge, UK) according to the manufacturer’s instructions. Monocytes were re-suspended in MoDC culture medium (RPMI 1640 supplemented with 10% FBS, 50 µM 2-ME, 2 mM L-Glutamine (all from ThermoFisher Scientific, Loughborough, UK), IL-4 (Peprotech, London, UK), and GM-CSF (Peprotech)) and plated in tissue culture flasks. During the 8-day culture, cells were fed by half volume MoDC culture media change. On day 7, the test samples were added to the cells in MoDC culture medium, to a final concentration of 12.5 pg/mL and incubated at 37 °C, 5% CO2. Following incubation, cells were matured by the addition of lipopolysaccharide (LPS) (Sigma Aldrich, Poole, UK) and incubated at 37 °C, 5% CO2 for 18 hours. On day 8, MoDCs were harvested, washed, and pelleted prior to flash-freezing at -80 °C. The MoDCs were thawed at RT and subsequently lysed using a hypotonic buffer solution (20 mM Tris, 5 mM MgCl2; ThermoFisher Scientific, Waltham, MA), 0.1% Triton X-100 and protease inhibitors (Sigma Aldrich, St. Louis, MO), pH 7.8, for 1 hour at 4 °C. HLA-DR/peptide complexes were purified from the cell lysate by immunoprecipitation using magnetic beads (Promega, Southampton, UK) coated with anti-HLA-DR antibody (BioLegend, London, UK) overnight at 4 °C. Peptides bound to HLA-DR were eluted under acidic conditions (3% MeCN, 0.2% TFA; ThermoFisher Scientific, Waltham, MA) and purified by solid phase extraction using Oasis® HLB µElution plates (Waters, Ellesmere Port, UK). Peptides were freeze-dried using a 5301 vacuum concentrator (Eppendorf, Stevenage, UK) and stored at -80 °C until analyzed by mass spectrometry (MS).

[00194] Freeze-dried peptides were re-solubilized in 3% MeCN, 0.2% TFA and analyzed using nano liquid chromatography coupled to an Orbitrap mass spectrometer. Nano flow reverse phase separation was performed using a Dionex Ultimate 3000 with an Acclaim™ PepMap™ 100 C18 separation column (75 µm x 150 mm, 2 µm, 100 Å) connected online to a Q Exactive Plus™ mass spectrometer (all from ThermoFisher Scientific, Waltham, MA) via a nano-spray ion source. Peptides were identified using the Sequest algorithm, built into the Proteome Discoverer software v2.1 (ThermoFisher Scientific, Waltham, MA), against a proprietary database and the sequences of the test samples determined. Once the final list of identified peptides was completed, the sequence heatmaps were generated using MATLAB (MathWorks®, Cambridge, UK).

Protein Uptake

[00195] C2C12 myoblast cells were seeded into 24-well tissue culture plates (Corning, Glendale, AZ), grown for 24 hours in DMEM (Invitrogen, Carlsbad, California) with 10% FBS at 37°C and 5% CO2 to ~80% confluency, and differentiated to myotubes for 3 days in DMEM (Invitrogen, Waltham, MA) containing 10% horse serum at 37°C and 5% CO2. Cells were incubated with purified proteins at various concentrations for 24 hours. Cells were then washed four times with DPBS, pH 7.4 (0.133 g/liter CaCl2·2H2O, 0.1 g/liter MgCl2·6H2O, 0.2 g/liter KCl, 0.2 g/liter KH2PO4, 8.0 g/liter NaCl, 1.15 g/liter Na2HPO4), and then lysed with the addition of 200 µl/well CelLytic M reagent (Sigma, St. Louis, MO) and 20 minutes of shaking at RT. Lysate debris was removed by centrifugation at 2,000 x g for 5 minutes. Lysate GAA activity of duplicate wells was measured with the GAA 4-methylumbelliferyl-α-D-glucopyranoside (4-MUG) assay (described above) and normalized to lysate protein concentration as determined using the bicinchoninic acid protein assay (Pierce, Appleton, Wisconsin).

Thermal Stability

[00196] Thermal protein unfolding was monitored using a Prometheus NT.48 instrument (NanoTemper Technologies, München, Germany). For each condition, 50 µl of a 1 mg/ml protein solution was prepared, and 20 µl of sample was filled into 3 low volume differential scanning fluorimetry (nanoDSF) Grade Standard Capillaries (NanoTemper Technologies, München, Germany), respectively, and loaded into the instrument. Thermal unfolding of the proteins was monitored in a 1 °C/minute thermal ramp from 25 °C to 95 °C. Tm values were determined automatically by the PR control software.
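For orientation, a melting temperature can be estimated from an unfolding curve as the temperature at which the signal changes most steeply. The following is a minimal sketch of that general idea using central differences on a synthetic sigmoidal curve; it is an assumption-laden illustration and is not the algorithm implemented in the instrument software, which is not described here. The function name and the synthetic curve parameters are hypothetical.

```python
import math


def estimate_tm(temps, signal):
    """Return the temperature at which |d(signal)/dT| is largest.

    temps and signal are equal-length lists sampled along the thermal ramp.
    """
    if len(temps) != len(signal) or len(temps) < 3:
        raise ValueError("need at least three paired points")
    best_index, best_slope = 1, -1.0
    for i in range(1, len(temps) - 1):
        # Central difference approximates the first derivative at temps[i].
        slope = abs((signal[i + 1] - signal[i - 1]) / (temps[i + 1] - temps[i - 1]))
        if slope > best_slope:
            best_index, best_slope = i, slope
    return temps[best_index]


# Synthetic two-state curve over the 25-95 degree ramp, midpoint set to 60.
temps = [float(t) for t in range(25, 96)]
signal = [0.8 + 0.3 / (1.0 + math.exp(-(t - 60.0) / 2.0)) for t in temps]
print(estimate_tm(temps, signal))
```

Real traces are noisier than this synthetic curve, so smoothing before differentiation would usually be needed.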

Chemical Stability

[00197] An 8 M stock of GuHCl was prepared by mixing 7.64 g of GuHCl with 4.21 ml assay buffer (50 mM sodium phosphate buffer, pH 7.5, 150 mM NaCl) and the pH adjusted to pH 7.5 using 1 M Tris, pH 8.0. A 9 M stock of urea was prepared freshly by mixing 5.41 g urea with 5.9 ml 1x assay buffer. One microliter of concentrated protein stock (final concentration 0.5 mg/ml) was added to 30 µl of a series of denaturant concentrations (0.25-6 M) and the mixture was incubated for 1 hour and 16 hours at 25°C.
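The stock recipes above can be cross-checked with simple molarity arithmetic. The following is a minimal sketch; it assumes a final solution volume of 10 ml for each stock, which is not stated above but is consistent with the masses given, and it uses molar masses of 95.53 g/mol for GuHCl and 60.06 g/mol for urea. The last helper reads "final concentration 0.5 mg/ml" as the concentration after adding 1 µl of protein to 30 µl of denaturant, which is an interpretation rather than a stated fact.

```python
GUHCL_G_PER_MOL = 95.53  # guanidine hydrochloride
UREA_G_PER_MOL = 60.06


def molarity(mass_g: float, molar_mass_g_per_mol: float, final_volume_ml: float) -> float:
    """Molar concentration of a solute dissolved to the given final volume."""
    return mass_g / molar_mass_g_per_mol / (final_volume_ml / 1000.0)


def required_stock_mg_per_ml(final_mg_per_ml: float, protein_ul: float,
                             denaturant_ul: float) -> float:
    """Protein stock concentration needed to reach final_mg_per_ml after mixing."""
    return final_mg_per_ml * (protein_ul + denaturant_ul) / protein_ul


# Assumed 10 ml final volume for both stocks (not stated in the text).
print(round(molarity(7.64, GUHCL_G_PER_MOL, 10.0), 1))  # GuHCl stock, in M
print(round(molarity(5.41, UREA_G_PER_MOL, 10.0), 1))   # urea stock, in M
print(required_stock_mg_per_ml(0.5, protein_ul=1.0, denaturant_ul=30.0))
```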

Echo

[00198] Transthoracic echocardiography was performed on mice that were anaesthetized with isoflurane. Parasternal long axis and short axis images were obtained using an MX550S probe attached to a Vevo 3100 system (FUJIFILM VisualSonics, Ontario, Canada). Images were acquired when the rectal temperature was between 36 and 38 °C and the respiratory rate was 40-120 breaths per minute. Images were analyzed using Vevo Lab 5.5.0. An average of three heart beats was used for analysis.

Rotarod

[00199] An accelerating rotarod assay was used to determine neuro-motor coordination of the Pompe GAA KO mice. The assay was performed on a rotarod apparatus (Model 47650; Ugo Basile, Italy) that was set to accelerate from 5-40 rpm over 5 minutes. Latency and the speed at fall were recorded from a total of 6 trials (3 trials/day). Animals were habituated and trained before the test trials.
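
The relationship between latency and speed at fall can be sketched as follows, assuming the apparatus accelerates linearly from 5 to 40 rpm over the 5 minutes and holds the top speed afterwards. The linear ramp, the function names, and the example latencies are assumptions for illustration and are not stated in the disclosure.

```python
from statistics import mean

START_RPM = 5.0
END_RPM = 40.0
RAMP_SECONDS = 300.0  # 5 minutes


def speed_at_fall(latency_s):
    """Rod speed (rpm) at the moment of the fall, assuming a linear ramp."""
    if latency_s < 0:
        raise ValueError("latency must be non-negative")
    elapsed = min(latency_s, RAMP_SECONDS)  # speed is capped once the ramp ends
    return START_RPM + (END_RPM - START_RPM) * elapsed / RAMP_SECONDS


def summarize_trials(latencies_s):
    """Return (mean latency, mean speed at fall) across an animal's trials."""
    return mean(latencies_s), mean(speed_at_fall(t) for t in latencies_s)


# Six hypothetical trial latencies in seconds (3 trials/day over 2 days).
print(summarize_trials([90, 120, 150, 150, 180, 210]))
```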

Example 2: Establishing Optimal Tissue Expression For Acid Alpha-Glucosidase Gene Delivery Relevant To Pompe Disease Patients Using Mouse Models

[00200] GAA protein activity in key muscle tissues was compared in mice after treatment with engineered AAV9 vectors comprising a GAA transgene in the presence of (i) a strong ubiquitous promoter (CBA), (ii) a muscle-specific promoter (SPc512) in the presence or absence of either a single muscle-specific enhancer (CSK-SH5 or Dph-CRE04) or both the CSK-SH5 and the Dph-CRE04 muscle-specific enhancers in two orientations, or (iii) a muscle-specific promoter (Desmin) in the presence or absence of a single muscle-specific enhancer (SKSH4). The results were compared to control gaa+/+ (GAA WT) and gaa-/- (GAA KO) mice (Figure 1B and Figure 1C). GAA KO mice were dosed once with 3x10¹³ vg/kg of rAAV9.GAA vectors differing only in their enhancer-promoter elements (i.e., all vectors comprised an AAV9 capsid and encoded the same codon optimized wild type (WT) human GAA transgene CO3 - see Example 3). After a 5-week incubation, the animals were sacrificed, their muscle tissues harvested, and GAA enzymatic activity determined according to the 4-MUG assay described in Example 1. An increase in GAA activity was observed in heart, diaphragm, quadriceps, and triceps of GAA KO mice treated with vectors that included the enhancer elements as compared to animals that were dosed with vectors that did not have any enhancer elements added to the muscle-specific SPc512 promoter (Figure 1B - Figure 1H). In the heart, supraphysiological levels of GAA activity were observed and were significantly higher than those observed after ERT treatment (see horizontal dashed black line in Figures 1B and 1C). In the heart, a 15-fold increase in activity was observed with the addition of the Dph-CRE04 enhancer element compared to the SPc512 promoter alone, with the order of activity being Dph-CRE04 > CSK-SH5 > Dph-CRE04 + CSK-SH5 = CSK-SH5 + Dph-CRE04, and with activity exceeding that observed with the ubiquitous promoter, CBA (Figure 1B). In diaphragm and quadriceps, a 3-5 fold increase in GAA activity was observed for mice dosed with vectors comprising Dph-CRE04 + SPc512 compared to the SPc512 alone (Figures 1C and 1D, respectively), with the order of activity being similar to that observed in the heart. In triceps muscle, only the mice dosed with vectors comprising the Dph-CRE04 enhancer or both the CSK-SH5 and the Dph-CRE04 enhancers showed an increase in GAA activity over the SPc512 promoter alone; mice dosed with vectors comprising CSK-SH5 alone did not (Figure 1E). Similarly, the addition of the enhancer element SKSH4 to the desmin promoter resulted in a 2-3 fold increase in GAA activity in quadriceps and triceps of GAA KO mice as compared to animals that were dosed with vectors that did not have any enhancer elements (Figure 1F - Figure 1H). Increased clearance of glycogen in the heart muscle of the GAA KO mice dosed with vectors comprising the SKSH4 enhancer was observed in tissue sections using PAS staining (Figure 1G). Further, a 76 kDa mature form of GAA was detected in tissue lysates, confirming correct GAA processing in the lysosomes (Figure 1H). Consistently, the addition of enhancer elements to muscle-specific promoters resulted in increased GAA activity across muscle tissues. The addition of the enhancer element Dph-CRE04 to the SPc512 promoter resulted in the most robust and significant increase in activity and glycogen reduction across all muscles tested. Therefore, the Dph-CRE04 enhancer element along with the SPc512 promoter was selected for all subsequent studies.
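
Two calculations underlie the comparisons in this paragraph: converting the weight-based dose into vector genomes per animal, and expressing tissue activity as a fold change over a reference construct. The sketch below is illustrative only. The function names, the 25 g body weight, and the activity values are hypothetical, and only the 3x10¹³ vg/kg dose is taken from the text.

```python
DOSE_VG_PER_KG = 3e13  # dose stated in the text


def vector_genomes_per_animal(body_weight_g, dose_vg_per_kg=DOSE_VG_PER_KG):
    """Total vector genomes administered to one animal at a weight-based dose."""
    if body_weight_g <= 0:
        raise ValueError("body weight must be positive")
    return dose_vg_per_kg * body_weight_g / 1000.0


def fold_change(treated, reference):
    """Ratio of activity with an enhancer to activity with the promoter alone."""
    if reference <= 0:
        raise ValueError("reference activity must be positive")
    return treated / reference


# Hypothetical 25 g mouse and hypothetical activity readings.
print(f"{vector_genomes_per_animal(25.0):.2e} vg per animal")
print(fold_change(30.0, 2.0))
```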

[00201] Figure 2 is a comparison of GAA protein activity in key muscle tissues in mice after treatment with engineered AAV9 vectors comprising a GAA transgene in the presence of either the muscle-specific SPc512 promoter alone or the SPc512 promoter together with the Dph-CRE04 muscle-specific enhancer, compared to GAA WT and GAA KO mice (Figure 2A). GAA KO mice were dosed once either with buffer alone or with 3x10¹³ vg/kg of rAAV9.GAA vectors differing only in the enhancer-promoter elements (i.e., all vectors were AAV9 and encoded the same codon optimized wild type (WT) human GAA transgene CO3). After a 5-week incubation, the animals were sacrificed, their muscle tissues harvested, and GAA activity determined according to methods described in Example 1. Figure 2A shows GAA enzymatic activity using 4-MUG as a substrate and Figure 2B shows reduction in glycogen levels in the respective tissues. This study shows that vectors comprising GAA constructs that contain the muscle-specific enhancer Dph-CRE04 combined with the muscle-specific promoter SPc512 result in enhanced GAA protein levels and catalytic activity, as measured by the increased release of fluorescent 4-MU (Figure 2A) and the enhanced reduction in glycogen (Figure 2B), in all four muscle types compared to the GAA KO mice dosed with vector comprising only the muscle-specific SPc512 promoter.

[00202] Histological examination of triceps muscle revealed a reduction in glycogen, LAMP-1 (a marker of lysosomal size and burden) and vacuolation in animals that received vector comprising the SPc512 promoter combined with the Dph-CRE04 enhancer compared to the SPc512 promoter alone. These findings are consistent with a model whereby improved efficacy can be achieved with increased GAA expression and activity. This study confirmed the benefit of adding skeletal muscle enhancers to a muscle promoter when constructing a transgene construct and that increased expression of GAA in muscle with the addition of enhancer elements reduces glycogen buildup in GAA KO mice. The best efficacy in these studies was observed with the use of the Dph-CRE04 enhancer element.

[00203] Given that treatment of the GAA KO mice with AAV9 vectors containing a codon optimized GAA, a muscle-specific promoter, and a muscle-specific enhancer did not fully rescue the gaa-/- phenotype or fully restore GAA levels or activity, the codon optimized GAA protein was further optimized. Example 3 describes the identification of a preferred codon-optimized human GAA that provided enhanced gene expression. Example 4 describes a campaign (campaign 2) to identify amino acid substitutions that improved the catalytic activity of GAA.

Example 3: Codon Optimization of WT GAA Nucleotide Sequence to Increase Expression

[00204] The human WT GAA nucleotide sequence was codon optimized in order to improve expression in human muscle cells, while reducing the immunostimulatory CpG content. Three codon variants with reduced CpGs (named CO1, CO2, and CO3) were inserted into an expression cassette comprising a muscle-specific Sk-SH4 enhancer and the muscle-specific desmin promoter [AAV9.Sk-SH4.desmin.COGAA] (Figure 3) and tested for expression at a dose of 3x10¹³ vg/kg in GAA KO mice. The CO3 codon optimized variant demonstrated GAA expression and activity at or slightly higher than that of the WT hGAA sequence and resulted in a similar or more efficient reduction in the levels of glycogen in heart, diaphragm, and quadriceps muscle (Figure 4A and 4B, respectively). Significant clearance of glycogen in the heart muscle of GAA KO mice dosed with vectors comprising CO3 was also observed in tissue sections using PAS staining (Figure 4C). Further, a 76 kDa mature form of GAA was detected in tissue lysates, confirming correct GAA processing in the lysosomes (Figure 4D). While these results were encouraging, the insufficient clearance of glycogen observed in the skeletal and respiratory muscles suggested that further improvements to the GAA protein were required to increase its efficacy.
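
Two sequence properties relevant to this kind of codon optimization, the number of CpG dinucleotides and the GC content, are easy to compute; the claims recite limits on both. The following is a minimal sketch with hypothetical function names and a toy sequence that is not any of the disclosed GAA sequences. It counts CpGs on the given strand only.

```python
def count_cpg(seq):
    """Count CpG dinucleotides (a C immediately followed by a G) on the given strand."""
    return seq.upper().count("CG")


def gc_content(seq):
    """Fraction of bases in seq that are G or C."""
    s = seq.upper()
    if not s:
        raise ValueError("empty sequence")
    return (s.count("G") + s.count("C")) / len(s)


# Toy sequence for illustration only (NOT a GAA coding sequence).
toy = "ATGGGCGTGCGCCACCCTCCT"
print(count_cpg(toy), round(gc_content(toy) * 100, 1))
```

A codon optimization workflow could use checks like these to screen candidate sequences, for example rejecting candidates whose CpG count or GC percentage falls outside a chosen window.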

Table 2: Wild Type and Codon Optimized GAA Nucleic Acid Sequences

Example 4: GAA With Enhanced Catalytic Activity

[00205] The GAA protein was engineered to improve its specific activity (2x-10x) for its natural substrate glycogen. Using initial in silico prediction, based on structural site-directed as well as random mutagenesis, 13 different libraries ranging over 100,000 clones were screened for GAA variants with increased activity over the natural hGAA. The initial screening was performed in HEK cells, followed by validation of promising positive hits in C2C12 cells. Activity was initially screened using the 4-MUG assay, while GAA's natural substrate glycogen was used for final selection of the GAA variants that were then further assessed both in vitro and in vivo. Only GAA variants demonstrating an increase in activity on glycogen (2X-8X compared to WT hGAA) were selected (Figure 5). Table 5 provides the amino acid sequence of each of these proteins.

[00206] Table 5: Amino Acid Sequences of Variant Campaign 2 GAA Proteins

[00207] These GAA variants had approximately 3-4 amino acid differences relative to WT hGAA. GAA variant 6 (Var 6) was further evaluated in vitro for catalytic activity in kinetic assays using either the synthetic substrate 4-MUG (Figure 6A) or the natural substrate glycogen (Figure 6B). A 3.5-fold improvement in the activity on 10 mg/ml or 100 mg/ml glycogen substrate was observed for GAA Var 6 compared to the WT hGAA (Figure 6B).
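
To make the notion of a few amino acid differences concrete, the sketch below applies a substitution written in the notation used in the claims (wild-type residue, 1-based position, new residue, e.g. T151I) and computes percent identity between two equal-length, already aligned sequences. The function names and the toy sequence are hypothetical, and the sketch does not perform alignment.

```python
import re


def apply_substitution(seq, label):
    """Apply a substitution such as 'T151I' to seq, checking the wild-type residue."""
    m = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if m is None:
        raise ValueError(f"unrecognized substitution label: {label}")
    wt, pos, new = m.group(1), int(m.group(2)), m.group(3)
    if pos < 1 or pos > len(seq):
        raise ValueError(f"position {pos} is outside the sequence")
    if seq[pos - 1] != wt:
        raise ValueError(f"expected {wt} at position {pos}, found {seq[pos - 1]}")
    return seq[: pos - 1] + new + seq[pos:]


def percent_identity(a, b):
    """Percent of identical positions for two equal-length, aligned sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Toy 6-residue sequence for illustration only (NOT GAA).
toy_wt = "MGTLSL"
toy_var = apply_substitution(toy_wt, "T3I")
print(toy_var, round(percent_identity(toy_wt, toy_var), 2))
```

By the same arithmetic, four substitutions in a 952-residue sequence (the full-length numbering used in the claims) leave 948 of 952 positions identical, or about 99.6%.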

Example 6: Production and Purification of Viral Vectors Expressing a Transgene

[00208] This example summarizes some viral vectors encompassed by the present disclosure.

[00209] A recombinant adeno-associated virus 9 (rAAV9) was developed to express wild type human α-GAL or α-GAL variants (e.g., amino acid sequences shown in Table 1) under the control of a ubiquitous promoter, in a viral vector. A WPRE element was linked to the 3' end of the GLA transgene to increase transgene expression and improve mRNA stability. A bovine growth hormone poly A tail was appended to the 3' end of the WPRE element. The DNA construct of promoter-GLA-WPRE-BGHpA was integrated between the inverted terminal repeats of a circular plasmid vector. Figure 1 shows an exemplary rAAV9 vector construct.

[00210] rAAV vectors were encapsulated using the AAV2 inverted terminal repeats and rep sequences using methods known in the art. The rAAV9 stocks were produced using HEK-293T cells by the adenovirus-free, triple-plasmid co-transfection method and purified using cesium chloride ultracentrifugation. Vector genome (v.g.) titers were determined by quantitative PCR.

[00211] Purified rAAV9 virus suspensions were diluted in the formulation buffer consisting of 1.5 mM KH2PO4 (potassium dihydrogen phosphate), 2.7 mM KCl (potassium chloride), 8.1 mM Na2HPO4 (di-sodium hydrogen phosphate), 136.9 mM NaCl (sodium chloride) and 0.001% Pluronic F-68. Null vectors with rAAV9 capsid (rAAV9-null) were used as controls.
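
The millimolar composition of the formulation buffer can be converted to weighable amounts with a short calculation. The sketch below is illustrative: the function names are hypothetical, the molar masses are standard values, anhydrous salts are assumed (the text does not specify hydration states), and Pluronic F-68 is omitted because the basis of the 0.001% figure is not stated.

```python
# Standard molar masses in g/mol; anhydrous salts are an assumption.
MOLAR_MASS = {"KH2PO4": 136.09, "KCl": 74.55, "Na2HPO4": 141.96, "NaCl": 58.44}

# Concentrations from the text, in mM.
FORMULATION_MM = {"KH2PO4": 1.5, "KCl": 2.7, "Na2HPO4": 8.1, "NaCl": 136.9}


def grams_per_liter(conc_mm, molar_mass):
    """Convert a millimolar concentration to grams of salt per liter."""
    return conc_mm / 1000.0 * molar_mass


def recipe(volume_l=1.0):
    """Return grams of each salt needed for volume_l liters of buffer."""
    return {
        salt: grams_per_liter(mm, MOLAR_MASS[salt]) * volume_l
        for salt, mm in FORMULATION_MM.items()
    }


for salt, grams in recipe().items():
    print(f"{salt}: {grams:.3f} g/liter")
```

Under these assumptions the salts come to about 0.20 g/liter KH2PO4, 0.20 g/liter KCl, 1.15 g/liter Na2HPO4 and 8.00 g/liter NaCl, which matches the corresponding DPBS components listed in the cell uptake method.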

[00212] Two rAAV vectors, Variant 1 and Variant 2, were prepared and used for the following studies. Variant 1 comprises a codon optimized nucleic acid sequence of SEQ ID NO: 58. Variant 2 comprises a codon optimized nucleic acid sequence of SEQ ID NO: 59.

INCORPORATION BY REFERENCE

[00213] The present disclosure incorporates by reference in their entirety techniques well known in the fields of virology, immunology, molecular biology, drug delivery, and gene therapy. These techniques include, but are not limited to, techniques described in the following publications: Ausubel et al. (eds.) (1993) CURRENT PROTOCOLS IN MOLECULAR BIOLOGY, John Wiley & Sons, NY; Ausubel et al. (eds.) (1999) SHORT PROTOCOLS IN MOLECULAR BIOLOGY (4th Ed.) John Wiley & Sons, NY; Smolen and Ball (eds.) (1984) CONTROLLED DRUG BIOAVAILABILITY, DRUG PRODUCT DESIGN AND PERFORMANCE, Wiley & Sons, NY; Giege et al. (1999) CRYSTALLIZATION OF NUCLEIC ACIDS AND PROTEINS, a Practical Approach (2nd Ed.) Oxford University Press, NY; Hammerling et al. (1981) MONOCLONAL ANTIBODIES AND T-CELL HYBRIDOMAS, Elsevier, NY; Harlow et al. (1988) ANTIBODIES: A LABORATORY MANUAL (2nd Ed.) Cold Spring Harbor Laboratory Press, NY; Kabat et al. (1987 and 1991) SEQUENCES OF PROTEINS OF IMMUNOLOGICAL INTEREST, National Institutes of Health, Bethesda, MD; Kabat et al. (1991) SEQUENCES OF PROTEINS OF IMMUNOLOGICAL INTEREST (5th Ed.) U.S. Department of Health and Human Services, NIH Publication No. 91-3242; Kontermann and Dübel (eds.) (2001) ANTIBODY ENGINEERING, Springer-Verlag, NY; Kriegler (1990) GENE TRANSFER AND EXPRESSION, A LABORATORY MANUAL, Stockton Press, NY; Lu and Weiner (eds.) (2001) CLONING AND EXPRESSION VECTORS FOR GENE FUNCTION ANALYSIS, BioTechniques Press, MA; Old and Primrose (1985) PRINCIPLES OF GENE MANIPULATION: AN INTRODUCTION TO GENETIC ENGINEERING (3rd Ed.) Blackwell Scientific Publications, MA; Sambrook et al. (eds.) (1989) MOLECULAR CLONING: A LABORATORY MANUAL (2nd Ed.) Cold Spring Harbor Laboratory Press, NY; Winnacker (1987) FROM GENES TO CLONES: INTRODUCTION TO GENE TECHNOLOGY, VCH Publishers, NY.

[00214] The contents of all cited references (including literature references, patents, patent applications, and websites) that may be cited throughout this application are hereby expressly incorporated by reference in their entirety for any purpose, as are the references cited therein. The disclosure will employ, unless otherwise indicated, conventional techniques of virology, immunology, molecular biology, and cell biology, which are well known in the art.

EQUIVALENTS

[00215] The disclosure may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The foregoing embodiments are therefore to be considered in all respects illustrative rather than limiting of the disclosure. Scope of the disclosure is thus indicated by the appended claims rather than by the foregoing description, and all changes that come within the meaning and range of equivalency of the claims are therefore intended to be embraced herein.

Claims

WE CLAIM:

1. A nucleic acid encoding an acid alpha-glucosidase (GAA) protein, the nucleic acid comprising a first polynucleotide sequence that is at least 95%, at least 96%, at least 97% or at least 98% identical to CO3-MP-6-dNA (SEQ ID NO: 36).

2. The nucleic acid of claim 1, wherein the first polynucleotide sequence is at least 99% identical to CO3-MP-6-dNA (SEQ ID NO:36).

3. The nucleic acid of claim 1, wherein the first polynucleotide sequence is at least 99.5% identical to CO3-MP-6-dNA (SEQ ID NO:36).

4. A nucleic acid encoding an acid alpha-glucosidase (GAA) protein, the nucleic acid comprising a first polynucleotide sequence that is at least 95%, at least 96%, at least 97% or at least 98% identical to CO3-MP-WT-NA (SEQ ID NO:34).

5. The nucleic acid of claim 4, wherein the first polynucleotide sequence is at least 99% identical to CO3-MP-WT-NA (SEQ ID NO:34).

6. The nucleic acid of claim 4, wherein the first polynucleotide sequence is at least 99.5% identical to CO3-MP-WT-NA (SEQ ID NO:34).

7. The nucleic acid of any one of claims 1-6, wherein the nucleic acid further comprises a second polynucleotide sequence that is at least 95%, at least 96%, at least 97%, or at least 98% identical to CO3-PP-WT-NA (SEQ ID NO:38).

8. The nucleic acid of claim 7, wherein the second polynucleotide sequence is at least 99% identical to CO3-PP-WT-NA (SEQ ID NO:38).

9. The nucleic acid of claim 7, wherein the second polynucleotide sequence is CO3-PP-WT-NA (SEQ ID NO:38).

10. The nucleic acid of any one of claims 1-9, wherein the nucleic acid further comprises a third polynucleotide sequence that is at least 95%, at least 96%, at least 97%, or at least 98% identical to CO3-SP-WT-NA (SEQ ID NO:42).

11. The nucleic acid of claim 10, wherein the third polynucleotide sequence is CO3-SP-WT-NA (SEQ ID NO:42).

12. The nucleic acid of any one of claims 1-11, wherein the encoded GAA protein comprises a first polypeptide sequence that is at least 95%, at least 96%, at least 97% or at least 98% identical to MP-6-AA (SEQ ID NO: 14).

13. The nucleic acid of claim 12, wherein the first polypeptide sequence is at least 99% identical to MP-6-AA (SEQ ID NO: 14).

14. The nucleic acid of claim 12, wherein the first polypeptide sequence is at least 99.5% identical to MP-6-AA (SEQ ID NO: 14).

15. The nucleic acid of any one of claims 12-14, wherein the encoded GAA protein comprises an amino acid substitution selected from the group consisting of T151I, L650G, S676D, and L678H, numbered relative to the full-length wild type GAA protein sequence of FL-WT-AA (SEQ ID NO:2).

16. The nucleic acid of any one of claims 1-11, wherein the encoded GAA protein comprises a first polypeptide sequence that is at least 95%, at least 96%, at least 97% or at least 98% identical to MP-WT-AA (SEQ ID NO:35).

17. The nucleic acid of claim 16, wherein the first polypeptide sequence is at least 99% identical to MP-WT-AA (SEQ ID NO:35).

18. The nucleic acid of claim 16, wherein the first polypeptide sequence is at least 99.5% identical to MP-WT-AA (SEQ ID NO:35).

19. The nucleic acid of any one of claims 1-18, wherein the encoded GAA protein further comprises a second polypeptide sequence that is at least 95% identical to PP-WT-AA (SEQ ID NO:39).

20. The nucleic acid of claim 19, wherein the second polypeptide sequence is at least 97% identical to PP-WT-AA (SEQ ID NO: 39).

21. The nucleic acid of claim 19, wherein the second polypeptide sequence is PP-WT-AA (SEQ ID NO:39).

22. The nucleic acid of any one of claims 1-21, wherein the encoded GAA protein further comprises a third polypeptide sequence that is at least 95% identical to SP-WT-AA (SEQ ID NO:43).

23. The nucleic acid of claim 10, wherein the second polypeptide sequence is SP-WT-AA (SEQ ID NO:43).

24. The nucleic acid of any one of claims 1-23, wherein the encoded GAA protein further comprises an amino acid substitution selected from the group consisting of T151I, L650G, L650S, L650T, L650E, L650Y, L650F, S676D, and L678H, numbered relative to the full-length wild type GAA protein sequence of FL-WT-AA (SEQ ID NO:2).

25. The nucleic acid of any one of claims 4-11 and 16-24, wherein the encoded GAA protein further comprises an amino acid substitution selected from the group consisting of T151I, L650G, S676D, and L678H, numbered relative to the full-length wild type GAA protein sequence of FL-WT-AA (SEQ ID NO:2).

26. The nucleic acid of any one of claims 1-25, wherein the polynucleotide sequence encoding the GAA protein comprises no more than five CpG dinucleotides.

27. The nucleic acid of any one of claims 1-25, wherein the polynucleotide sequence encoding the GAA protein comprises a GC content of 54% to 60%.

28. The nucleic acid of claim 1, wherein the first polynucleotide sequence is CO3-MP-6-dNA (SEQ ID NO: 36).

29. The nucleic acid of claim 1, comprising the polynucleotide sequence of CO3-FL-6-NA (SEQ ID NO: 32).

30. The nucleic acid of claim 1, comprising the polynucleotide sequence of CO3-FL-6-dNA (SEQ ID NO: 60).

31. The nucleic acid of claim 4, wherein the first polynucleotide sequence is CO3-MP-WT-NA (SEQ ID NO:34).

32. The nucleic acid of claim 4, comprising the polynucleotide sequence of CO3-FL-WT-NA (SEQ ID NO:31).

33. An expression cassette comprising the nucleic acid of any one of claims 1-32 and at least one regulatory nucleic acid sequence operably linked to the sequence encoding the GAA protein.

34. The expression cassette of claim 33, wherein the at least one regulatory nucleic acid sequence is selected from the group consisting of a promoter, an enhancer, an intron, a post-transcriptional regulatory element, an inverted terminal repeat (ITR), a polyadenylation (poly A) sequence, and a combination thereof.

35. The expression cassette of claim 33, wherein the at least one regulatory nucleic acid sequence comprises a promoter.

36. The expression cassette of claim 35, wherein the promoter is a muscle-specific promoter.

37. The expression cassette of claim 36, wherein the muscle-specific promoter comprises a polynucleotide sequence that is at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical to SPc512_NA (SEQ ID NO:46).

38. The expression cassette of claim 36, wherein the muscle-specific promoter comprises the polynucleotide sequence of SPc512_NA (SEQ ID NO:46).

39. The expression cassette of claim 36, wherein the muscle-specific promoter comprises a polynucleotide sequence that is at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical to HsDesmin NA (SEQ ID NO:47).

40. The expression cassette of claim 36, wherein the muscle-specific promoter comprises the polynucleotide sequence of HsDesmin NA (SEQ ID NO:47).

41. The expression cassette of any one of claims 33-40, wherein the at least one regulatory nucleic acid sequence comprises an enhancer.

42. The expression cassette of claim 41, wherein the enhancer is a muscle-specific enhancer.

43. The expression cassette of claim 42, wherein the muscle-specific enhancer comprises a polynucleotide sequence that is at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical to Dph-CRE04_NA (SEQ ID NO:48).

44. The expression cassette of claim 42, wherein the muscle-specific enhancer comprises the polynucleotide sequence of Dph-CRE04_NA (SEQ ID NO:48).

45. The expression cassette of claim 42, wherein the muscle-specific enhancer comprises a polynucleotide sequence that is at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical to sk-SH4_NA (SEQ ID NO:49).

46. The expression cassette of claim 42, wherein the muscle-specific enhancer comprises the polynucleotide sequence of sk-SH4_NA (SEQ ID NO:49).

47. The expression cassette of any one of claims 33-46, wherein the at least one regulatory nucleic acid sequence comprises an intron.

48. The expression cassette of claim 47, wherein the intron comprises a polynucleotide sequence that is at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical to MVM NA (SEQ ID NO: 50).

49. The expression cassette of claim 47, wherein the intron comprises the polynucleotide sequence of MVM NA (SEQ ID NO: 50).

50. A mammalian expression vector comprising the expression cassette of any of claims 33-49.

51. The mammalian expression vector of claim 50, comprising an adeno-associated virus (AAV) vector.

52. The mammalian expression vector of claim 51, wherein the AAV vector comprises an AAV8 or AAV9 capsid polypeptide encapsidating the expression cassette.

53. The mammalian expression vector of claim 51, wherein the AAV vector comprises an engineered capsid polypeptide encapsidating the expression cassette.

54. A host cell comprising the nucleic acid of any one of claims 1-32.

55. A host cell comprising the expression cassette of any of claims 33-49.

56. The host cell of claim 55, further comprising a nucleic acid encoding an AAV capsid polypeptide.

57. The host cell of claim 56, wherein the AAV capsid polypeptide is an AAV8 or AAV9 capsid polypeptide.

58. The host cell of claim 56 or 57, further comprising a nucleic acid encoding a viral helper gene selected from the group consisting of E4, E2a, and VA.

59. A recombinant acid alpha-glucosidase (GAA) variant protein, wherein the GAA variant protein comprises an amino acid substitution selected from the group consisting of T151I, L650G, L650S, L650T, L650E, L650Y, L650F, S676D, L678H, and L868F, numbered relative to the full-length wild type GAA protein sequence of FL-WT-AA (SEQ ID NO:2).

60. The recombinant GAA variant protein of claim 59, comprising an amino acid substitution selected from the group consisting of T151I, L650G, S676D, and L678H, numbered relative to the full-length wild type GAA protein sequence of FL-WT-AA (SEQ ID NO:2).

61. The recombinant GAA variant protein of claim 59 or 60, comprising a first polypeptide sequence that is at least 95%, at least 96%, at least 97% or at least 98% identical to MP-6-AA (SEQ ID NO:37).

62. The recombinant GAA variant protein of claim 61, wherein the first polypeptide sequence is at least 99% identical to MP-6-AA (SEQ ID NO:37).

63. The recombinant GAA variant protein of claim 61, wherein the first polypeptide sequence is at least 99.5% identical to MP-6-AA (SEQ ID NO: 37).

64. The recombinant GAA variant protein of claim 61, wherein the first polypeptide sequence is MP-6-AA (SEQ ID NO: 37).

65. The recombinant GAA variant protein of any one of claims 59-64, further comprising a second polypeptide sequence that is at least 95% identical to PP-WT-AA (SEQ ID NO:39).

66. The recombinant GAA variant protein of claim 65, wherein the second polypeptide sequence is at least 97% identical to PP-WT-AA (SEQ ID NO: 39).

67. The recombinant GAA variant protein of claim 65, wherein the second polypeptide sequence is PP-WT-AA (SEQ ID NO: 39).

68. The recombinant GAA variant protein of any one of claims 59-67, further comprising a third polypeptide sequence that is at least 95% identical to SP-WT-AA (SEQ ID NO:43).

69. The recombinant GAA variant protein of claim 68, wherein the third polypeptide sequence is SP-WT-AA (SEQ ID NO:43).

70. The recombinant GAA variant protein of claim 59, comprising the polypeptide sequence of FL-6-AA (SEQ ID NO: 14).

71. A recombinant acid alpha-glucosidase (GAA) variant protein, wherein the GAA variant protein comprises a first polypeptide sequence that is at least 95%, at least 96%, at least 97% or at least 98% identical to MP-6-AA (SEQ ID NO:37) and wherein the GAA variant comprises one or more variant amino acids selected from the group consisting of T151I, L650G, S676D, and L678H.

72. The recombinant GAA variant protein of claim 71, wherein the first polypeptide sequence is at least 99% identical to MP-6-AA (SEQ ID NO:37).

73. The recombinant GAA variant protein of claim 71, wherein the first polypeptide sequence is at least 99.5% identical to MP-6-AA (SEQ ID NO: 37).

74. The recombinant GAA variant protein of any one of claims 71-73, comprising an isoleucine residue at position 151 (relative to SEQ ID NO:2).

75. The recombinant GAA variant protein of any one of claims 71-74, comprising a glycine residue at position 650 (relative to SEQ ID NO:2).

76. The recombinant GAA variant protein of any one of claims 71-75, comprising an aspartic acid residue at position 676 (relative to SEQ ID NO:2).

77. The recombinant GAA variant protein of any one of claims 71-76, comprising a histidine residue at position 678 (relative to SEQ ID NO:2).

78. The recombinant GAA variant protein of claim 71, wherein the first polypeptide sequence is MP-6-AA (SEQ ID NO: 37).

79. The recombinant GAA variant protein of any one of claims 71-78, further comprising a second polypeptide sequence that is at least 95% identical to PP-WT-AA (SEQ ID NO:39).

80. The recombinant GAA variant protein of claim 79, wherein the second polypeptide sequence is at least 97% identical to PP-WT-AA (SEQ ID NO: 39).

81. The recombinant GAA variant protein of claim 80, wherein the second polypeptide sequence is PP-WT-AA (SEQ ID NO: 39).

82. The recombinant GAA variant protein of any one of claims 71-81, further comprising a third polypeptide sequence that is at least 95% identical to SP-WT-AA (SEQ ID NO:43).

83. The recombinant GAA variant protein of claim 82, wherein the second polypeptide sequence is SP-WT-AA (SEQ ID NO:43).

84. The recombinant GAA variant protein of claim 71, comprising the polypeptide sequence of FL-6-AA (SEQ ID NO:33).

85. A recombinant acid alpha-glucosidase (GAA) variant protein, wherein the GAA variant protein comprises a set of amino acid substitutions, numbered relative to the full-length wild type GAA protein sequence of FL-WT-AA (SEQ ID NO:2), selected from the group consisting of: a) T151I, L650G, S676D, and L678H, b) L650S, S676D, and L678H, c) L650T, S676D, and L678H, d) L650E, S676D, and L678H, e) L650Y, S676D, and L678H, f) L650F, S676D, and L678H, g) L650G, S676D, and L678H, and h) S676D, and L678H.

86. The recombinant GAA variant protein of claim 85, comprising a first polypeptide sequence that is at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.5%, or 100% identical to amino acid residues 70-952 of FL-6-AA (SEQ ID NO: 14).

87. The recombinant GAA variant protein of claim 85, comprising a first polypeptide sequence that is at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.5%, or 100% identical to amino acid residues 70-952 of FL-7-AA (SEQ ID NO: 16).

88. The recombinant GAA variant protein of claim 85, comprising a first polypeptide sequence that is at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.5%, or 100% identical to amino acid residues 70-952 of FL-8-AA (SEQ ID NO: 18).

89. The recombinant GAA variant protein of claim 85, comprising a first polypeptide sequence that is at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.5%, or 100% identical to amino acid residues 70-952 of FL-9-AA (SEQ ID NO:20).

90. The recombinant GAA variant protein of claim 85, comprising a first polypeptide sequence that is at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.5%, or 100% identical to amino acid residues 70-952 of FL-10-AA (SEQ ID NO:22).

91. The recombinant GAA variant protein of claim 85, comprising a first polypeptide sequence that is at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.5%, or 100% identical to amino acid residues 70-952 of FL-11-AA (SEQ ID NO:24).

92. The recombinant GAA variant protein of claim 85, comprising a first polypeptide sequence that is at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.5%, or 100% identical to amino acid residues 70-952 of FL-12-AA (SEQ ID NO:26).

93. The recombinant GAA variant protein of claim 85, comprising a first polypeptide sequence that is at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.5%, or 100% identical to amino acid residues 70-952 of FL-13-AA (SEQ ID NO:28).

94. The recombinant GAA variant protein of any one of claims 85-93, further comprising a second polypeptide sequence that is at least 95%, at least 97%, or 100% identical to PP-WT-AA (SEQ ID NO:39).

95. The recombinant GAA variant protein of any one of claims 85-93, further comprising a third polypeptide sequence that is at least 95% or 100% identical to SP-WT-AA (SEQ ID NO:43).

96. A nucleic acid encoding the recombinant GAA variant protein of any one of claims 59-95.

97. An expression cassette comprising the nucleic acid of claim 96, and at least one regulatory nucleic acid sequence operably linked to the sequence encoding the GAA protein.

98. The expression cassette of claim 97, wherein the at least one regulatory nucleic acid sequence is selected from the group consisting of a promoter, an enhancer, an intron, a post-transcriptional regulatory element, an inverted terminal repeat (ITR), a polyadenylation (poly A) sequence, and a combination thereof.

99. The expression cassette of claim 97, wherein the at least one regulatory nucleic acid sequence comprises a promoter.

100. The expression cassette of claim 99, wherein the promoter is a muscle-specific promoter.

101. The expression cassette of claim 100, wherein the muscle-specific promoter comprises a polynucleotide sequence that is at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical to SPc512_NA (SEQ ID NO:46).

102. The expression cassette of claim 100, wherein the muscle-specific promoter comprises the polynucleotide sequence of SPc512_NA (SEQ ID NO:46).

103. The expression cassette of claim 100, wherein the muscle-specific promoter comprises a polynucleotide sequence that is at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical to sk-SH4_NA (SEQ ID NO:49).

104. The expression cassette of claim 100, wherein the muscle-specific promoter comprises the polynucleotide sequence of sk-SH4_NA (SEQ ID NO:49).

105. The expression cassette of any one of claims 97-104, wherein the at least one regulatory nucleic acid sequence comprises an enhancer.

106. The expression cassette of claim 105, wherein the enhancer is a muscle-specific enhancer.

107. The expression cassette of claim 106, wherein the muscle-specific enhancer comprises a polynucleotide sequence that is at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical to Dph-CRE04_NA (SEQ ID NO:48).

108. The expression cassette of claim 106, wherein the muscle-specific enhancer comprises the polynucleotide sequence of Dph-CRE04_NA (SEQ ID NO:48).

109. The expression cassette of claim 106, wherein the muscle-specific enhancer comprises a polynucleotide sequence that is at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical to hDesmin_NA (SEQ ID NO:47).

110. The expression cassette of claim 106, wherein the muscle-specific enhancer comprises the polynucleotide sequence of hDesmin_NA (SEQ ID NO:47).

111. The expression cassette of any one of claims 97-110, wherein the at least one regulatory nucleic acid sequence comprises an intron.

112. The expression cassette of claim 111, wherein the intron comprises a polynucleotide sequence that is at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical to MVM_NA (SEQ ID NO:50).

113. The expression cassette of claim 111, wherein the intron comprises the polynucleotide sequence of MVM_NA (SEQ ID NO:50).

114. A mammalian expression vector comprising the expression cassette of any of claims 97-113.

115. The mammalian expression vector of claim 114, comprising an adeno-associated virus (AAV) vector.

116. The mammalian expression vector of claim 115, wherein the AAV vector comprises an AAV8 or AAV9 capsid polypeptide encapsidating the expression cassette.

117. A host cell comprising the nucleic acid of claim 96.

118. A host cell comprising the expression cassette of any of claims 97-113.

119. The host cell of claim 118, further comprising a nucleic acid encoding an AAV capsid polypeptide.

120. The host cell of claim 119, wherein the AAV capsid polypeptide is an AAV8 or AAV9 capsid polypeptide.

121. The host cell of any one of claims 118-120, further comprising a nucleic acid encoding a viral helper gene selected from the group consisting of E4, E2a, and VA.

122. A pharmaceutical composition comprising the mammalian expression vector according to any one of claims 50-53 and 114-116 and a pharmaceutically acceptable carrier.

123. The pharmaceutical composition of claim 122 for the treatment of Pompe disease.

124. A method for treating Pompe disease in a subject in need thereof, the method comprising administering to the subject a therapeutically effective amount of the composition of claim 122.

125. The method of claim 124, wherein the composition is administered by epicardial injection, intravenous injection, intramuscular injection, intraperitoneal injection, intracardiac injection, intracardiac catheterization, direct intramyocardial injection, transvascular administration, antegrade intracoronary injection, retrograde injection, transendomyocardial injection, or molecular cardiac surgery with recirculating delivery (MCARD).