Attorney Docket No. 2013237-1122

CANCER VACCINES

CROSS REFERENCE TO RELATED APPLICATIONS

[0001] The present application claims the benefit of U.S. Provisional Appln. No. 63/570,183, filed March 26, 2024, the entire contents of which are hereby incorporated by reference herein in their entirety.

BACKGROUND

[0002] Cancer is a primary cause of mortality, accounting for 1 in 4 of all deaths. Despite recent advances in the field of cancer immunotherapy, there remains no single, broadly applicable treatment. Molecular heterogeneity of tumors renders many therapies ineffective for cancer patients.

SUMMARY

[0003] The present disclosure provides new insights and technologies for treatment of cancer, and in some embodiments provides patient-specific cancer therapeutics.

[0004] Among other things, the present disclosure provides an insight that different tumors may respond differently to treatment that involves inducing or encouraging a host immune response directed at the tumor. For example, different tumors may be more or less effectively treated by encouraging host immune responses to neoepitopes that have arisen in the particular tumor in the individual patient, and/or may be more or less effectively treated by encouraging host immune responses to antigens or epitopes that are commonly expressed (e.g., shared) by tumors, e.g., of a particular type or at a particular stage, etc. (or even in that particular patient).

[0005] Without wishing to be bound by any particular theory, the present disclosure proposes that one or more features of a particular tumor, and/or of the particular patient (e.g., its immunological state) may impact effectiveness of different immunotherapeutic approaches (e.g., targeting neoepitopes vs. shared epitopes, and/or of targeting particular epitopes).
Given that at least some features (e.g., of a patient’s immunological state) can be impacted by environment or other changeable conditions (e.g., diet, rest, stage of tumor, etc.), the present disclosure proposes that strategies described herein (e.g., simultaneously targeting a plurality of neoepitopes and a plurality of shared epitopes) may be particularly useful and/or effective. Furthermore, the
present disclosure provides an insight that provided technologies may prove particularly effective when considered with respect to a specific patient (and/or a specific tumor) or, alternatively or additionally, when considered with respect to a population of patients. For instance, provided technologies may increase the percentage of patients in a population who respond and/or the extent and/or type of response in a specific patient.

[0006] The present disclosure appreciates that various technologies have been developed for identifying neoepitopes and/or for assessing their likely usefulness as immunotherapy targets, and also that various technologies have been developed for identifying shared tumor antigens and/or epitopes and/or for assessing their likely usefulness as immunotherapy targets. The present disclosure provides an insight that a therapeutic strategy encouraging host immune responses that target both neoepitopes and shared epitopes can be particularly effective for certain patients, and/or across patient populations.

[0007] The present disclosure provides a particular insight that delivering polyepitopic polypeptides that together include both a plurality of neoepitopes and a plurality of shared epitopes can prove surprisingly effective for treatment of cancer (e.g., across population(s) of cancer patients and/or specifically within individual patients).

[0008] The disclosure is based, in part, on the recognition that a fraction of individual tumors may not contain targetable shared antigens but may contain individual neoantigens, while a subset of tumors may contain no or a very limited number of predicted neoantigens but may contain targetable shared antigens.
Accordingly, the disclosure provides and/or utilizes compositions that include both a vaccine comprising shared tumor antigen epitopes (e.g., a first polyepitopic vaccine) and a vaccine comprising individual neoantigen epitopes (e.g., a second polyepitopic vaccine). Additionally, the disclosure is based, in part, on the insight that compositions that include a polyepitopic vaccine that includes a plurality of shared tumor antigen epitopes may be particularly useful for cancer treatment. Without wishing to be bound by any particular theory, in some embodiments, compositions of the disclosure can be beneficial to a broader patient population as compared to a composition comprising only a vaccine comprising shared tumor antigen epitopes and/or as compared to a composition comprising only a vaccine comprising individual neoantigen epitopes.
[0009] Certain aspects of the present disclosure are based on the recognition that a fully personalized vaccine (e.g., including a first ribonucleotide encoding a first polyepitopic vaccine construct comprising a plurality of individualized shared tumor antigen epitopes, and a second ribonucleotide encoding a second polyepitopic vaccine construct comprising a plurality of neoantigen epitopes) may offer certain advantages relative to alternative vaccine strategies, including, e.g., increasing the number of subjects that respond to vaccination (e.g., subjects with a relatively low number or quality of neoepitope candidates may benefit; subjects with no or few expressed shared antigens may benefit); increasing the likelihood that such vaccine induces relevant T cells (e.g., non-neoantigens with proven immunogenicity and expression can be selected; immunogenicity for neoantigen candidates is usually unknown; and shared neoantigens may raise T cells against antigens that are not expressed); and/or inducing stronger T-cell responses (e.g., such vaccine encodes epitopes instead of full length antigens; more antigens can be targeted; and fewer sequences can be included that do not result in T-cell response (higher effective dose per epitope)).

[0010] Aspects of the present disclosure provide a method of producing a cancer vaccine for a subject, the method comprising combining: (a) a first pharmaceutical composition comprising a first polyribonucleotide encoding a first polyepitopic vaccine construct comprising a plurality of shared tumor antigen epitopes; and (b) a second pharmaceutical composition comprising a second polyribonucleotide encoding a second polyepitopic vaccine construct comprising a plurality of neoantigen epitopes.
[0011] In some embodiments, the plurality of shared tumor antigen epitopes comprises at least one shared tumor antigen epitope (and/or is entirely comprised of shared tumor antigen epitopes) expressed by at least about 15% of subjects having a comparable cancer (e.g., a cancer of the same type and/or stage, etc.). In some embodiments, such shared tumor antigen epitope is expressed by at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, or more of such subjects having such comparable cancer.

[0012] In some embodiments, the method further comprises producing the second pharmaceutical composition.
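By way of non-limiting illustration only, the prevalence criterion described above for shared tumor antigen epitopes (expression by at least about 15% of subjects having a comparable cancer) can be sketched as a simple cohort filter. The following Python sketch is an example under stated assumptions and is not a description of any particular implementation: the function names, antigen labels, subject identifiers, and expression calls are all hypothetical.

```python
from typing import Dict, List, Set


def prevalence(expressing: Set[str], cohort: Set[str]) -> float:
    """Fraction of cohort subjects whose tumor expresses a given antigen."""
    if not cohort:
        raise ValueError("cohort must not be empty")
    return len(expressing & cohort) / len(cohort)


def select_shared_antigens(
    expression: Dict[str, Set[str]],
    cohort: Set[str],
    threshold: float = 0.15,  # "at least about 15%" from the text above
) -> List[str]:
    """Return antigens expressed in at least `threshold` of the cohort."""
    return sorted(
        antigen
        for antigen, subjects in expression.items()
        if prevalence(subjects, cohort) >= threshold
    )


# Hypothetical cohort of 20 subjects having a comparable cancer.
cohort = {f"S{i}" for i in range(1, 21)}

# Hypothetical expression calls: antigen label -> subjects expressing it.
expression = {
    "ANTIGEN_A": {"S1", "S2", "S3", "S4"},            # 4 of 20 subjects
    "ANTIGEN_B": {"S5"},                              # 1 of 20 subjects
    "ANTIGEN_C": {f"S{i}" for i in range(1, 11)},     # 10 of 20 subjects
}

selected = select_shared_antigens(expression, cohort)
print(selected)
```

The `threshold` parameter can be raised (e.g., to 0.20, 0.25, or 0.50) to mirror the higher prevalence levels recited above.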
[0013] In some embodiments, the method further comprises producing the second pharmaceutical composition by a method comprising: (i) identifying cancer-specific somatic mutations in a tumor specimen of a subject to provide a cancer mutation signature of the subject; and (ii) producing the second pharmaceutical composition comprising the second polyribonucleotide encoding the second polyepitopic vaccine construct, wherein the plurality of neoantigen epitopes comprises a plurality of cancer-specific somatic mutations of the cancer mutation signature identified in step (i).

[0014] In some embodiments, the step of identifying cancer-specific somatic mutations comprises using next generation sequencing (NGS).

[0015] In some embodiments, the method further comprises producing the second pharmaceutical composition by a method comprising: (i) providing a tumor specimen from the subject and providing a non-tumor specimen; (ii) identifying sequence differences between (1) the genome, exome and/or transcriptome of the tumor specimen and (2) the genome, exome and/or transcriptome of the non-tumor specimen; and (iii) producing the second pharmaceutical composition comprising the second polyribonucleotide, wherein the neoantigen epitopes comprise one or more of the sequence differences identified in step (ii).

[0016] In some such embodiments, the step of identifying sequence differences comprises using NGS.

[0017] In some embodiments, the non-tumor specimen is from the subject.

[0018] In some embodiments, the step of identifying sequence differences comprises identifying sequence differences between the exome of the tumor specimen and the exome of the non-tumor specimen.

[0019] In some embodiments, the first polyepitopic vaccine construct comprises about 2 to about 50, or about 2 to about 40, or about 2 to about 30, or about 2 to about 20, or about 2 to about 10 shared tumor antigen epitopes.
In some particular embodiments, the first polyepitopic vaccine construct comprises about 10 to about 20 shared tumor antigen epitopes.

[0020] In some embodiments, the first polyepitopic vaccine construct comprises shared tumor antigen epitopes from one shared tumor antigen.
[0021] In some embodiments, the first polyepitopic vaccine construct comprises at least one shared tumor antigen epitope from each of about 2, 4, 6, 8, 10, 12, 14, 16, 18, or 20 different shared tumor antigens.

[0022] In some embodiments, the second polyepitopic vaccine construct comprises about 2 to about 50, or about 2 to about 40, or about 2 to about 30, or about 2 to about 20, or about 2 to about 10 neoantigen epitopes. In some particular embodiments, the second polyepitopic vaccine construct comprises about 10 to about 20 neoantigen epitopes.

[0023] In some embodiments, the second polyepitopic vaccine construct comprises neoantigen epitopes from one neoantigen.

[0024] In some embodiments, the second polyepitopic vaccine construct comprises at least one neoantigen epitope from each of about 2, 4, 6, 8, 10, 12, 14, 16, 18, or 20 different neoantigens.

[0025] In some embodiments, each shared tumor antigen epitope and each neoantigen epitope comprises about 5 to about 50 amino acids, although in many embodiments an epitope may comprise at least about 8 amino acids (e.g., so that an epitope, or each epitope, included in a polyepitopic construct for use in accordance with the present disclosure, comprises about 8 to about 50 amino acids).

[0026] In some embodiments, each shared tumor antigen epitope and each neoantigen epitope comprises about 30 amino acids.

[0027] In some embodiments, each shared tumor antigen epitope and each neoantigen epitope comprises about 9 amino acids.

[0028] In some embodiments, the first polyepitopic construct comprises shared tumor antigen epitopes arranged in a head-to-tail configuration.

[0029] In some embodiments, the first polyepitopic construct comprises a linker between each shared tumor antigen epitope.

[0030] In some embodiments, the second polyepitopic construct comprises neoantigen epitopes arranged in a head-to-tail configuration.
[0031] In some embodiments, the second polyepitopic construct comprises a linker between each neoantigen epitope.
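By way of non-limiting illustration only, a head-to-tail arrangement of epitopes with a linker between each pair of adjacent epitopes can be sketched at the amino acid sequence level as a simple join operation. The Python sketch below is offered under stated assumptions: the function name, the placeholder epitope strings, and the particular ten-residue glycine/serine linker are hypothetical and are not sequences disclosed herein.

```python
from typing import Sequence

# Hypothetical glycine/serine linker (10 residues); not a disclosed sequence.
EXAMPLE_LINKER = "GGSGGGGSGG"


def assemble_polyepitope(
    epitopes: Sequence[str], linker: str = EXAMPLE_LINKER
) -> str:
    """Concatenate epitopes head-to-tail with a linker between each pair."""
    if len(epitopes) < 2:
        raise ValueError("a polyepitopic construct needs at least 2 epitopes")
    # str.join places the linker only between adjacent epitopes,
    # never before the first epitope or after the last one.
    return linker.join(epitopes)


# Placeholder 9-residue strings standing in for epitope sequences.
epitopes = ["ACDEFGHIK", "LMNPQRSTV", "WYACDEFGH"]

construct = assemble_polyepitope(epitopes)
print(construct)
print(len(construct))
```

With n epitopes, the construct contains n - 1 linkers, so its length is the sum of the epitope lengths plus (n - 1) times the linker length.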
[0032] In some embodiments, the linker comprises about 3 to about 50 amino acids.

[0033] In some embodiments, the linker comprises about 6 to about 30 amino acids.

[0034] In some embodiments, the linker comprises (i) glycine amino acids or (ii) serine and glycine amino acids.

[0035] In some embodiments, the first and the second polyribonucleotides comprise a 5’ cap, a 5’ UTR, a 3’ UTR, and a polyA tail.

[0036] In another aspect, the disclosure provides a pharmaceutical composition comprising: (1) a first polyribonucleotide encoding a first polyepitopic vaccine construct comprising a plurality of shared tumor antigen epitopes (e.g., individualized shared tumor antigen epitopes); and a second polyribonucleotide encoding a second polyepitopic vaccine construct comprising a plurality of neoantigen epitopes; or (2) at least one polyribonucleotide encoding a first polyepitopic vaccine construct comprising a plurality of shared tumor antigen epitopes (e.g., individualized shared tumor antigen epitopes); and a second polyepitopic vaccine construct comprising a plurality of neoantigen epitopes; or (3) at least one polyribonucleotide encoding a first polyepitopic vaccine construct comprising a plurality of shared tumor antigen epitopes (e.g., individualized shared tumor antigen epitopes) and a plurality of neoantigen epitopes.

[0037] In some embodiments, the plurality of shared tumor antigen epitopes comprises at least one shared tumor antigen epitope expressed by at least about 15% of subjects having a comparable cancer (e.g., a cancer of the same type and/or stage, etc.). In some embodiments, such shared tumor antigen epitope is expressed by at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, or more of such subjects having such comparable cancer.
[0038] In some embodiments, the first polyepitopic vaccine construct comprises about 2 to about 50, or about 2 to about 40, or about 2 to about 30, or about 2 to about 20, or about 2 to about 10 shared tumor antigen epitopes. In some particular embodiments, the first polyepitopic vaccine construct comprises about 10 to about 20 shared tumor antigen epitopes.
[0039] In some embodiments, the first polyepitopic vaccine construct comprises shared tumor antigen epitopes from one shared tumor antigen.

[0040] In some embodiments, the first polyepitopic vaccine construct comprises at least one shared tumor antigen epitope from each of about 2, 4, 6, 8, 10, 12, 14, 16, 18, or 20 different shared tumor antigens.

[0041] In some embodiments, the second polyepitopic vaccine construct comprises about 2 to about 50, or about 2 to about 40, or about 2 to about 30, or about 2 to about 20, or about 2 to about 10 neoantigen epitopes. In some particular embodiments, the second polyepitopic vaccine construct comprises about 10 to about 20 neoantigen epitopes.

[0042] In some embodiments, the second polyepitopic vaccine construct comprises neoantigen epitopes from one neoantigen.

[0043] In some embodiments, the second polyepitopic vaccine construct comprises at least one neoantigen epitope from each of about 2, 4, 6, 8, 10, 12, 14, 16, 18, or 20 different neoantigens.

[0044] In some embodiments, each shared tumor antigen epitope and each neoantigen epitope comprises about 5 to about 50 amino acids, although in many embodiments an epitope may comprise at least about 8 amino acids (e.g., so that an epitope, or each epitope, included in a polyepitopic construct for use in accordance with the present disclosure, comprises about 8 to about 50 amino acids).

[0045] In some embodiments, each shared tumor antigen epitope and each neoantigen epitope comprises about 30 amino acids.

[0046] In some embodiments, each shared tumor antigen epitope and each neoantigen epitope comprises about 9 amino acids.

[0047] In some embodiments, the first polyepitopic construct comprises shared tumor antigen epitopes arranged in a head-to-tail configuration.

[0048] In some embodiments, the first polyepitopic construct comprises a linker between each shared tumor antigen epitope.
[0049] In some embodiments, the second polyepitopic construct comprises neoantigen epitopes arranged in a head-to-tail configuration.
[0050] In some embodiments, the second polyepitopic construct comprises a linker between each neoantigen epitope.

[0051] In some embodiments, the linker comprises about 3 to about 50 amino acids.

[0052] In some embodiments, the linker comprises about 6 to about 30 amino acids.

[0053] In some embodiments, the linker comprises (i) glycine amino acids or (ii) serine and glycine amino acids.

[0054] In some embodiments, the first and the second polyribonucleotides comprise a 5’ cap, a 5’ UTR, a 3’ UTR, and a polyA tail.

[0055] In another aspect, the disclosure provides a method of treating a subject having cancer, the method comprising: (a) administering to the subject a first pharmaceutical composition comprising a first polyribonucleotide encoding a first polyepitopic vaccine construct comprising a plurality of shared tumor antigen epitopes; and (b) administering to the subject a second pharmaceutical composition comprising a second polyribonucleotide encoding a second polyepitopic vaccine construct comprising a plurality of neoantigen epitopes.

[0056] In some embodiments, the method further comprises, before the administering steps, mixing the first pharmaceutical composition and the second pharmaceutical composition to form an admixture comprising the first pharmaceutical composition and the second pharmaceutical composition, and administering the admixture to the subject.

[0057] In some embodiments, the method further comprises administering the first pharmaceutical composition concurrently with the second pharmaceutical composition.

[0058] In some embodiments, the method further comprises administering the second pharmaceutical composition about 1, 2, 3, 4, 5, 10, 15, or 30 minutes before or after administering the first pharmaceutical composition.
[0059] In some embodiments, the method further comprises administering the second pharmaceutical composition at least 1 hour, 2 hours, 3 hours, 4 hours, 6 hours, 12 hours, 24 hours, 48 hours, 1 week, 2 weeks, 4 weeks, 6 weeks, 8 weeks, 12 weeks, 16 weeks, 20 weeks, 24 weeks, 36 weeks, 48 weeks, or 52 weeks before or after administering the first pharmaceutical composition.
[0060] In some embodiments, the plurality of shared tumor antigen epitopes comprises at least one shared tumor antigen epitope expressed by at least about 15% of subjects having a comparable cancer (e.g., a cancer of the same type and/or stage, etc.). In some embodiments, such shared tumor antigen epitope is expressed by at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, or more of such subjects having such comparable cancer.

[0061] In some embodiments, the method further comprises producing the second pharmaceutical composition.

[0062] In some embodiments, the method further comprises producing the second pharmaceutical composition by a method comprising: (i) identifying cancer-specific somatic mutations in a tumor specimen of a subject to provide a cancer mutation signature of the subject; and (ii) producing the second pharmaceutical composition comprising the second polyribonucleotide encoding the second polyepitopic vaccine construct, wherein the plurality of neoantigen epitopes comprises a plurality of cancer-specific somatic mutations of the cancer mutation signature identified in step (i).

[0063] In some embodiments, the step of identifying cancer-specific somatic mutations comprises using next generation sequencing (NGS).

[0064] In some embodiments, the method further comprises producing the second pharmaceutical composition by a method comprising: (i) providing a tumor specimen from the subject and providing a non-tumor specimen; (ii) identifying sequence differences between (1) the genome, exome and/or transcriptome of the tumor specimen and (2) the genome, exome and/or transcriptome of the non-tumor specimen; and (iii) producing the second pharmaceutical composition comprising the second polyribonucleotide, wherein the neoantigen epitopes comprise one or more of the sequence differences identified in step (ii).
[0065] In some embodiments, the step of identifying sequence differences comprises using NGS.

[0066] In some embodiments, the non-tumor specimen is from the subject.

[0067] In some embodiments, the step of identifying sequence differences comprises identifying sequence differences between the exome of the tumor specimen and the exome of the non-tumor specimen.
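By way of non-limiting illustration only, the step of identifying sequence differences between a tumor specimen and a non-tumor specimen can be sketched, at its simplest, as a set difference over called variants. The Python sketch below assumes that variant calling has already been performed on both specimens; the `Variant` record, the function name, and all coordinates and alleles are hypothetical placeholders, and real pipelines typically apply additional quality and coverage filters not shown here.

```python
from typing import NamedTuple, Set


class Variant(NamedTuple):
    """Minimal variant record: chromosome, 1-based position, ref/alt alleles."""
    chrom: str
    pos: int
    ref: str
    alt: str


def tumor_specific_variants(
    tumor: Set[Variant], non_tumor: Set[Variant]
) -> Set[Variant]:
    """Variants called in the tumor specimen but absent from the non-tumor one."""
    return tumor - non_tumor


# Hypothetical calls; positions and alleles are placeholders, not real data.
non_tumor_calls = {
    Variant("chr1", 1000, "A", "G"),   # germline variant, present in both
    Variant("chr2", 2000, "C", "T"),   # germline variant, present in both
}
tumor_calls = non_tumor_calls | {
    Variant("chr3", 3000, "G", "A"),   # candidate somatic mutation
    Variant("chr7", 7000, "T", "C"),   # candidate somatic mutation
}

candidates = tumor_specific_variants(tumor_calls, non_tumor_calls)
for variant in sorted(candidates):
    print(variant)
```

Variants shared by both specimens are treated as germline in this sketch and are excluded, leaving only tumor-specific candidates for downstream epitope selection.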
[0068] In some embodiments, the first polyepitopic vaccine construct comprises about 2 to about 50, or about 2 to about 40, or about 2 to about 30, or about 2 to about 20, or about 2 to about 10 shared tumor antigen epitopes. In some particular embodiments, the first polyepitopic vaccine construct comprises about 10 to about 20 shared tumor antigen epitopes.

[0069] In some embodiments, the first polyepitopic vaccine construct comprises shared tumor antigen epitopes from one shared tumor antigen.

[0070] In some embodiments, the first polyepitopic vaccine construct comprises at least one shared tumor antigen epitope from each of about 2, 4, 6, 8, 10, 12, 14, 16, 18, or 20 different shared tumor antigens.

[0071] In some embodiments, the second polyepitopic vaccine construct comprises about 2 to about 50, or about 2 to about 40, or about 2 to about 30, or about 2 to about 20, or about 2 to about 10 neoantigen epitopes. In some particular embodiments, the second polyepitopic vaccine construct comprises about 10 to about 20 neoantigen epitopes.

[0072] In some embodiments, the second polyepitopic vaccine construct comprises neoantigen epitopes from one neoantigen.

[0073] In some embodiments, the second polyepitopic vaccine construct comprises at least one neoantigen epitope from each of about 2, 4, 6, 8, 10, 12, 14, 16, 18, or 20 different neoantigens.

[0074] In some embodiments, each shared tumor antigen epitope and each neoantigen epitope comprises about 5 to about 50 amino acids, although in many embodiments an epitope may comprise at least about 8 amino acids (e.g., so that an epitope, or each epitope, included in a polyepitopic construct for use in accordance with the present disclosure, comprises about 8 to about 50 amino acids).

[0075] In some embodiments, each shared tumor antigen epitope and each neoantigen epitope comprises about 30 amino acids.
[0076] In some embodiments, each shared tumor antigen epitope and each neoantigen epitope comprises about 9 amino acids.

[0077] In some embodiments, the first polyepitopic construct comprises shared tumor antigen epitopes arranged in a head-to-tail configuration.
[0078] In some embodiments, the first polyepitopic construct comprises a linker between each shared tumor antigen epitope.

[0079] In some embodiments, the second polyepitopic construct comprises neoantigen epitopes arranged in a head-to-tail configuration.

[0080] In some embodiments, the second polyepitopic construct comprises a linker between each neoantigen epitope.

[0081] In some embodiments, the linker comprises about 3 to about 50 amino acids.

[0082] In some embodiments, the linker comprises about 6 to about 30 amino acids.

[0083] In some embodiments, the linker comprises (i) glycine amino acids or (ii) serine and glycine amino acids.

[0084] In some embodiments, the first and the second polyribonucleotides comprise a 5’ cap, a 5’ UTR, a 3’ UTR, and a polyA tail.

[0085] In another aspect, the disclosure provides a method of treating a subject having cancer, the method comprising a step of administering to the subject one or both of a first pharmaceutical composition and a second pharmaceutical composition so that the subject receives both, wherein: (a) the first pharmaceutical composition comprises a first polyribonucleotide encoding a first polyepitopic vaccine construct comprising a plurality of shared tumor antigen epitopes; and (b) the second pharmaceutical composition comprises a second polyribonucleotide encoding a second polyepitopic vaccine construct comprising a plurality of neoantigen epitopes.

[0086] In another aspect, the disclosure provides a method of treating cancer, the method comprising a step of administering to each of a plurality of cancer patients a combination therapy that delivers each of: (a) a first polypeptide comprising a plurality of neoepitopes determined to have arisen in a tumor in the cancer patient; and (b) a second polypeptide comprising a plurality of shared tumor antigen epitopes detectable in a sample from the cancer patient.
[0087] In some embodiments, one or both of the first and second polypeptides is delivered by expression from an administered polynucleotide encoding the polypeptide.

[0088] In some embodiments, the polynucleotide is a polyribonucleotide.
[0089] In some embodiments, the first pharmaceutical composition contains no neoantigen epitopes.

BRIEF DESCRIPTION OF THE DRAWINGS

[0090] FIG. 1 depicts an exemplary workflow for selecting neoantigen epitopes and shared tumor antigen epitopes.

[0091] FIG. 2 depicts an exemplary workflow for selecting neoantigen epitopes. TSA, Target Selection Algorithm; VAF, Variant Allele Frequency.

[0092] FIG. 3 depicts an exemplary workflow for selecting neoantigen epitopes.

[0093] FIG. 4 is a schematic of an exemplary polyribonucleotide encoding a polyepitopic vaccine construct comprising a plurality of shared tumor antigen epitopes.

[0094] FIG. 5 is a schematic of an exemplary polyribonucleotide encoding a polyepitopic vaccine construct comprising a plurality of neoantigen epitopes.

[0095] FIG. 6 depicts two exemplary workflows for selecting individualized shared tumor antigen epitopes.

[0096] FIG. 7 depicts an exemplary workflow for selecting neoepitopes and non-neoepitopes.

[0097] FIG. 8 depicts a schematic of an exemplary polyribonucleotide encoding neoepitopes and an exemplary polyribonucleotide encoding individualized non-neoepitopes.

[0098] FIG. 9 depicts an exemplary workflow of an RNA vaccine manufacturing process.

[0099] FIG. 10 depicts the workflow of the study including the different treatments and the planned analyses. Gr., group; i.m., intramuscular; irrel., irrelevant RNA; i.v., intravenous; IVT, in vitro transcription; LNP, lipid nanoparticle; LPX, lipoplex; m1Ψ, N1-methyl-pseudouridine containing RNA; neo, neoepitope; non neo, non neoepitope; s.c., subcutaneous; U, unmodified RNA; v.f., vena facialis.

[0100] FIG. 11 depicts vaccine-specific CD8+ T cell expansion at different time points post vaccination. W, weekly; T, triweekly; LPX, lipoplex; LNP, lipid nanoparticle; U, unmodified RNA; Ψ, N1-methyl-pseudouridine containing RNA.
[0101] FIG. 12 depicts tumor size in response to the indicated treatments at the indicated timepoints.

[0102] FIG. 13 is a schematic depicting an exemplary bioinformatics workflow for detecting antigen based neotherapeutic target epitopes (“dANTe pipeline”), as described in Example 3. Abbreviations: BRCA = Breast invasive carcinoma; COAD = Colon adenocarcinoma; HLA = Human Leukocyte antigen; HNSC = Head and Neck squamous cell carcinoma; MHC = Major histocompatibility complex; TAA = Tumor associated antigen; WES = Whole exome sequencing.

[0103] FIG. 14 is a bar chart depicting frequency of fusion events in 45 target genes across patients; bars are colored according to indications for patients having fusion events in a given target gene. Abbreviations: OV = Ovarian serous cystadenocarcinoma (associated with HOXB13 fusion events); PRAD = Prostate adenocarcinoma (associated with ACPP and/or KLK3 fusion events); SKCM = Skin Cutaneous Melanoma (associated with PMEL and/or TYR fusion events); TNBC_MERIT = Triple negative breast carcinoma_Mutanome engineered RNA immuno-therapy (associated with ACTL8 fusion events).

[0104] FIG. 15 is a box plot depicting number of non-neoantigen transcripts from an exemplary non-neoantigen input list passing expression filters per indication. Individual patients are shown as dots. Abbreviations: BRCA = Breast invasive carcinoma; COAD = Colon adenocarcinoma; HNSC = Head and neck squamous cell carcinoma; KIRP = Kidney renal papillary cell carcinoma; LIHC = Liver hepatocellular carcinoma; LUAD = Lung adenocarcinoma; LUSC = Lung squamous cell carcinoma; OV = Ovarian serous cystadenocarcinoma; PAAD = Pancreatic adenocarcinoma; PRAD = Prostate adenocarcinoma; READ = Rectum adenocarcinoma; SKCM = Skin Cutaneous Melanoma; STAD = Stomach adenocarcinoma; TNBC_MERIT = Triple negative breast carcinoma_Mutanome engineered RNA immuno-therapy.
[0105] FIG. 16 is a series of bar charts depicting fractions of patients with at least one transcript of a given gene from the non-neoantigen input list passing a gene-specific expression cutoff filter, according to indication. Abbreviations as in preceding figure.

[0106] FIG. 17 is a box plot depicting number of mutation events per patient within transcript regions defined by the exemplary input list of non-neoantigens, according to indication. Each dot represents a patient. Somatic mutations are shown in lighter shade (yellow), germline mutations are shown in darker shade (red). Abbreviations as in preceding figure.
[0107] FIG. 18 is a box plot depicting number of 9-mer MHC I potential non-neoepitope targets by indication. Dots represent individual patients. Abbreviations as in preceding figure.

[0108] FIG. 19 depicts a correlation between number of transcripts after expression filtering and number of generated non-neoepitope potential targets. Each dot represents an individual patient. R = 0.78, p < 2.2 x 10^-16, Pearson correlation. Abbreviations as in preceding figure.

[0109] FIG. 20A is a series of bar charts depicting contribution of individual genes to a non-neoepitope target pool for a subset of indications. Data are pooled across different patients. X-axis represents individual genes, y-axis represents number of generated targets per gene. Only targets for 9-mers are shown. Abbreviations as in preceding figure.

[0110] FIG. 20B is a series of bar charts depicting contribution of individual genes to a non-neoepitope target pool for a subset of indications. Data are pooled across different patients. X-axis represents individual genes, y-axis represents number of generated targets per gene. Only targets for 9-mers are shown. Abbreviations as in preceding figure.

[0111] FIG. 21 is a bar chart depicting number of targets per gene contributing to a non-neoepitope target pool. Only genes that have at least one target in one patient after expression filtering are displayed. Abbreviations as in preceding figure.

[0112] FIG. 22 depicts an exemplary view of independent mapping of generated non-neoepitope targets to a respective transcript sequence (transcript uc010aap.2) for a single patient (COAD patient TCGA-4N-A93T). X-axis represents the combined (spliced) exonic regions for a single transcript in the exemplary non-neoantigen input list according to a sliding window of potential target sequences (n=108 windows). Abbreviations: SNP = Single nucleotide polymorphism.
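By way of non-limiting illustration only, the generation of candidate target sequences by a sliding window (e.g., 9-mers, as referenced in the figure descriptions above) can be sketched as follows. The Python sketch assumes a window that advances one residue at a time over a translated sequence; the function name and the placeholder sequence are hypothetical, and the actual window definition used to prepare the figures is not specified here.

```python
from typing import List


def sliding_windows(sequence: str, k: int = 9) -> List[str]:
    """Return every contiguous k-mer of `sequence`, advancing one residue at a time."""
    if k <= 0:
        raise ValueError("k must be positive")
    # A sequence of length n yields n - k + 1 windows (none if n < k).
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


# Placeholder 12-residue string standing in for a translated transcript region.
protein = "ACDEFGHIKLMN"

windows = sliding_windows(protein, k=9)
for index, window in enumerate(windows):
    print(index, window)
```

Each window produced in this way is one potential target sequence that could then be scored (e.g., for predicted MHC I binding) in a downstream step.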
[0113] FIG.23 is a box plot depicting fractions of non-neoepitope targets by indication for which precomputed MHC I binding scores were stored in a Lookup table to reduce computational cost. Abbreviations as in previous figures. [0114] FIG.24 is a box plot depicting number of expressed genes after filtering for each indication (N = 10–16 patients per indication). Abbreviations as in preceding figure. [0115] FIG.25 is a scatter plot depicting variant allele frequency (VAF) for neoepitope targets in tumor tissue (green/lighter shade) or normal tissue (blue/darker shade). Each dot represents an individual neoepitope target mutation. Abbreviations as in preceding figure. Page 14 of 214 12608199v1
Attorney Docket No. 2013237-1122 [0116] FIG.26 is a scatter plot depicting tumor content (x-axis) versus total number of somatic mutations (y-axis). Tumor content does not correlate with number of detected somatic mutations. Each dot represents an individual patient. [0117] FIG.27 is a box plot depicting number of total somatic mutations by indication; Y-axis represents the average+/-SD of mutation events (n = 10–16 patients per indication). Abbreviations as in previous figures. [0118] FIG.28 is a bar chart depicting average +/- SD of selected somatic mutation events (i.e., neoepitope targets) remaining after prioritization, by indication (n = 10–16 patients per indication). Abbreviations as in preceding figure. [0119] FIG.29 is a 2-D plot depicting absolute frequency (y-axis) of occurrence of a neoepitope target for a given target gene (x-axis) across all patients (n = 159 patients). [0120] FIG.30 is a bar chart depicting number of unique target genes across different cancer indications. Between 100 and 500 unique target genes were found (n = 10–16 patients per indication). Abbreviations as in previous figures. [0121] FIG.31 is a bar chart depicting number of neoepitope target genes for 30 common oncogenic genes in the full neoepitope target gene space, with different colors/shading for different indications. [0122] FIG.32 is a scatter plot depicting number of SNVs detected across several cancer indications. Each dot represents a single patient. SNVs were detected using aligned DNA reads of tumor and WT samples. Abbreviations as in previous figures. [0123] FIG.33 is a scatter plot depicting number of InDels detected across several cancer indications. Each dot represents a single patient. InDels were detected using Strelka. Abbreviations as in preceding figure. 
[0124] FIG.34 is a block flow diagram showing an exemplary bioinformatics workflow for detecting antigen based neotherapeutic target epitopes (“dANTe pipeline”), as described in Example 3 (HLA - human leukocyte antigen; pMHC - peptide-major histocompatibility complex; TAA - tumor associated antigen; WES - whole exome sequence).
[0125] FIG.35 is a block flow diagram showing a sANTe algorithm to divide targets into different batches according to known epitopes, known ligands or ligands predicted to be presented, according to an illustrative embodiment (b - batch; MHC I/II - major histocompatibility complex I/II). [0126] FIG.36 is a schematic showing targets, target clusters, and combined target clusters, according to an illustrative embodiment. [0127] FIG.37 is a block flow diagram showing selection process of known epitopes from the immune epitope database (IEDB), according to an illustrative embodiment. [0128] FIG.38A is a block flow diagram showing downranking of target clusters and respective thresholds, according to an illustrative embodiment. [0129] FIG.38B is a block flow diagram showing downranking of target clusters, according to an illustrative embodiment. [0130] FIG.38C is a block flow diagram showing downranking of target clusters, according to an illustrative embodiment. [0131] FIG.39 is a box plot showing relative frequency of antigen candidate peptides present in a proteome, according to an illustrative embodiment. [0132] FIG.40 is a box plot showing final target cluster lengths per patient across indications for different sANTe clustering approaches, according to an illustrative embodiment. [0133] FIG.41 is a box plot showing a number of predicted strong MHC I ligands below a threshold of 0.03 in selected target clusters per patient for various indications, according to an illustrative embodiment. [0134] FIG.42 is a box plot showing a number of final target clusters per patient for various indications, according to an illustrative embodiment. [0135] FIG.43 is a bar chart showing percentage of final selected target clusters from a given antigen of a target list across various indications, according to an illustrative embodiment. 
[0136] FIG.44 is a bar chart showing percentage of patients whose antigen expression passed a respective expression threshold across various indications, according to an illustrative embodiment.
Attorney Docket No. 2013237-1122 [0137] FIG.45A is a bar chart showing fractions of final target clusters per indication derived from a specific antigen of a target list with a number on top of the bars representing the number of final target clusters, according to an illustrative embodiment. [0138] FIG.45B is a bar chart showing fractions of final target clusters per indication derived from a specific antigen of a target list with a number on top of the bars representing the number of final target clusters, according to an illustrative embodiment. [0139] FIG.46 is a box plot showing a number of final target clusters available from tier one to tier three antigens and selected using two different sANTe approaches: “No penalty” approach, “Clusters batch” approach, and “Available genes” – a median between two approaches, according to an illustrative embodiment. [0140] FIG.47 is a 2D plot showing final target clusters and available antigens for each analyzed lung adenocarcinoma (LUAD) patient, according to an illustrative embodiment. [0141] FIG.48 is a 2D plot showing final target clusters and available antigens for each analyzed lung squamous cell carcinoma (LUSC) patient, according to an illustrative embodiment. [0142] FIG.49 is a 2D plot showing final target clusters and available antigens for each analyzed ovarian serous cystadenocarcinoma (OV) patient, according to an illustrative embodiment. [0143] FIG.50 is a 2D plot showing final target clusters and available antigens for each analyzed prostate adenocarcinoma (PRAD) patient, according to an illustrative embodiment. [0144] FIG.51 is a 2D plot showing final target clusters and available antigens for each analyzed skin cutaneous melanoma (SKCM) patient, according to an illustrative embodiment. [0145] FIG.52 is a 2D plot showing final target clusters and available antigens for each analyzed triple negative breast carcinoma (TNBC) patient, according to an illustrative embodiment. 
[0146] FIG.53A is a block flow diagram showing a method for selecting neoantigen epitopes for inclusion in a polyepitopic vaccine, according to an illustrative embodiment.
[0147] FIG.53B is a block flow diagram showing a method for selecting non-neoantigen epitopes for inclusion in a polyepitopic vaccine, according to an illustrative embodiment. [0148] FIG.54 is a block flow diagram showing a method for identifying polynucleotide sequences of antigens for inclusion in a polyepitopic vaccine, according to an illustrative embodiment. [0149] FIG.55 is a block flow diagram showing a division of candidate polynucleotide sequences into batches, according to an illustrative embodiment. [0150] FIG.56A is a block flow diagram showing ordering of candidate polynucleotide sequences within a batch, according to an illustrative embodiment. [0151] FIG.56B is a block flow diagram showing ordering of candidate polynucleotide sequences within batches, according to an illustrative embodiment. [0152] FIG.57 is a block flow diagram showing determination of placement of candidate polynucleotide sequences in a polyepitopic construct using placement costs, according to an illustrative embodiment. [0153] FIG.58 is a block diagram of an exemplary cloud computing environment, used in certain embodiments. [0154] FIG.59 is a block diagram of an example computing device and an example mobile computing device used in certain embodiments. DEFINITIONS [0155] About: The term “about”, when used herein in reference to a value, refers to a value that is similar, in context, to the referenced value. In general, those skilled in the art, familiar with the context, will appreciate the relevant degree of variance encompassed by “about” in that context. For example, in some embodiments, the term “about” may encompass a range of values that is within 25%, 20%, 19%, 18%, 17%, 16%, 15%, 14%, 13%, 12%, 11%, 10%, 9%, 8%, 7%, 6%, 5%, 4%, 3%, 2%, 1%, or less of the referred value. [0156] Agent: As used herein, the term “agent” may refer to a physical entity. In some embodiments, an agent may be characterized by a particular feature and/or effect. 
For example,
as used herein, the term “therapeutic agent” refers to a physical entity that has a therapeutic effect and/or elicits a desired biological and/or pharmacological effect. In some embodiments, an agent may be a compound, molecule, or entity of any chemical class including, for example, a small molecule, polypeptide, nucleic acid, saccharide, lipid, metal, or a combination or complex thereof. In some embodiments, part or all of an agent may be depicted herein as a chemical structure, or may be described using chemical nomenclature and/or with reference to general principles of organic chemistry, e.g., in accordance with the Periodic Table of Elements, CAS version, Handbook of Chemistry and Physics, 75th Ed; “Organic Chemistry”, Thomas Sorrell, University Science Books, Sausalito: 1999, and/or “March’s Advanced Organic Chemistry”, 5th Ed., Ed.: Smith, M.B. and March, J., John Wiley & Sons, New York: 2001, the entire contents of which are hereby incorporated by reference. Unless otherwise stated or clear from context, chemical structures depicted herein may be considered to reference or include one or more, or all, stereoisomeric (e.g., enantiomeric or diastereomeric) forms of the structure, and/or one or more, or all, geometric or conformational isomeric forms of the structure. For example, unless otherwise indicated or clear, both R and S configurations of a stereocenter may be contemplated in embodiments of the disclosure. In some embodiments, a compound may be described and/or utilized as a particular single stereochemical isomer; alternatively or additionally, in some embodiments, such a compound may be described and/or utilized as a combination (e.g., a mixture) of one or more enantiomeric (e.g., diastereomeric) forms (e.g., as a racemic preparation). 
Analogously, in some embodiments, a single geometric isomer may be described and/or utilized; in some embodiments, a combination (e.g., a mixture) of geometric (or conformational) isomers may be described and/or utilized. Unless otherwise stated or clear from context, all tautomeric forms of provided compounds are within the scope of the disclosure. Still further, unless otherwise indicated or clear from context, in some embodiments, a particular chemical compound (e.g., as may be represented by a depicted chemical structure) may be described and/or utilized in an alternative isotopic form – i.e., in a form in which one or more atoms is isotopically altered (e.g., so that a hydrogen is replaced by deuterium or tritium, and/or a carbon is replaced by 13C- or 14C-). Thus, in some embodiments, a particular compound may be described and/or utilized as or in an isotopically enriched preparation. [0157] Amino acid: In its broadest sense, as used herein, the term “amino acid” refers to a compound and/or substance that can be, is, or has been incorporated into a polypeptide chain,
e.g., through formation of one or more peptide bonds. In some embodiments, an amino acid has the general structure H2N–C(H)(R)–COOH. In some embodiments, an amino acid is a naturally-occurring amino acid. In some embodiments, an amino acid is a non-natural amino acid; in some embodiments, an amino acid is a D-amino acid; in some embodiments, an amino acid is an L-amino acid. “Standard amino acid” refers to any of the twenty standard L-amino acids commonly found in naturally occurring peptides. “Nonstandard amino acid” refers to any amino acid, other than the standard amino acids, regardless of whether it is prepared synthetically or obtained from a natural source. In some embodiments, an amino acid, including a carboxy- and/or amino-terminal amino acid in a polypeptide, can contain a structural modification as compared with the general structure above. For example, in some embodiments, an amino acid may be modified by methylation, amidation, acetylation, pegylation, glycosylation, phosphorylation, and/or substitution (e.g., of the amino group, the carboxylic acid group, one or more protons, and/or the hydroxyl group) as compared with the general structure. In some embodiments, such modification may, for example, alter the circulating half-life of a polypeptide containing the modified amino acid as compared with one containing an otherwise identical unmodified amino acid. In some embodiments, such modification does not significantly alter a relevant activity of a polypeptide containing the modified amino acid, as compared with one containing an otherwise identical unmodified amino acid. As will be clear from context, in some embodiments, the term “amino acid” may be used to refer to a free amino acid; in some embodiments it may be used to refer to an amino acid residue of a polypeptide. 
[0158] Antigen: The term “antigen”, as used herein, refers to (i) an agent that elicits an immune response; and/or (ii) an agent that binds to a T cell receptor (e.g., when presented by an MHC molecule) or to an antibody. In some embodiments, an antigen elicits a humoral response (e.g., including production of antigen-specific antibodies); in some embodiments, an antigen elicits a cellular response (e.g., involving T-cells whose receptors specifically interact with the antigen). In some embodiments, an antigen binds to an antibody and may or may not induce a particular physiological response in an organism. In general, an antigen may be or include any chemical entity such as, for example, a small molecule, a nucleic acid, a polypeptide, a carbohydrate, a lipid, a polymer (in some embodiments other than a biologic polymer [e.g., other than a nucleic acid or amino acid polymer]), etc. In some embodiments, an antigen is or comprises a polypeptide. In some embodiments, an antigen is or comprises a glycan. Those of ordinary skill
Attorney Docket No. 2013237-1122 in the art will appreciate that, in general, an antigen may be provided in isolated or pure form, or alternatively may be provided in crude form (e.g., together with other materials, for example in an extract such as a cellular extract or other relatively crude preparation of an antigen-containing source). In some embodiments, antigens utilized in accordance with the present invention are provided in a crude form. In some embodiments, an antigen is a recombinant antigen. [0159] Associated: Two events or entities are “associated” with one another, as that term is used herein, if the presence, level, degree, type and/or form of one is correlated with that of the other. For example, a particular entity (e.g., polypeptide, genetic signature, metabolite, microbe, etc.) is considered to be associated with a particular disease, disorder, or condition, if its presence, level and/or form correlates with incidence of, susceptibility to, severity of, stage of, etc. the disease, disorder, or condition (e.g., across a relevant population). In some embodiments, two or more entities are physically “associated” with one another if they interact, directly or indirectly, so that they are and/or remain in physical proximity with one another. In some embodiments, two or more entities that are physically associated with one another are covalently linked to one another; in some embodiments, two or more entities that are physically associated with one another are not covalently linked to one another but are non-covalently associated, for example by means of hydrogen bonds, van der Waals interaction, hydrophobic interactions, magnetism, and combinations thereof. 
[0160] Cancer: The term “cancer” is used herein to generally refer to a disease or condition in which cells of a tissue of interest exhibit relatively abnormal, uncontrolled, and/or autonomous growth, so that they exhibit an aberrant growth phenotype characterized by a significant loss of control of cell proliferation. In some embodiments, cancer may comprise cells that are precancerous (e.g., benign), malignant, pre-metastatic, metastatic, and/or non-metastatic. In some embodiments, cancer may be characterized by a solid tumor. In some embodiments, cancer may be characterized by a hematologic tumor. In general, examples of different types of cancers known in the art include, for example, triple negative breast cancer (TNBC), hematopoietic cancers including leukemias, lymphomas (Hodgkin’s and non-Hodgkin’s), myelomas and myeloproliferative disorders; sarcomas, melanomas, adenomas, carcinomas of solid tissue, squamous cell carcinomas of the mouth, throat, larynx, and lung, liver cancer, genitourinary cancers such as prostate, cervical, bladder, uterine, and endometrial cancer and Page 21 of 214 12608199v1
Attorney Docket No. 2013237-1122 renal cell carcinomas, bone cancer, pancreatic cancer, skin cancer, cutaneous or intraocular melanoma, cancer of the endocrine system, cancer of the thyroid gland, cancer of the parathyroid gland, head and neck cancers, ovarian cancer, breast cancer, glioblastomas, colorectal cancer, gastro-intestinal cancers and nervous system cancers, benign lesions such as papillomas, and the like. [0161] Comparable: As used herein, the term “comparable” refers to two or more agents, entities, situations, sets of conditions, etc., that may not be identical to one another but that are sufficiently similar to permit comparison therebetween so that one skilled in the art will appreciate that conclusions may reasonably be drawn based on differences or similarities observed. In some embodiments, comparable sets of conditions, circumstances, individuals, or populations are characterized by a plurality of substantially identical features and one or a small number of varied features. Those of ordinary skill in the art will understand, in context, what degree of identity is required in any given circumstance for two or more such agents, entities, situations, sets of conditions, etc., to be considered comparable. For example, those of ordinary skill in the art will appreciate that sets of circumstances, individuals, or populations are comparable to one another when characterized by a sufficient number and type of substantially identical features to warrant a reasonable conclusion that differences in results obtained or phenomena observed under or with different sets of circumstances, individuals, or populations are caused by or indicative of the variation in those features that are varied. [0162] Corresponding to: As used herein, the term “corresponding to” refers to a relationship between two or more entities. 
For example, the term “corresponding to” may be used to designate the position/identity of a structural element in a compound or composition relative to another compound or composition (e.g., to an appropriate reference compound or composition). For example, in some embodiments, a monomeric residue in a polymer (e.g., an amino acid residue in a polypeptide or a nucleic acid residue in a polynucleotide) may be identified as “corresponding to” a residue in an appropriate reference polymer. For example, those of ordinary skill will appreciate that, for purposes of simplicity, residues in a polypeptide are often designated using a canonical numbering system based on a reference related polypeptide, so that an amino acid “corresponding to” a residue at position 190, for example, need not actually be the 190th amino acid in a particular amino acid chain but rather corresponds to the residue found at Page 22 of 214 12608199v1
position 190 in the reference polypeptide; those of ordinary skill in the art readily appreciate how to identify “corresponding” amino acids. For example, those skilled in the art will be aware of various sequence alignment strategies, including software programs such as, for example, BLAST, CS-BLAST, CUDASW++, DIAMOND, FASTA, GGSEARCH/GLSEARCH, Genoogle, HMMER, HHpred/HHsearch, IDF, Infernal, KLAST, USEARCH, parasail, PSI-BLAST, PSI-Search, ScalaBLAST, Sequilab, SAM, SSEARCH, SWAPHI, SWAPHI-LS, SWIMM, or SWIPE that can be utilized, for example, to identify “corresponding” residues in polypeptides and/or nucleic acids in accordance with the present disclosure. Those of skill in the art will also appreciate that, in some instances, the term “corresponding to” may be used to describe an event or entity that shares a relevant similarity with another event or entity (e.g., an appropriate reference event or entity). To give but one example, a gene or protein in one organism may be described as “corresponding to” a gene or protein from another organism in order to indicate, in some embodiments, that it plays an analogous role or performs an analogous function and/or that it shows a particular degree of sequence identity or homology, or shares a particular characteristic sequence element. [0163] Dosing regimen: Those skilled in the art will appreciate that the term “dosing regimen” (or “therapeutic regimen”) may be used to refer to a set of unit doses (typically more than one) that are administered individually to a subject, typically separated by periods of time. In some embodiments, a given therapeutic agent has a recommended dosing regimen, which may involve one or more doses. [0164] Encode: As used herein, the term “encode” or “encoding” refers to sequence information of a first molecule that guides production of a second molecule having a defined sequence of nucleotides (e.g., a polyribonucleotide) or a defined sequence of amino acids. 
For example, a DNA molecule can encode an RNA molecule (e.g., by a transcription process that includes a DNA-dependent RNA polymerase enzyme). An RNA molecule can encode a polypeptide (e.g., by a translation process). Thus, a gene, a cDNA, or an RNA molecule encodes a polypeptide if transcription and translation of RNA corresponding to that gene produces the polypeptide in a cell or other biological system. In some embodiments, a coding region of a polyribonucleotide encoding a target antigen refers to a coding strand, the nucleotide sequence of which is identical to the polyribonucleotide sequence of such a target antigen. In some embodiments, a coding Page 23 of 214 12608199v1
region of a polyribonucleotide encoding a target antigen refers to a non-coding strand of such a target antigen, which may be used as a template for transcription of a gene or cDNA. [0165] Epitope: As used herein, the term “epitope” refers to a moiety that is specifically recognized by an immune system (e.g., an immune system component) of a subject. For example, in some embodiments, an epitope may be a moiety that is specifically recognized by a T cell, a B cell, an immunoglobulin (e.g., antibody or receptor) binding component, or an aptamer. In some embodiments, an epitope is comprised of a plurality of chemical atoms or groups on an antigen. In some embodiments, such chemical atoms or groups are surface-exposed when the antigen adopts a relevant three-dimensional conformation. In some embodiments, such chemical atoms or groups are physically near to each other in space when the antigen adopts such a conformation. In some embodiments, at least some such chemical atoms or groups are physically separated from one another when the antigen adopts an alternative conformation (e.g., is linearized). [0166] Expression: As used herein, the term “expression” of a nucleic acid sequence refers to the generation of a gene product from the nucleic acid sequence. In some embodiments, a gene product can be a transcript, e.g., a polyribonucleotide as provided herein. In some embodiments, a gene product can be a polypeptide. In some embodiments, expression of a nucleic acid sequence involves one or more of the following: (1) production of an RNA template from a DNA sequence (e.g., by transcription); (2) processing of an RNA transcript (e.g., by splicing, editing, etc.); (3) translation of an RNA into a polypeptide or protein; and/or (4) post-translational modification of a polypeptide or protein. 
[0167] Homology: As used herein, the term “homology” or “homolog” refers to the overall relatedness between polynucleotide molecules (e.g., DNA molecules and/or RNA molecules) and/or between polypeptide molecules. In some embodiments, polynucleotide molecules (e.g., DNA molecules and/or RNA molecules) and/or polypeptide molecules are considered to be “homologous” to one another if their sequences are at least 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, or 99% identical. In some embodiments, polynucleotide molecules (e.g., DNA molecules and/or RNA molecules) and/or polypeptide molecules are considered to be “homologous” to one another if their sequences are at least 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, or Page 24 of 214 12608199v1
99% similar (e.g., containing residues with related chemical properties at corresponding positions). For example, as is well known by those of ordinary skill in the art, certain amino acids are typically classified as similar to one another as “hydrophobic” or “hydrophilic” amino acids, and/or as having “polar” or “non-polar” side chains. Substitution of one amino acid for another of the same type may often be considered a “homologous” substitution. [0168] Identity: As used herein, the term “identity” refers to the overall relatedness between polynucleotide molecules (e.g., DNA molecules and/or RNA molecules) and/or between polypeptide molecules. In some embodiments, polynucleotide molecules (e.g., DNA molecules and/or RNA molecules) and/or polypeptide molecules are considered to be “substantially identical” to one another if their sequences are at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical. Calculation of the percent identity of two nucleic acid or polypeptide sequences, for example, can be performed by aligning the two sequences for optimal comparison purposes (e.g., gaps can be introduced in one or both of a first and a second sequence for optimal alignment and non-identical sequences can be disregarded for comparison purposes). In certain embodiments, the length of a sequence aligned for comparison purposes is at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or substantially 100% of the length of a reference sequence. The nucleotides at corresponding positions are then compared. When a position in the first sequence is occupied by the same residue (e.g., nucleotide or amino acid) as the corresponding position in the second sequence, then the molecules are identical at that position. 
The percent identity between the two sequences is a function of the number of identical positions shared by the sequences, taking into account the number of gaps, and the length of each gap, that need to be introduced for optimal alignment of the two sequences. The comparison of sequences and determination of percent identity between two sequences can be accomplished using a mathematical algorithm. For example, the percent identity between two nucleotide sequences can be determined using the algorithm of Meyers and Miller, 1989, which has been incorporated into the ALIGN program (version 2.0). In some exemplary embodiments, nucleic acid sequence comparisons made with the ALIGN program use a PAM120 weight residue table, a gap length penalty of 12 and a gap penalty of 4. The percent identity between two nucleotide sequences can, alternatively, be determined using the GAP program in the GCG software package using an NWSgapdna.CMP matrix.
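By way of non-limiting illustration, the position-by-position comparison described above may be sketched in Python as follows. The sketch assumes that the two sequences have already been aligned by some other tool and are supplied as equal-length strings in which gaps are written as "-". The function name percent_identity and the choice to count gap-containing columns in the denominator are assumptions made for illustration only; this is one common convention and is not the scoring used by the ALIGN or GAP programs.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over the columns of a pairwise alignment.

    Assumption: both inputs are already aligned (equal length, gaps marked
    with `gap`). Columns containing a gap in one sequence count toward the
    denominator, so more or longer gaps lower the result.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    columns = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap and b == gap:
            continue  # a column with no residue in either sequence is ignored
        columns += 1
        if a == b:  # both-gap columns were skipped, so this is a residue match
            identical += 1

    if columns == 0:
        raise ValueError("alignment has no residue-containing columns")
    return 100.0 * identical / columns


if __name__ == "__main__":
    # Hypothetical toy alignment: one gap column and one mismatch out of nine.
    print(round(percent_identity("ACGT-ACGT", "ACGTTACGA"), 2))
```

Under this convention, a threshold such as "at least 80% identical" could be checked by comparing the returned value against 80.0; other conventions (for example, excluding gap columns from the denominator) would give different values for the same alignment.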
[0169] Increased, Induced, or Reduced: As used herein, these terms or grammatically comparable comparative terms, indicate values that are relative to a comparable reference measurement. For example, in some embodiments, an assessed value achieved with a provided composition (e.g., a pharmaceutical composition) may be “increased” relative to that obtained with a comparable reference composition. Alternatively or additionally, in some embodiments, an assessed value achieved in a subject may be “increased” relative to that obtained in the same subject under different conditions (e.g., prior to or after an event; or presence or absence of an event such as administration of a composition (e.g., a pharmaceutical composition) as described herein), or in a different, comparable subject (e.g., in a comparable subject that differs from the subject of interest in prior exposure to a condition, e.g., absence of administration of a composition (e.g., a pharmaceutical composition) as described herein). In some embodiments, comparative terms refer to statistically relevant differences (e.g., that are of a prevalence and/or magnitude sufficient to achieve statistical relevance). Those skilled in the art will be aware, or will readily be able to determine, in a given context, a degree and/or prevalence of difference that is required or sufficient to achieve such statistical relevance. In some embodiments, the term “reduced” or equivalent terms refers to a reduction in the level of an assessed value by at least 5%, at least 10%, at least 20%, at least 50%, at least 75% or higher, as compared to a comparable reference. In some embodiments, the term “reduced” or equivalent terms refers to a complete or essentially complete inhibition, i.e., a reduction to zero or essentially to zero. 
In some embodiments, the term “increased” or “induced” refers to an increase in the level of an assessed value by at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 80%, at least 100%, at least 200%, at least 500%, or higher, as compared to a comparable reference. [0170] In order: As used herein with reference to a polynucleotide or polyribonucleotide, “in order” refers to the order of features from 5' to 3' along the polynucleotide or polyribonucleotide. As used herein with reference to a polypeptide, “in order” refers to the order of features moving from the N-terminal-most of the features to the C-terminal-most of the features along the polypeptide. “In order” does not mean that no additional features can be present among the listed features. For example, if Features A, B, and C of a polynucleotide are described herein as being “in order, Feature A, Feature B, and Feature C,” this description does not exclude, e.g., Feature D being located between Features A and B.
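By way of non-limiting illustration, the “in order” relationship described above amounts to a subsequence test: the listed features must appear in the stated relative order, while other features may sit between them. A minimal Python sketch follows. The function name features_in_order and the representation of a construct as a list of feature labels read 5' to 3' (or N-terminus to C-terminus) are assumptions made for illustration only.

```python
from typing import Sequence


def features_in_order(construct: Sequence[str], listed: Sequence[str]) -> bool:
    """Return True if `listed` features occur in `construct` in that relative order.

    `construct` holds feature labels read 5' to 3' (or N- to C-terminus).
    Additional features may be located between the listed ones.
    """
    remaining = iter(construct)
    # `feature in remaining` consumes the iterator up to and including the first
    # match, so each listed feature must be found after the previous one.
    return all(feature in remaining for feature in listed)


if __name__ == "__main__":
    # Feature D between A and B does not break the "in order" relationship.
    print(features_in_order(["A", "D", "B", "C"], ["A", "B", "C"]))
    # Swapping A and B does.
    print(features_in_order(["B", "A", "C"], ["A", "B", "C"]))
```

The first call mirrors the example in the definition, in which Feature D is located between Features A and B.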
Attorney Docket No. 2013237-1122 [0171] Individualized shared tumor antigen: As used herein, an “individualized shared tumor antigen” is a shared tumor antigen that is expressed in an individual subject. [0172] Individualized shared tumor antigen epitope: As used herein, an “individualized shared tumor antigen epitope” is a shared tumor antigen epitope that is expressed in an individual subject. [0173] Linker: As used herein, the term “linker” refers to a portion of a polypeptide that connects different regions, portions, or antigens to one another. [0174] Lipid: As used herein, the terms “lipid” and “lipid-like material” are broadly defined as molecules which comprise one or more hydrophobic moieties or groups and optionally also one or more hydrophilic moieties or groups. Molecules comprising hydrophobic moieties and hydrophilic moieties are also typically denoted as amphiphiles. [0175] Neoantigen: As used herein, the term “neoantigen” refers to an antigen that is not present in a reference, such as a normal non-cancerous or germline cell, but is present in a cancer cell. In some embodiments, a neoantigen includes one or more mutations relative to a corresponding antigen present in a normal non-cancerous or germline cell. [0176] Neoantigen epitope: As used herein, the term “neoantigen epitope” refers to an epitope that is not present in a reference, such as a normal non-cancerous or germline cell, but is present in a cancer cell. [0177] Non-neoantigen: As used herein, the term “non-neoantigen” refers to a tumor antigen that is not a neoantigen. In some embodiments, a non-neoantigen is a shared tumor antigen. In some embodiments, a non-neoantigen is an individualized shared tumor antigen. [0178] Non-neoantigen epitope: As used herein, the term “non-neoantigen epitope” refers to a tumor epitope that is not a neoantigen epitope. In some embodiments, a non-neoantigen epitope is a shared tumor antigen epitope. 
In some embodiments, a non-neoantigen epitope is an individualized shared tumor antigen epitope. [0179] Nucleic acid / Polynucleotide: As used herein, the term “nucleic acid” refers to a polymer of at least 10 nucleotides. In some embodiments, a nucleic acid is or comprises DNA. In some embodiments, a nucleic acid is or comprises RNA. In some embodiments, a nucleic acid is or comprises peptide nucleic acid (PNA). In some embodiments, a nucleic acid is
or comprises a single-stranded nucleic acid. In some embodiments, a nucleic acid is or comprises a double-stranded nucleic acid. In some embodiments, a nucleic acid comprises both single-stranded and double-stranded portions. In some embodiments, a nucleic acid comprises a backbone that comprises one or more phosphodiester linkages. In some embodiments, a nucleic acid comprises a backbone that comprises both phosphodiester and non-phosphodiester linkages. For example, in some embodiments, a nucleic acid may comprise a backbone that comprises one or more phosphorothioate or 5'-N-phosphoramidite linkages and/or one or more peptide bonds, e.g., as in a “peptide nucleic acid”. In some embodiments, a nucleic acid comprises one or more, or all, natural residues (e.g., adenine, cytosine, deoxyadenosine, deoxycytidine, deoxyguanosine, deoxythymidine, guanine, thymine, uracil). In some embodiments, a nucleic acid comprises one or more, or all, non-natural residues. In some embodiments, a non-natural residue comprises a nucleoside analog (e.g., 2-aminoadenosine, 2-thiothymidine, inosine, pyrrolo-pyrimidine, 3-methyl adenosine, 5-methylcytidine, C-5 propynyl-cytidine, C-5 propynyl-uridine, C5-bromouridine, C5-fluorouridine, C5-iodouridine, C5-propynyl-uridine, C5-propynyl-cytidine, C5-methylcytidine, 7-deazaadenosine, 7-deazaguanosine, 8-oxoadenosine, 8-oxoguanosine, 6-O-methylguanine, 2-thiocytidine, methylated bases, intercalated bases, and combinations thereof). In some embodiments, a non-natural residue comprises one or more modified sugars (e.g., 2'-fluororibose, ribose, 2'-deoxyribose, arabinose, and hexose) as compared to those in natural residues. In some embodiments, a nucleic acid has a nucleotide sequence that encodes a functional gene product such as an RNA or polypeptide. In some embodiments, a nucleic acid has a nucleotide sequence that comprises one or more introns. 
In some embodiments, a nucleic acid may be prepared by isolation from a natural source, enzymatic synthesis (e.g., by polymerization based on a complementary template, e.g., in vivo or in vitro), reproduction in a recombinant cell or system, or chemical synthesis. In some embodiments, a nucleic acid is at least 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 225, 250, 275, 300, 325, 350, 375, 400, 425, 450, 475, 500, 600, 700, 800, 900, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000, 5500, 6000, 6500, 7000, 7500, 8000, 8500, 9000, 9500, 10,000, 10,500, 11,000, 11,500, 12,000, 12,500, 13,000, 13,500, 14,000, 14,500, 15,000, 15,500, 16,000, 16,500, 17,000, 17,500, 18,000, 18,500, 19,000, 19,500, or 20,000 or more residues or nucleotides long.
[0180] Pharmaceutically effective amount: The term “pharmaceutically effective amount” or “therapeutically effective amount” refers to the amount which achieves a desired reaction or a desired effect alone or together with further doses. In the case of the treatment of a particular disease (e.g., cancer), a desired reaction in some embodiments relates to inhibition of the course of the disease (e.g., cancer). In some embodiments, such inhibition may comprise slowing down the progress of a disease (e.g., cancer) and/or interrupting or reversing the progress of the disease (e.g., cancer). In some embodiments, a desired reaction in a treatment of a disease (e.g., cancer) may be or comprise delay or prevention of the onset of a disease (e.g., cancer) or a condition (e.g., a cancer associated condition). An effective amount of a composition (e.g., a pharmaceutical composition) described herein will depend, for example, on disease (e.g., cancer) or a condition (e.g., a cancer associated condition) to be treated, the severity of such a disease (e.g., cancer) or a condition (e.g., a cancer associated condition), individual parameters of the patient, including, e.g., age, physiological condition, size and weight, the duration of treatment, the type of an accompanying therapy (if present), the specific route of administration and similar factors. Accordingly, doses of a composition (e.g., a pharmaceutical composition) described herein may depend on various of such parameters. In the case that a reaction in a patient is insufficient with an initial dose, higher doses (or effectively higher doses achieved by a different, more localized route of administration) may be used. [0181] Polypeptide: As used herein, the term “polypeptide” refers to a polymeric chain of amino acids. In some embodiments, a polypeptide has an amino acid sequence that occurs in nature. 
In some embodiments, a polypeptide has an amino acid sequence that does not occur in nature. In some embodiments, a polypeptide has an amino acid sequence that is engineered in that it is designed and/or produced through action of the hand of man. In some embodiments, a polypeptide may comprise or consist of natural amino acids, non-natural amino acids, or both. In some embodiments, a polypeptide may comprise or consist of only natural amino acids or only non-natural amino acids. In some embodiments, a polypeptide may comprise D-amino acids, L-amino acids, or both. In some embodiments, a polypeptide may comprise only D-amino acids. In some embodiments, a polypeptide may comprise only L-amino acids. In some embodiments, a polypeptide may include one or more pendant groups or other modifications, e.g., modifying or attached to one or more amino acid side chains, at the polypeptide’s N-terminus, at the polypeptide’s C-terminus, or any combination thereof. In some embodiments, such pendant
groups or modifications comprise acetylation, amidation, lipidation, methylation, pegylation, etc., including combinations thereof. In some embodiments, a polypeptide may be cyclic, and/or may comprise a cyclic portion. In some embodiments, a polypeptide is not cyclic and/or does not comprise any cyclic portion. In some embodiments, a polypeptide is linear. In some embodiments, a polypeptide may be or comprise a stapled polypeptide. In some embodiments, the term “polypeptide” may be appended to a name of a reference polypeptide, activity, or structure; in such instances it is used herein to refer to polypeptides that share the relevant activity or structure and thus can be considered to be members of the same class or family of polypeptides. For each such class, the present specification provides and/or those skilled in the art will be aware of exemplary polypeptides within the class whose amino acid sequences and/or functions are known; in some embodiments, such exemplary polypeptides are reference polypeptides for the polypeptide class or family. In some embodiments, a member of a polypeptide class or family shows significant sequence homology or identity with, shares a common sequence motif (e.g., a characteristic sequence element) with, and/or shares a common activity (in some embodiments at a comparable level or within a designated range) with a reference polypeptide of the class (in some embodiments, with all polypeptides within the class). 
For example, in some embodiments, a member polypeptide shows an overall degree of sequence homology or identity with a reference polypeptide that is at least about 30-40%, and is often greater than about 50%, 60%, 70%, 80%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more and/or includes at least one region (e.g., a conserved region that may in some embodiments be or comprise a characteristic sequence element) that shows very high sequence identity, often greater than 90% or even 95%, 96%, 97%, 98%, or 99%. Such a conserved region usually encompasses at least 3-4 and often up to 35 or more amino acids; in some embodiments, a conserved region encompasses at least one stretch of at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35 or more contiguous amino acids. In some embodiments, a relevant polypeptide may comprise or consist of a fragment of a parent polypeptide. [0182] Prevent: As used herein, the term “prevent” or “prevention” when used in connection with the occurrence of a disease, disorder, and/or condition, refers to reducing the risk of developing the disease, disorder and/or condition and/or to delaying onset of one or more characteristics or symptoms of the disease, disorder or condition. Prevention may be considered
complete when onset of a disease, disorder or condition has been delayed for a predefined period of time. [0183] Reference: As used herein, the term “reference” describes a standard or control relative to which a comparison is performed. For example, in some embodiments, an agent, animal, individual, population, sample, sequence or value of interest is compared with a reference or control agent, animal, individual, population, sample, sequence or value. In some embodiments, a reference or control is tested and/or determined substantially simultaneously with the testing or determination of interest. In some embodiments, a reference or control is a historical reference or control, optionally embodied in a tangible medium. Typically, as would be understood by those skilled in the art, a reference or control is determined or characterized under comparable conditions or circumstances to those under assessment. Those skilled in the art will appreciate when sufficient similarities are present to justify reliance on and/or comparison to a particular possible reference or control. [0184] Ribonucleic acid (RNA) or Polyribonucleotide: As used herein, the term “ribonucleic acid,” “RNA,” or “polyribonucleotide” refers to a polymer of ribonucleotides. In some embodiments, an RNA is single stranded. In some embodiments, an RNA is double stranded. In some embodiments, an RNA comprises both single and double stranded portions. In some embodiments, an RNA can comprise a backbone structure as described in the definition of “Nucleic acid / Polynucleotide” above. An RNA can be a regulatory RNA (e.g., siRNA, microRNA, etc.), or a messenger RNA (mRNA). In some embodiments, an RNA is an mRNA. In some embodiments, where an RNA is an mRNA, the RNA typically comprises at its 3' end a poly(A) region. 
In some embodiments, where an RNA is an mRNA, the RNA typically comprises at its 5' end an art-recognized cap structure, e.g., for recognition and attachment of the mRNA to a ribosome to initiate translation. In some embodiments, an RNA is a synthetic RNA. Synthetic RNAs include RNAs that are synthesized in vitro (e.g., by enzymatic synthesis methods and/or by chemical synthesis methods). [0185] Ribonucleotide: As used herein, the term “ribonucleotide” encompasses unmodified ribonucleotides and modified ribonucleotides. For example, unmodified ribonucleotides include the purine bases adenine (A) and guanine (G), and the pyrimidine bases cytosine (C) and uracil (U). Modified ribonucleotides may include one or more modifications including, but not limited
to, for example, (a) end modifications, e.g., 5' end modifications (e.g., phosphorylation, dephosphorylation, conjugation, inverted linkages, etc.), 3' end modifications (e.g., conjugation, inverted linkages, etc.), (b) base modifications, e.g., replacement with modified bases, stabilizing bases, destabilizing bases, or bases that base pair with an expanded repertoire of partners, or conjugated bases, (c) sugar modifications (e.g., at the 2' position or 4' position) or replacement of the sugar, and (d) internucleoside linkage modifications, including modification or replacement of the phosphodiester linkages. The term “ribonucleotide” also encompasses ribonucleotide triphosphates including modified and non-modified ribonucleotide triphosphates. [0186] RNA lipid nanoparticle: As used herein, the term “RNA lipid nanoparticle” refers to a nanoparticle comprising at least one lipid and RNA molecule(s), e.g., one or more polyribonucleotides as provided herein. In some embodiments, an RNA lipid nanoparticle comprises at least one cationic amino lipid. In some embodiments, an RNA lipid nanoparticle comprises at least one cationic amino lipid, at least one helper lipid, and at least one polymer-conjugated lipid (e.g., PEG-conjugated lipid). In various embodiments, RNA lipid nanoparticles as described herein can have an average size (e.g., Z-average) of about 100 nm to 1000 nm, or about 200 nm to 900 nm, or about 200 nm to 800 nm, or about 250 nm to about 700 nm. In some embodiments of the present disclosure, RNA lipid nanoparticles can have a particle size (e.g., Z-average) of about 30 nm to about 200 nm, or about 30 nm to about 150 nm, about 40 nm to about 150 nm, about 50 nm to about 150 nm, about 60 nm to about 130 nm, about 70 nm to about 110 nm, about 70 nm to about 100 nm, about 80 nm to about 100 nm, about 90 nm to about 100 nm, about 70 nm to about 90 nm, about 80 nm to about 90 nm, or about 70 nm to about 80 nm. 
In some embodiments, an average size of lipid nanoparticles is determined by measuring the average particle diameter. In some embodiments, RNA lipid nanoparticles may be prepared by mixing lipids with RNA molecules described herein. [0187] Secretory signal: As used herein, the term “secretory signal” refers to an amino acid sequence motif that targets associated polypeptides for translocation to a secretory pathway. [0188] Shared tumor antigen: As used herein, the term “shared tumor antigen” refers to a tumor antigen expressed by a large fraction of cancers. In some embodiments, a “shared tumor antigen” is a tumor antigen expressed by a large fraction of cancers of the same type and/or a large fraction of cancers of different types. In some embodiments, a “shared tumor antigen” is a
tumor antigen shared by a large fraction of different subjects having the same cancer type and/or different cancer types. With reference to “shared tumor antigen”, the term “large fraction” refers to at least 15%. [0189] Shared tumor antigen epitope: As used herein, the term “shared tumor antigen epitope” refers to an epitope of and/or derived from a shared tumor antigen. [0190] Subject: As used herein, the term “subject” refers to an organism to be administered with a composition described herein, e.g., for experimental, diagnostic, prophylactic, and/or therapeutic purposes. Typical subjects include animals (e.g., mammals such as mice, rats, rabbits, non-human primates, domestic pets, etc.) and humans. In some embodiments, a subject is a human subject. In some embodiments, a subject is suffering from a disease, disorder, or condition (e.g., cancer and/or a cancer-associated condition). In some embodiments, a subject is susceptible to a disease, disorder, or condition (e.g., cancer and/or a cancer-associated condition). In some embodiments, a subject displays one or more symptoms or characteristics of a disease, disorder, or condition (e.g., cancer and/or a cancer-associated condition). In some embodiments, a subject displays one or more non-specific symptoms of a disease, disorder, or condition (e.g., cancer and/or a cancer-associated condition). In some embodiments, a subject does not display any symptom or characteristic of a disease, disorder, or condition (e.g., cancer and/or a cancer-associated condition). In some embodiments, a subject is someone with one or more features characteristic of susceptibility to or risk of a disease, disorder, or condition (e.g., cancer and/or a cancer-associated condition). In some embodiments, a subject is a patient. In some embodiments, a subject is an individual to whom diagnosis and/or therapy is and/or has been administered. 
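By way of non-limiting illustration only, the “large fraction” criterion recited above in the definition of “shared tumor antigen” (at least 15%) can be expressed as a simple threshold test. The following minimal sketch assumes that expression has already been assessed in a cohort of cancers or subjects and is summarized as two counts; the function name `is_large_fraction`, its parameters, and the example counts are hypothetical and are not data or methods of the present disclosure.

```python
def is_large_fraction(expressing: int, total: int, min_percent: int = 15) -> bool:
    """Return True if `expressing` out of `total` cancers (or subjects)
    meets the "large fraction" threshold of at least `min_percent` percent.

    Integer arithmetic is used so that a value of exactly 15% is
    treated as meeting the threshold. All names are illustrative
    assumptions.
    """
    if total <= 0:
        raise ValueError("total must be a positive count")
    if not 0 <= expressing <= total:
        raise ValueError("expressing must be between 0 and total")
    return expressing * 100 >= min_percent * total


if __name__ == "__main__":
    # Hypothetical counts: 30 of 200 cancers express the antigen (15%).
    print(is_large_fraction(30, 200))
    # Hypothetical counts: 29 of 200 cancers express the antigen (14.5%).
    print(is_large_fraction(29, 200))
```

Under these assumptions, an antigen expressed in exactly 15% of the assessed cancers meets the criterion, and an antigen expressed in 14.5% does not.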
[0191] Suffering from: An individual who is “suffering from” a disease, disorder, and/or condition (e.g., cancer and/or a cancer-associated condition) has been diagnosed with and/or displays one or more symptoms of a disease, disorder, and/or condition. [0192] Susceptible to: An individual who is “susceptible to” a disease, disorder, and/or condition (e.g., cancer and/or a cancer-associated condition) is one who has a higher risk of developing the disease, disorder, and/or condition (e.g., cancer and/or a cancer-associated condition) than does a member of the general public. In some embodiments, an individual who is susceptible to a disease, disorder and/or condition (e.g., cancer and/or a cancer-associated
condition) may not have been diagnosed with the disease, disorder, and/or condition (e.g., cancer and/or a cancer-associated condition). In some embodiments, an individual who is susceptible to a disease, disorder, and/or condition (e.g., cancer and/or a cancer-associated condition) may exhibit symptoms of the disease, disorder, and/or condition (e.g., cancer and/or a cancer-associated condition). In some embodiments, an individual who is susceptible to a disease, disorder, and/or condition (e.g., cancer and/or a cancer-associated condition) may not exhibit symptoms of the disease, disorder, and/or condition (e.g., cancer and/or a cancer-associated condition). In some embodiments, an individual who is susceptible to a disease, disorder, and/or condition (e.g., cancer and/or a cancer-associated condition) will develop the disease, disorder, and/or condition (e.g., cancer and/or a cancer-associated condition). In some embodiments, an individual who is susceptible to a disease, disorder, and/or condition (e.g., cancer and/or a cancer-associated condition) will not develop the disease, disorder, and/or condition (e.g., cancer and/or a cancer-associated condition). [0193] Therapy: The term “therapy” refers to an administration or delivery of an agent or intervention that has a therapeutic effect and/or elicits a desired biological and/or pharmacological effect (e.g., has been demonstrated to be statistically likely to have such effect when administered to a relevant population). In some embodiments, a therapeutic agent or therapy is any substance that can be used to alleviate, ameliorate, relieve, inhibit, prevent, delay onset of, reduce severity of, and/or reduce incidence of one or more symptoms or features of a disease, disorder, and/or condition (e.g., cancer and/or a cancer-associated condition). 
In some embodiments, a therapeutic agent or therapy is a medical intervention that can be performed to alleviate, relieve, inhibit, prevent, delay onset of, reduce severity of, and/or reduce incidence of one or more symptoms or features of a disease, disorder, and/or condition. [0194] Treat: As used herein, the term “treat,” “treatment,” or “treating” refers to any method used to partially or completely alleviate, ameliorate, relieve, inhibit, prevent, delay onset of, reduce severity of, and/or reduce incidence of one or more symptoms or features of a disease, disorder, and/or condition (e.g., cancer and/or a cancer-associated condition). Treatment may be administered to a subject who does not exhibit signs of a disease, disorder, and/or condition (e.g., cancer and/or a cancer-associated condition). In some embodiments, treatment may be administered to a subject who exhibits only early signs of the disease, disorder, and/or condition
(e.g., cancer and/or a cancer-associated condition), for example for the purpose of decreasing the risk of developing pathology associated with the disease, disorder, and/or condition. In some embodiments, treatment may be administered to a subject at a later stage of the disease, disorder, and/or condition (e.g., cancer and/or a cancer-associated condition). DETAILED DESCRIPTION OF CERTAIN EMBODIMENTS I. Polyepitopic Vaccine Constructs [0195] The present disclosure, among other things, provides technologies that are and/or utilize polyepitopic vaccine constructs. In various embodiments, the present disclosure provides cancer therapies that involve vaccination with two or more polyepitopic constructs, wherein the polyepitopic constructs together comprise both neoepitopes and non-neoepitopes (e.g., shared antigen epitopes). In some embodiments, a polyepitopic construct that includes a neoepitope does not include any shared antigen epitopes. In some embodiments, a polyepitopic construct that includes a shared antigen epitope does not include any neoepitopes. In some embodiments, a polyepitopic construct may include both one or more neoepitopes and one or more non-neoepitopes (e.g., shared antigen epitopes). [0196] The disclosure is based, in part, on an insight that certain individual tumors may not contain targetable shared antigens (e.g., may not be effectively treated by targeting shared antigens) but may contain effectively targetable individual neoantigens (e.g., neoepitopes), while certain other tumors may contain no or a very limited number of targetable neoantigens (e.g., may have an unusually low mutation rate, or may not have developed mutations in targetable sites, or otherwise may not be effectively treated by targeting neoepitopes) but may contain targetable shared tumor antigens and/or epitopes. 
Accordingly, the disclosure provides and/or utilizes a combination vaccine strategy that targets both neoepitopes and non-neoepitopes (e.g., shared tumor antigen epitopes). [0197] In embodiments described herein, the present disclosure provides and/or utilizes vaccine compositions that deliver polyepitopic polypeptides that together include both non-neoepitopes (e.g., shared tumor antigen epitopes) (e.g., a first polyepitopic vaccine) and individual neoantigen epitopes (e.g., a second polyepitopic vaccine). In some embodiments, polyepitopic polypeptides
may be delivered by administration of one or more compositions that includes the polypeptide(s) (or a pro version thereof). In some embodiments, polyepitopic polypeptides may be delivered by administration of one or more compositions that includes a nucleic acid encoding the polypeptide(s) (or a pro version thereof). [0198] The present disclosure further provides an insight that polynucleotide vaccine strategies, and in particular polyribonucleic acid vaccine strategies, are particularly amenable for use as described herein. Thus, in embodiments, the present disclosure provides combination vaccine strategies that comprise administering to a subject first and second polynucleotide (e.g., polyribonucleotide) vaccines; in particular embodiments a first such polynucleotide (e.g., polyribonucleotide) vaccine encodes a polyepitopic non-neoantigen (e.g., shared tumor antigen) polypeptide and a second such polynucleotide (e.g., polyribonucleotide) vaccine encodes a polyepitopic neoepitope polypeptide. [0199] The present disclosure provides yet a further insight that polyribonucleotide vaccination is proving to be a particularly effective approach to directing host immune responses (including both B- and T-cell responses) toward an encoded target epitope(s); the present disclosure appreciates that such polyribonucleotide vaccination may be particularly useful and/or effective in the present context and application. [0200] The present disclosure is additionally based, in part, on an insight that compositions that include or deliver a polyepitopic vaccine that includes a plurality of non-neoantigen (e.g., shared tumor antigen) epitopes may be particularly useful for cancer treatment. 
Without wishing to be bound by any particular theory, in some embodiments, compositions of the disclosure can be beneficial to a broader patient population as compared to a composition comprising only a vaccine comprising non-neoantigen (e.g., shared tumor antigen) epitopes and/or as compared to a composition comprising only a vaccine comprising individual neoantigen epitopes. [0201] In some embodiments, the present disclosure utilizes RNA technologies as a modality to express at least (i) a first polyepitopic vaccine construct that includes one or more non-neoantigen (e.g., shared tumor antigen) epitopes, and (ii) a second polyepitopic vaccine construct that includes one or more neoantigen epitopes. [0202] In some embodiments, a first polyepitopic vaccine construct includes any number of epitopes such that the number of nucleotides encoding all epitopes is about 1,300 nucleotides in
length. In some embodiments, a first polyepitopic vaccine construct includes up to about 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, or 50 non-neoantigen (e.g., shared tumor antigen) epitopes. In some embodiments, a first polyepitopic vaccine construct includes about 8 non-neoantigen (e.g., shared tumor antigen) epitopes. In some embodiments, a first polyepitopic vaccine construct includes non-neoantigen (e.g., shared tumor antigen) epitopes from a single non-neoantigen (e.g., shared tumor antigen). In some embodiments, a first polyepitopic vaccine construct includes at least one non-neoantigen (e.g., shared tumor antigen) epitope from each of about 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, or 30 different non-neoantigens (e.g., shared tumor antigens). In some embodiments, a first polyepitopic vaccine construct does not include a neoantigen epitope. In some embodiments, a first polyepitopic vaccine construct includes shared tumor antigen epitopes that match a subject’s HLA. In some embodiments, a first polyepitopic vaccine construct includes one or more individualized shared tumor antigen epitopes. In some embodiments, shared tumor antigen epitopes are identified as occurring in clusters of neighboring epitopes. In such embodiments, up to three epitopes are selected in one target sequence cluster. [0203] In some embodiments, a second polyepitopic vaccine construct includes any number of epitopes such that the number of nucleotides encoding all epitopes is about 1,300 nucleotides in length. In some embodiments, a second polyepitopic vaccine construct includes up to about 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, or 50 neoantigen epitopes. In some embodiments, a second polyepitopic vaccine construct includes about 10 neoantigen epitopes. In some embodiments, a second polyepitopic vaccine construct includes neoantigen epitopes from a single neoantigen. 
In some embodiments, a second polyepitopic vaccine construct includes at least one neoantigen epitope from each of about 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, or 30 different neoantigens. In some embodiments, a second polyepitopic vaccine construct does not include a shared tumor antigen epitope. [0204] In some embodiments, a non-neoantigen (e.g., shared tumor antigen) epitope and/or a neoantigen epitope includes about 8 to about 50 amino acids, e.g., about 8 to about 40 amino acids, e.g., about 10 to about 25 amino acids. In some embodiments, a non-neoantigen (e.g., shared tumor antigen) epitope and/or a neoantigen epitope includes about 27 amino acids. In some embodiments, a non-neoantigen (e.g., shared tumor antigen) epitope and/or a neoantigen
epitope includes about 9 amino acids. In some embodiments, a non-neoantigen (e.g., shared tumor antigen) epitope includes about 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, or 40 amino acids, and a neoantigen epitope includes about 25, 27, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, or 50 amino acids. [0205] In some embodiments, a first and/or a second polyepitopic vaccine construct includes a pre-defined control epitope (e.g., to assess potency). In some embodiments, a first polyepitopic vaccine construct includes a tetanus epitope (e.g., P2P16) as a pre-defined control epitope. In some embodiments, a second polyepitopic vaccine construct includes a pre-defined control epitope that is not present on the first polyepitopic vaccine construct. In some embodiments, a second polyepitopic vaccine construct includes no pre-defined control epitope. [0206] In some embodiments, a vaccine construct additionally includes one or more additional amino acid sequences, such as a secretory signal, a trafficking signal, and/or a linker, as described herein. A. Selection of Epitopes [0207] In some embodiments, a non-neoantigen epitope (e.g., a shared tumor antigen epitope) and/or a neoantigen epitope utilized in a vaccine construct described herein is identified, characterized, and/or selected using any known method. 
For example, in some embodiments, antigens and/or epitopes are selected using a method described in, e.g., WO2011/143656, WO2012/159754, WO2014/082729, WO2014/168874, WO2014/180569, WO2015/014869, WO2016/128376, WO2017/184590, WO2017/194610, WO2018/015433, WO2018/148671, WO2018/224405, WO2020/132586, WO2022/104002, WO2022/020311, WO2020/260897, WO2020/260898, WO2021/005338, WO2021/005339, WO2021/209775, WO2021/212123 and/or other publications such as: Laumont et al., Nature Communications 7.1 (2016); Pataskar, et al., Nature 603.7902 (2022); Bartok, et al., Nature 590.7845 (2021); Ouspenskaia, et al., Nature Biotechnology 40.2 (2022); Abelin et al., Immunity. 2017 Feb 21;46(2):315-326; Abelin et al., Immunity. 2019 Oct 15;51(4):766-779.e17, each of which is incorporated by reference herein in its entirety. [0208] In some embodiments, antigens and/or epitopes are experimentally or computationally assessed. In some embodiments, antigens and/or epitopes are assessed by consultation with published reports.
[0209] For example, in some embodiments, HLA-I and/or HLA-II binding is experimentally assessed; in some embodiments it is predicted. In some embodiments, predicted HLA-I or HLA-II binding is assessed using an algorithm such as neonmhc1 and/or neonmhc2, which predict and/or characterize likelihood of MHC class I and MHC class II binding, respectively. Alternatively or additionally, in some embodiments, an MHC-peptide presentation prediction algorithm or MHC-peptide presentation predictor is or comprises NetMHCpan or NetMHCIIpan. In some embodiments, a hidden Markov model approach may be utilized for MHC-peptide presentation prediction and/or characterization. In some embodiments, the peptide prediction model MARIA may be utilized. In some embodiments, NetMHCpan is not utilized to predict or characterize likelihood of MHC binding. In some embodiments, the peptide prediction model MARIA is not utilized to predict or characterize likelihood of MHC binding. In some embodiments, neither NetMHCpan nor NetMHCIIpan is utilized to predict or characterize likelihood of MHC binding. In some embodiments, an MHC-peptide presentation prediction algorithm or MHC-peptide presentation predictor is or comprises RECON® (Real-time Epitope Computation for ONcology), which offers high quality MHC-peptide presentation prediction based on expression, processing and binding capabilities. See, for example, Abelin et al., Immunity 46:315, 2017; Abelin et al., Immunity 51:766, 2019, each of which is incorporated herein by reference in its entirety. [0210] In some embodiments, expression level is experimentally determined (e.g., in a model system or in cancer patients). In some embodiments, expression level is a reported level (e.g., in a published or presented report). In some embodiments, expression level is assessed as RNA (e.g., via RNASeq or whole exome sequencing). In some embodiments, expression level is assessed as protein. 
[0211] In some embodiments, one or more neoantigen epitopes are selected using a method schematically depicted in FIG.1. In FIG.1, MHC I/II binding assessment includes processing, but not full presentation; expression is a separate feature. Additionally, a score that includes all presented features in one value plus peptide features is applied to decide selection of targets within one round. [0212] In some embodiments, one or more neoantigen epitopes are selected using a method schematically depicted in FIG.2. In FIG.2, the number of rounds is reduced relative to the Page 39 of 214 12608199v1
method depicted in FIG.1. Additionally, in FIG.2, MHC I/II: Presentation* includes expression, where features are integrated in the presentation score directly, and the expression of a mutation is included directly in this presentation model.
[0213] In some embodiments, non-neoantigen epitopes and neoantigen epitopes are selected as depicted in FIG.3. In FIG.3, tumor associated antigens are referred to as “TAA”. In some embodiments, identification of shared antigen epitopes and/or neoantigen epitopes can be performed by any of the methods described herein.
[0214] In some embodiments, an individualized shared tumor antigen epitope is selected as depicted in FIG.6. In some embodiments, individualized non-neoantigens are selected from a list of validated antigens, e.g., 20-40 validated antigens, or ~30 validated antigens. In some embodiments, neoantigen epitopes and non-neoepitopes are selected as depicted in FIG.7. In some embodiments, neoantigen epitopes and/or non-neoepitopes are identified by whole exome sequencing and/or RNA sequencing of tumor cells and/or normal (i.e., non-cancerous) cells from a subject. In some embodiments, RNA sequencing is performed on peripheral blood mononuclear cells (PBMCs).
[0215] In some embodiments, a shared tumor antigen epitope is from ACTL8, ANKRD30A, CBX2, CLDN6, CST9, CST9L, EDDM3b, KK-LC-1, KLK2, KLK3 (PSA), LRRC26, MAGEA1, MAGEA3, MAGEA4, MAGEA5, MAGEA9b, MAGEB4, MAGEC1, NY-ESO-1, PLAC1, PRAME, SPANXE, SSX4, TPTE, TP53, or XAGE5.
[0216] In some embodiments, a shared tumor antigen is one that is known to be particularly highly expressed in tumor sample(s) as compared with non-tumor sample(s) (e.g., of the same tissue). In some such embodiments, the tumor antigen may have an amino acid sequence that is identical to that of a protein expressed in non-tumor cells, but the protein may be consistently overexpressed in tumor cells (e.g., of a particular type and/or stage, etc.).
In some embodiments, a shared tumor antigen may include a mutation (relative to protein found in non-tumor samples) that is preferentially associated with (e.g., found in) tumor cells rather than non-tumor cells (e.g., of the same cell type). [0217] In some embodiments, a polyepitopic vaccine construct that includes one or more shared tumor antigen epitopes, as described herein, encodes non-neoantigens and/or non-neoantigen Page 40 of 214 12608199v1
Attorney Docket No. 2013237-1122 epitopes selected from a list of tumor-associated antigens known to be associated with multiple indications (e.g., different cancers). [0218] In some embodiments, a polyepitopic vaccine construct that includes one or more shared tumor antigen epitopes, as described herein, is selected from a pre-furnished polyepitopic vaccine warehouse, such as a pre-furnished RNA vaccine warehouse (“off the shelf”). Such pre- furnished vaccine warehouse comprises a set of pre-manufactured vaccine products, each pre- manufactured vaccine product inducing an immune response against one or more shared tumor antigen epitopes. A warehouse can include pre-manufactured vaccine products that are designed to be applicable to a large fraction of cancer patients having cancer of the same type and/or to a large fraction of cancer patients having cancer of different types. Accordingly, a pre-furnished vaccine warehouse can include a set of pre-manufactured vaccine products applicable to a large fraction of cancer patients. [0219] For example, if a set of prevalent shared tumor antigen epitopes is known for a particular cancer type, it is possible to produce such pre-furnished vaccine warehouse comprising a set of pre-manufactured vaccine products that induce immune responses against said prevalent shared tumor antigen epitopes. In some embodiments, it is possible to select from a pre-furnished vaccine warehouse one or more pre-manufactured vaccine products that will induce an immune response against one or more shared tumor antigen epitopes expressed in cancer cells of a particular patient being treated. [0220] In some embodiments, selection includes testing the patient for shared tumor antigen epitope expression. 
In some embodiments, such testing may identify one or more appropriate pre-manufactured polyepitopic vaccine products (e.g., from a pre-furnished vaccine warehouse that includes shared tumor antigen epitope(s)) expressed by cancer cells of the patient and may be useful in the practice of the present disclosure as applied to that particular patient. Alternatively, in some embodiments, such testing may identify particular shared tumor antigen epitope(s) as individualized shared tumor antigen epitope(s) that may desirably be included in a polyepitopic construct particularly produced for that particular patient. [0221] In some embodiments, selection includes transcriptomic/ peptidomic analysis of the patient’s tumor. For example, tumor samples from eligible patients can be analyzed for tumor antigen epitope signatures. In some embodiments, a shared tumor antigen epitope profile can be Page 41 of 214 12608199v1
Attorney Docket No. 2013237-1122 determined by quantitative, multiplex RT-PCR and/or IHC, and respective pre-manufactured or specifically tailored polyepitopic vaccine product(s) can be selected or produced. [0222] In some embodiments, empirical data is used to select pre-manufactured polyepitopic vaccine products that will most likely target one or more tumor antigens expressed by cancer cells of the patient. In some embodiments, selection or design of a polyepitopic vaccine may be based at least in part on a cancer patient’s HLA type and/or on one or more features of the patient’s tumor or immunological state (e.g., mutational load, high mutation burden, mismatch- repair deficiency, or microsatellite instability, or any combination thereof). [0223] In some embodiments, a set of shared tumor antigen epitopes (e.g., as included in one or more newly produced or pre-manufactured polyepitopic vaccine products) is optimized regarding coverage of tumor samples and/or their shared tumor antigen epitope expression pattern. In some embodiments, a set comprises vaccine product(s) that are selected to target a maximum number of tumor samples of either the same and/or a different type, while typically keeping the number of vaccine products in the set as low as possible. In some embodiments, a set may not necessarily include shared tumor antigen epitopes that are shared in the largest fractions of tumor patients. Rather, in some embodiments, a set of shared antigen polyepitopic vaccine products can be optimized with respect to (i) shared tumor antigen epitopes that are shared by a particularly large fraction of tumor patients (e.g., suffering from the same tumor type and/or stage) and/or (ii) shared tumor antigen epitopes that are shared by a particularly large number of tumor patients (e.g., suffering from the same tumor type and/or stage), while keeping the number of separate vaccine products as low as possible. 
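By way of non-limiting illustration only, the trade-off described above (covering as many tumor samples as possible while keeping the number of vaccine products low) can be sketched as a greedy coverage heuristic. The greedy strategy, the function name `greedy_product_set`, and the product and sample identifiers are assumptions made for this sketch and are not taken from the disclosure.

```python
from typing import Dict, List, Set


def greedy_product_set(
    product_coverage: Dict[str, Set[str]],
    max_products: int,
) -> List[str]:
    """Greedily pick products; each pick covers the most not-yet-covered samples."""
    covered: Set[str] = set()
    chosen: List[str] = []
    while len(chosen) < max_products:
        # Iterate over sorted names so that ties are broken deterministically.
        best = max(
            sorted(product_coverage),
            key=lambda name: len(product_coverage[name] - covered),
        )
        gain = product_coverage[best] - covered
        if not gain:
            break  # no remaining product adds coverage
        chosen.append(best)
        covered |= gain
    return chosen


# Hypothetical data: tumor samples expressing an epitope targeted by each product.
coverage = {
    "product_A": {"s1", "s2", "s3"},
    "product_B": {"s3", "s4"},
    "product_C": {"s4", "s5", "s6", "s7"},
    "product_D": {"s1"},
}
selected = greedy_product_set(coverage, max_products=2)
print(selected)
```

A greedy heuristic of this kind does not guarantee a minimal set; it is shown only because it makes the coverage-versus-set-size trade-off concrete.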
Alternatively or additionally, in some embodiments, a set of shared antigen polyepitopic vaccine products can be optimized with respect to epitopes that are particularly highly expressed within one or more tumor samples of the subject and/or that are expressed in a particularly large number or percentage of tumor samples of the subject.
[0224] In some embodiments, a vaccine warehouse of pre-manufactured polyepitopic vaccine products is suitable for targeting at least 15% of patients having a particular tumor type.
[0225] In some embodiments, a pre-furnished vaccine warehouse includes a set of about 2, 5, 10, 20, 30, 40, 50, 75, or 100 different pre-manufactured polyepitopic vaccine products, each pre-manufactured polyepitopic vaccine product including a polyepitopic vaccine construct that includes one or more shared tumor antigen epitopes, as described herein.
Detecting Antigen-based Neotherapeutic Target Epitopes (“dANTe”)
[0226] In some embodiments, non-neoantigen epitopes and/or neoantigen epitopes are selected as depicted schematically in FIG.13 (referred to herein as the “dANTe” pipeline).
[0227] In certain embodiments, a method d100 for selecting neoantigen epitopes may include steps as shown in FIG.53A. For example, method d100 can include one or more of: step d102, comprising obtaining whole exome sequenced matched germline DNA and tumor DNA and tumor RNAseq data; step d104, comprising performing HLA typing using DNA reads from germline sequences; step d106, comprising aligning DNA reads to a reference genome; step d108, comprising detecting mutations in sequences; step d110, comprising annotating sequences to describe regions/sites of interest; step d112, comprising computing predicted peptide-HLA I and II binding/presentation scores; step d114, comprising ranking neoepitope targets based on scores and expression values; and/or step d116, comprising storing and/or providing the ranked list of neoepitope targets. In some embodiments, method d100 includes step d102, comprising obtaining whole exome sequenced matched germline DNA and tumor DNA and tumor RNAseq data; step d104, comprising performing HLA typing using DNA reads from germline sequences; step d106, comprising aligning DNA reads to a reference genome; step d108, comprising detecting mutations in sequences; step d110, comprising annotating sequences to describe regions/sites of interest; step d112, comprising computing predicted peptide-HLA I and II binding/presentation scores; step d114, comprising ranking neoepitope targets based on scores and expression values; and step d116, comprising storing and/or providing the ranked list of neoepitope targets.
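By way of non-limiting illustration only, the ranking of step d114 can be sketched as a sort over predicted scores and expression values. The disclosure does not fix a ranking formula, so the sort key below (best percentile rank first, then higher expression) is an assumption, as are the type `NeoepitopeTarget`, its field names, and the placeholder peptide labels.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class NeoepitopeTarget:
    peptide: str
    mhc1_percentile: float  # assumed convention: lower = stronger predicted presentation
    mhc2_percentile: float  # assumed convention: lower = stronger predicted presentation
    expression_tpm: float   # expression value in transcripts per million


def rank_targets(targets: List[NeoepitopeTarget]) -> List[NeoepitopeTarget]:
    """Order targets by best (lowest) percentile rank, breaking ties by higher expression."""
    def sort_key(t: NeoepitopeTarget):
        return (min(t.mhc1_percentile, t.mhc2_percentile), -t.expression_tpm)
    return sorted(targets, key=sort_key)


# Hypothetical targets with placeholder labels instead of real peptide sequences.
targets = [
    NeoepitopeTarget("A", mhc1_percentile=0.5, mhc2_percentile=10.0, expression_tpm=20.0),
    NeoepitopeTarget("B", mhc1_percentile=2.0, mhc2_percentile=0.1, expression_tpm=5.0),
    NeoepitopeTarget("C", mhc1_percentile=0.5, mhc2_percentile=3.0, expression_tpm=80.0),
]
ranked = rank_targets(targets)
print([t.peptide for t in ranked])
```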
[0228] In certain embodiments, a method dn100 for selecting non-neoantigen epitopes may include steps as shown in FIG.53B. For example, method dn100 can include submethods dn100a and dn100b. In some embodiments, submethod dn100a includes one or more of: step dn102a, comprising providing an input list of tumor-associated antigen targets and threshold expression levels; step dn104a, comprising pre-computing HLA binding for targets on an input list; and/or step dn106a, comprising filtering for transcripts exceeding threshold expression level. In some embodiments, submethod dn100a includes step dn102a, comprising providing an input Page 43 of 214 12608199v1
list of tumor-associated antigen targets and threshold expression levels, followed by step dn104a, comprising pre-computing HLA binding for targets on an input list, followed by step dn106a, comprising filtering for transcripts exceeding threshold expression level. In some embodiments, submethod dn100a includes step dn102a, comprising providing an input list of tumor-associated antigen targets and threshold expression levels, followed by step dn106a, comprising filtering for transcripts exceeding threshold expression level, followed by step dn104a, comprising pre-computing HLA binding for targets on an input list. In some embodiments, submethod dn100b includes one or more of: step dn102b, comprising obtaining whole exome sequenced matched germline DNA and tumor DNA and tumor RNAseq data; step dn104b, comprising performing HLA typing using DNA reads from germline sequences; step dn106b, comprising aligning DNA reads to a reference genome; step dn108b, comprising detecting mutations in exonic regions; step dn110b, comprising generating non-neoepitope target sequences; step dn112b, comprising annotating sequences to describe regions/sites of interest; step dn114b, comprising computing predicted peptide-HLA I and II binding/presentation scores; and/or step dn116b, comprising storing and/or providing a list of non-neoepitope targets with scores and expression values.
In some embodiments, submethod dn100b includes: step dn102b, comprising obtaining whole exome sequenced matched germline DNA and tumor DNA and tumor RNAseq data; step dn104b, comprising performing HLA typing using DNA reads from germline sequences; step dn106b, comprising aligning DNA reads to a reference genome; step dn108b, comprising detecting mutations in exonic regions; step dn110b, comprising generating non-neoepitope target sequences; step dn112b, comprising annotating sequences to describe regions/sites of interest; step dn114b, comprising computing predicted peptide-HLA I and II binding/presentation scores; and step dn116b, comprising storing and/or providing a list of non-neoepitope targets with scores and expression values, and includes step dn104a, comprising pre-computing HLA binding for targets on the input list, and step dn106a, comprising filtering for transcripts exceeding threshold expression level (e.g., step dn104a before step dn106a, or step dn106a before step dn104a) after step dn102b, step dn104b, step dn106b, step dn108b, step dn112b, or step dn114b.
[0229] In some embodiments, non-neoantigen epitopes are selected using a list of tumor-associated antigen gene targets known to be associated with multiple indications (e.g., different cancers). In some embodiments, each of the gene targets on a list of tumor-associated antigen
Attorney Docket No. 2013237-1122 gene targets is associated with a predefined expression threshold (e.g., a level above which the tumor-associated antigen gene target is considered to be expressed in a given patient or sample). [0230] In some embodiments, expression of each of the genes on a list of tumor-associated antigen gene targets is determined for a patient or sample using, e.g., a method described herein; expression of each of the gene targets is compared to a predefined threshold expression level; and the list of tumor-associated antigen gene targets used to select non-neoantigen epitopes is limited to those gene targets for which expression exceeds the respective predefined threshold. [0231] In some embodiments, selection of non-neoantigen epitopes and/or neoantigen epitopes includes a step of whole exome sequencing and/or RNA sequencing of tumor cells and/or normal (i.e., non-cancerous) cells from a subject. [0232] In some embodiments, selection of non-neoantigen epitopes and/or neoantigen epitopes includes a step of HLA typing or HLA calling (using, e.g., a method described herein). [0233] In some embodiments, selection of non-neoantigen epitopes and/or neoantigen epitopes includes a step of detecting one or more somatic mutations (e.g., a single nucleotide variant (SNV) and/or an InDel) using, for example, a method described herein. [0234] In some embodiments, selection of non-neoantigen epitopes and/or neoantigen epitopes includes a step of detecting one or more germline variants (using, e.g., a method described herein). In some embodiments, germline variants are detected by computation using results of a preceding step (e.g., in the vicinity of detected somatic mutations or within exonic regions of an exemplary input list of tumor-associated antigen gene targets). [0235] In some embodiments, selection of non-neoantigen epitopes and/or neoantigen epitopes includes a step of detecting gene fusions using, e.g., a method described herein. 
In some embodiments, selection of non-neoantigen epitopes and/or neoantigen epitopes includes a step of deprioritizing target sequences in which gene fusions were detected.
[0236] In some embodiments, selection of non-neoantigen epitopes and/or neoantigen epitopes includes a step of annotating non-neoantigen epitopes and/or neoantigen epitope sequences with mutational information obtained using one or more steps described herein.
[0237] In some embodiments, a step of determining expression of each of the genes on a list of tumor-associated antigen gene targets, as described herein, is performed before sequencing and/or mutational analysis. In some embodiments, a step of determining expression of each of the genes on a list of tumor-associated antigen gene targets, as described herein, is performed after sequencing and/or mutational analysis.
[0238] In some embodiments, selection of non-neoantigen epitopes includes a step of deriving peptide non-neoantigen sequences by generating all possible peptide combinations based on detected mutation events and corresponding phasing of relevant alleles, using, e.g., a method described herein. In some embodiments, peptide non-neoantigen sequences containing a somatic mutation are discarded.
[0239] In some embodiments, selection of non-neoantigen epitopes and/or neoantigen epitopes includes a step of predicting HLA binding and presentation using a predictive model with expression and cleavage features. In some embodiments, HLA binding predictions may be precomputed for known peptide-allele combinations (e.g., those associated with tumor-associated antigen gene targets and/or known alleles), and precomputed binding predictions may be obtained by querying the precomputed binding predictions. In some embodiments, HLA binding predictions may be computed “on-line” by a predictive model (e.g., for unknown alleles and/or mutations).
[0240] In some embodiments, predicting HLA binding and presentation includes a step of creating a Lookup table of precomputed peptide-MHC binding scores for selecting non-neoantigen epitopes using a list of tumor-associated antigen gene targets. In some embodiments, querying a Lookup table to obtain a precomputed peptide-MHC binding score reduces computational time and/or expense associated with prediction of peptide-MHC binding scores used for selecting non-neoantigen epitopes.
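By way of non-limiting illustration only, the combination of a precomputed Lookup table with “on-line” prediction for unseen peptide-allele pairs can be sketched as follows. The class `BindingScoreLookup`, the function `toy_predictor`, and the example scores are hypothetical; `toy_predictor` is a placeholder and not a real binding model, and caching of on-line results is an assumption of this sketch.

```python
from typing import Callable, Dict, Tuple

Key = Tuple[str, str]  # (peptide, HLA allele)


class BindingScoreLookup:
    """Serve precomputed peptide-MHC scores; fall back to a predictor on a miss."""

    def __init__(self, precomputed: Dict[Key, float],
                 predictor: Callable[[str, str], float]) -> None:
        self._table: Dict[Key, float] = dict(precomputed)
        self._predictor = predictor
        self.online_calls = 0  # counts how often the predictor had to run

    def score(self, peptide: str, allele: str) -> float:
        key = (peptide, allele)
        if key in self._table:
            return self._table[key]  # table hit: no model evaluation needed
        self.online_calls += 1
        value = self._predictor(peptide, allele)
        self._table[key] = value  # assumption: cache on-line results for reuse
        return value


def toy_predictor(peptide: str, allele: str) -> float:
    """Placeholder standing in for a real predictive model (hypothetical)."""
    return float(len(peptide))


# Hypothetical precomputed entry for a known peptide-allele combination.
lookup = BindingScoreLookup({("PEPTIDEA", "HLA-A*02:01"): 0.4}, toy_predictor)
hit = lookup.score("PEPTIDEA", "HLA-A*02:01")   # answered from the table
miss = lookup.score("NEWPEPTID", "HLA-B*07:02")  # computed on-line, then cached
again = lookup.score("NEWPEPTID", "HLA-B*07:02")  # answered from the table
print(hit, miss, again, lookup.online_calls)
```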
In some embodiments, a Lookup table used for selecting non-neoantigen epitopes contains precomputed peptide-MHC binding scores for peptide-allele combinations derived from a list of tumor-associated antigen gene targets and frequently occurring alleles in the human population. [0241] In some embodiments, a selected non-neoantigen epitope is validated by comparison of its sequence to a reference sequence, such as a corresponding exonic reference transcript sequence. In some embodiments, successful validation by comparison of a non-neoantigen epitope sequence to a reference sequence includes up to two mismatches (e.g., to consider single nucleotide polymorphisms (SNPs) in the non-neoantigen epitope sequence). Page 46 of 214 12608199v1
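By way of non-limiting illustration only, validation against a reference sequence with a tolerance of up to two mismatches can be sketched as a sliding-window comparison. The sketch assumes substitution-only mismatches between equal-length strings; the function names and the toy sequences are hypothetical.

```python
def count_mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def validate_against_reference(epitope: str, reference: str,
                               max_mismatches: int = 2) -> bool:
    """True if some window of the reference matches within the mismatch tolerance."""
    n = len(epitope)
    return any(
        count_mismatches(epitope, reference[i:i + n]) <= max_mismatches
        for i in range(len(reference) - n + 1)
    )


# Toy sequences, for illustration only.
reference = "AAACCCGGGTTT"
exact = validate_against_reference("CCCGGG", reference)      # 0 mismatches
two_off = validate_against_reference("CCTGAG", reference)    # 2 mismatches
three_off = validate_against_reference("CATGAG", reference)  # at least 3 everywhere
print(exact, two_off, three_off)
```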
Attorney Docket No. 2013237-1122 [0242] In some embodiments, selection of neoantigen epitopes includes a step of ranking neoantigen epitope sequences based on MHC I and/or MHC II presentation and/or expression values. In some embodiments, selection of neoantigen epitopes includes a step of selecting a group of epitopes in which some epitopes have InDels and some epitopes have SNVs. In some embodiments, selection of neoantigen epitopes includes a step of selecting a group of epitopes in which some epitopes have high MHC I presentation scores and some epitopes have high MHC II presentation scores. [0243] In some embodiments, a selected neoantigen epitope is validated by comparison of its sequence to a reference proteome. In some embodiments, a selected neoantigen epitope is discarded (i.e., no longer considered selected) when a full match to its sequence is found in a reference proteome. Selecting Antigen-based Neotherapeutic Target Epitopes (“sANTe”) [0244] In some embodiments, an additional framework may be applied to further select epitopes. For example, once antigens of a subject are selected (e.g., from genomic data from the subject) (e.g., using “dANTe” framework described herein), a framework referred to herein as Selecting Antigen-based Neotherapeutic Target Epitopes (“sANTe”) may be applied to further select epitopes for inclusion in a polyepitopic vaccine. [0245] In certain embodiments, a method s100 for selecting polynucleotide sequences of antigens for inclusion in a polyepitopic vaccine (e.g., for a cancer therapy) may include the following steps as shown in FIG.54. At step s102, a candidate epitope list (e.g., candidate polynucleotide sequences) identifying a plurality of candidate epitopes and MHC presentation and immunogenicity data may be obtained. 
For each candidate epitope of the plurality of candidate epitopes, the MHC presentation and immunogenicity data may comprise: (i) a corresponding MHC presentation score representing a known and/or predicted likelihood and/or strength of binding between the particular candidate epitope and an MHC of the subject; and (ii) a corresponding immunogenicity score representing a known and/or predicted immunogenicity of the particular candidate epitope. At step s104, for each candidate epitope, a corresponding rank based at least in part on the MHC presentation and immunogenicity data may be determined. At step s106, a subset of the plurality of candidate epitopes may be selected based at Page 47 of 214 12608199v1
least in part on the corresponding rank as a set of target epitopes. At step s108, the set of target epitopes may be stored and/or provided (e.g., for display and/or further processing).
Candidate Polynucleotide Sequences and Immune Response Data
[0246] Each candidate of a plurality of candidate polynucleotide sequences may comprise: DNA sequence, RNA sequence, polypeptide sequence, or any combination thereof. Each candidate of a plurality of candidate polynucleotide sequences may comprise: gene associated data (e.g., a gene identifier for a gene to which the candidate belongs), transcript associated data (e.g., a transcript identifier for a transcript from which the candidate originates), proteome associated data (e.g., values associated with matching at least a part of the candidate to proteome data of a subject, a Boolean value describing whether the candidate matches proteome data, a Boolean value describing whether a candidate within 3 positions upstream or downstream matches proteome data), or any combination thereof.
[0247] Each candidate of a plurality of candidate polynucleotide sequences may comprise expected (e.g., as derived from the literature and experiments) and/or predicted (e.g., as determined in silico) immunogenicity associated data (e.g., related to immune response processes, e.g., cleavage, HLA binding, expression, presentation).
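By way of non-limiting illustration only, steps s104 and s106 of method s100 can be sketched as ranking followed by truncation. The disclosure states only that the rank is based at least in part on the MHC presentation and immunogenicity data; the particular sort order, the type `Candidate`, and the score conventions below are assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Candidate:
    epitope_id: str
    presentation_score: float    # assumed convention: higher = more likely presented
    immunogenicity_score: float  # assumed convention: higher = more immunogenic


def rank_candidates(candidates: List[Candidate]) -> List[Tuple[int, Candidate]]:
    """Step s104 (sketch): assign rank 1..n, immunogenicity first, presentation second."""
    ordered = sorted(
        candidates,
        key=lambda c: (-c.immunogenicity_score, -c.presentation_score),
    )
    return list(enumerate(ordered, start=1))


def select_targets(ranked: List[Tuple[int, Candidate]], max_targets: int) -> List[Candidate]:
    """Step s106 (sketch): keep the candidates whose rank is within the limit."""
    return [c for rank, c in ranked if rank <= max_targets]


# Hypothetical candidate list (step s102).
candidates = [
    Candidate("e1", presentation_score=0.9, immunogenicity_score=0.2),
    Candidate("e2", presentation_score=0.5, immunogenicity_score=0.8),
    Candidate("e3", presentation_score=0.7, immunogenicity_score=0.8),
    Candidate("e4", presentation_score=0.1, immunogenicity_score=0.1),
]
target_set = select_targets(rank_candidates(candidates), max_targets=2)
print([c.epitope_id for c in target_set])  # step s108: provide the selected set
```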
For example, immunogenicity associated data may comprise an immunogenicity tier (e.g., from highest to lowest evidence), gene expression associated data (e.g., expression value in transcripts per million of the associated gene, gene expression cut-off), associated allele data (e.g., allele name, major histocompatibility complex (MHC) class), HLA binding associated data (e.g., predicted percentile rank, cut-off value for the binding percentile rank), presentation at a cellular surface associated data (e.g., predicted presentation percentile rank), or any combination thereof. The HLA binding associated data may be used to bin each candidate of a plurality of candidate polynucleotide sequences into various binding categories (e.g., strong, intermediate, and weak).
Ranking and Filtering Candidate Polynucleotide Sequences
[0248] In certain embodiments, a plurality of candidate polynucleotide sequences is filtered using proteome data from a subject (e.g., from healthy cells, healthy tissue, cancer tissue). For example, an epitope match in the proteome data may be related to a certain degree of abundance of the epitope in healthy tissue. For each candidate of at least a part of a plurality of candidate polynucleotide sequences, the candidate may be filtered (e.g., removed from subsequent analysis) if an associated polypeptide sequence (e.g., at least a part of it) matches a sequence (e.g., an 8-mer) from the proteome data. The proteome data may be (e.g., manually, automatically) curated (e.g., processed) (e.g., to exclude a gene specific whitelist that contains, e.g., homologous sequences including sequences from annotated pseudogenes).
[0249] In certain embodiments, a plurality of candidate polynucleotide sequences comprises a plurality of candidate cores. Each core of a plurality of candidate cores may be defined as an overlap of each of the plurality of candidate polynucleotide sequences with “known” epitopes, corresponding “known” HLA ligands, epitope-allele pairs corresponding to the strongest binding. In other words, candidate cores may correspond to parts of candidate sequences that are associated with the strongest binding.
[0250] In certain embodiments, a plurality of candidate polynucleotide sequences is ranked according to associated immunogenicity tier, HLA binding associated data, “known” epitopes data, “known” ligands data, gene expression level, or any combination thereof. The “known” epitopes data may refer to a list of epitopes (e.g., obtained from public results, such as the Immune Epitope Database (IEDB), and private results) with evidence for immunogenicity (e.g., evidence that epitope specific T cells kill tumor cell lines or autologous tumor cells that endogenously express the associated antigen, evidence that epitope specific T cells kill target cells transduced/transfected with the associated antigen and are reported in more than one publication, experimental immunogenicity data with ex vivo enzyme-linked immunospot assay (ELISpot) and multimer data or T-cell receptor data). Certain epitopes may be excluded from the list of epitopes (e.g., negative experimental data, assay validity is questionable, only reported in one publication).
The “known” ligands data may refer to a list of MHC class I and II ligands (e.g., obtained from public and private mass spectrometry data). The list of MHC ligands may be filtered to exclude ligand-allele pairs with low binding. [0251] In certain embodiments, corresponding ranking comprises dividing a plurality of candidate polynucleotide sequences s202 into batches s204a, s204b, and s204c as shown in FIG. 55. A batch may comprise candidate polynucleotide sequences with a same immunogenicity tier. A batch may comprise candidate polynucleotide sequences with two consecutive immunogenicity tiers. A batch may comprise candidate polynucleotide sequences with “known” Page 49 of 214 12608199v1
epitopes. A batch may comprise candidate polynucleotide sequences with “known” ligands. A batch may comprise candidate polynucleotide sequences with a same binding category.
[0252] Batches (e.g., with associated candidate polynucleotide sequences) may be ranked by the associated tiers. Batches may be ranked by “known” epitopes, “known” ligands, and associated binding categories, for example, as shown in FIG.35.
[0253] In certain embodiments, each core of a plurality of candidate cores is extended by an overlap of at least one amino acid between the core and a plurality of candidate polynucleotide sequences. As such, a core may be extended on each terminus as shown in FIG.36. Without wishing to be bound by any theory, longer candidate sequences may be preferable, for example, to aid in overcoming immune tolerance, improving stability of a final construct, and/or being associated with a stronger immunogenicity. A core with extensions may need to satisfy a peptide-MHC prediction threshold of a given batch (e.g., and in case of batches with “known” epitopes and “known” ligands, a threshold of strong binders may be assumed). A core may be checked to not exceed a length of 40 amino acids. Extensions may be checked to be derived from a same transcript window as a core.
[0254] In certain embodiments, each core of a plurality of candidate cores is extended by flanks (e.g., on each terminus, e.g., by up to three flanking amino acids).
[0255] In certain embodiments, two cores of a plurality of candidate cores are combined if the two cores overlap by at least one amino acid and correspond to a same transcript window. Only cores from a same batch may be considered for combining. Only cores from a same batch and with a same binding category may be considered for combining. A core with a best ranking out of two cores may determine ranking of a combined core.
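By way of non-limiting illustration only, the combining rule of this paragraph can be sketched with cores represented as intervals on a transcript window. The type `Core`, its fields, the convention that a lower rank number is better, and the choice to leave two cores uncombined when the result would exceed 40 amino acids are assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import Optional

MAX_CORE_LENGTH = 40  # amino acids, per the length limit stated in the text


@dataclass(frozen=True)
class Core:
    window_id: str  # transcript window from which the core is derived
    start: int      # inclusive 0-based position within the window
    end: int        # exclusive position within the window
    rank: int       # assumed convention: lower number = better rank
    batch: int


def try_combine(a: Core, b: Core) -> Optional[Core]:
    """Return the combined core, or None when the two cores must stay separate."""
    if a.window_id != b.window_id or a.batch != b.batch:
        return None  # only cores of a same window and batch are combined
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap < 1:
        return None  # cores must overlap by at least one amino acid
    start, end = min(a.start, b.start), max(a.end, b.end)
    if end - start > MAX_CORE_LENGTH:
        return None  # too long: keep the original cores (sketch assumption)
    # The better-ranked core determines the rank of the combined core.
    return Core(a.window_id, start, end, min(a.rank, b.rank), a.batch)


# Hypothetical cores.
a = Core("w1", 10, 25, rank=3, batch=1)
b = Core("w1", 20, 35, rank=1, batch=1)
c = Core("w1", 35, 50, rank=2, batch=1)
d = Core("w1", 0, 30, rank=2, batch=1)
e = Core("w1", 25, 60, rank=4, batch=1)
combined = try_combine(a, b)
print(combined, try_combine(b, c), try_combine(d, e))
```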
If a length of a combined core is longer than 40 amino acids, the combined core may be split (e.g., preserving original cores without extensions). [0256] In certain embodiments, a plurality of candidate polynucleotide sequences is filtered to remove any duplicates (e.g., partial duplicates) (e.g., by collapsing two or more candidates into one candidate while preserving associated data) [e.g., while keeping candidates with a best parameter (e.g., rank, predicted presented ligands, peptide-MHC prediction value, expression value)]. For example, identical candidates from different transcripts of a same gene may be collapsed into one candidate while keeping data about associated transcripts. For example, Page 50 of 214 12608199v1
identical candidates or candidates with identical cores may be discarded per antigen while keeping a candidate with a best parameter (e.g., rank, predicted presented ligands, peptide-MHC prediction value, expression value). For example, each candidate of the plurality of candidate polynucleotide sequences that is a substring of another candidate (e.g., of a same antigen) may be filtered out.
[0257] In certain embodiments, a plurality of candidate polynucleotide sequences is filtered using potency linker data. A potency linker may refer to a linker that affects immunogenic properties.
[0258] In certain embodiments, a plurality of candidate polynucleotide sequences is ranked based on the variety of associated antigens. Variety of antigens may be associated with a stronger immune response due to, for example, broadened immune response and a higher chance of overcoming immune tolerance. For example, if N (e.g., two) candidates s302 in a batch s304 correspond to a same antigen, remaining candidates corresponding to the same antigen may be downranked to an end of the batch as shown in FIG.56A. For example, if K (e.g., two, three) candidates s312 in a batch s314a correspond to a same antigen and correspond to “known” epitopes or “known” ligands, remaining candidates corresponding to the same antigen may be downranked to an end of another batch (e.g., tier three, tier five) as shown in FIG.56B. The values of N, K may be associated with a number of antigens in a specific set of batches.
Selecting Candidate Polynucleotide Sequences for a Polyepitope Vaccine
[0259] In certain embodiments, for each candidate of a plurality of candidate polynucleotide sequences s402, a placement cost is determined for candidate placement s408 in a polyepitope vaccine construct s406 as shown in FIG.57. For each candidate, a placement cost represents an array of values s404 associated with a potential candidate placement position in a polyepitope vaccine construct.
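By way of non-limiting illustration only, the antigen-variety downranking of paragraph [0258] (the within-batch case of FIG.56A) can be sketched as a stable reordering. The function name, the candidate identifiers, and the representation of a batch as an ordered list of (candidate, antigen) pairs are assumptions of this sketch.

```python
from collections import Counter
from typing import List, Tuple


def downrank_repeated_antigens(batch: List[Tuple[str, str]],
                               n: int = 2) -> List[Tuple[str, str]]:
    """Keep the first n candidates per antigen in place; move the rest to the end."""
    seen: Counter = Counter()
    kept: List[Tuple[str, str]] = []
    moved: List[Tuple[str, str]] = []
    for candidate_id, antigen in batch:
        if seen[antigen] < n:
            kept.append((candidate_id, antigen))
            seen[antigen] += 1
        else:
            moved.append((candidate_id, antigen))  # downranked, relative order preserved
    return kept + moved


# Hypothetical batch, already ordered by rank (best first).
batch = [
    ("c1", "MAGEA3"), ("c2", "MAGEA3"), ("c3", "MAGEA3"),
    ("c4", "PRAME"), ("c5", "MAGEA3"), ("c6", "CLDN6"),
]
reordered = downrank_repeated_antigens(batch, n=2)
print([candidate_id for candidate_id, _ in reordered])
```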
The placement cost may be associated with a rank of a candidate. For example, a placement cost may be determined as a rank of a candidate multiplied by 50. The placement cost may be associated with a length of a candidate and a placement position on a polyepitope vaccine construct. For example, if a length of a candidate is below 17 amino acids, one is added to the candidate placement cost for positions three and six. For example, if a candidate length exceeds 27 amino acids, one is added for every additional amino acid for the first and last placement positions. The placement cost may be associated with linkers s410 (e.g., in between candidates, before a first place, after a last place). For example, linkers at specific positions may be pre-defined and incompatible with certain candidates. A candidate at positions with an incompatible linker may receive a placement cost of 100,000.

[0260] In certain embodiments, for each candidate of a plurality of candidate polynucleotide sequences, compatibility with various linkers is determined. For example, a combination of a linker – candidate – linker may be checked for whether it contains an 8-mer with homology to maximum expression risk genes. A candidate without any allowable linker may be discarded.

[0261] In certain embodiments, a length of a polyepitopic vaccine construct is 1,282 base pairs. In certain embodiments, a minimum number of candidates for a polyepitopic vaccine is eight. In certain embodiments, a maximum number of candidates for a polyepitopic vaccine is determined by linker availability. For example, a maximum number of candidates may be 18. If a total number of candidates in a selected subset is less than 8, a construction of a polyepitopic vaccine construct may not be attempted. If a total number of candidates in a selected subset is less than 8 and at least three candidates correspond to different cores, the at least three candidates may be duplicated in an alternating scheme based on their ranking until eight candidates are reached (e.g., 1-2-3-1-2-3-1-2). A placement cost of 1000 may be added to a first duplication of a candidate and 2000 to a second duplication of the candidate.

[0262] In certain embodiments, placement of candidates of a plurality of candidate polynucleotide sequences in a polyepitopic construct is determined using an optimization algorithm (e.g., the Munkres algorithm) based on placement costs (e.g., minimizing an overall placement cost, keeping an overall placement cost below 100,000).
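By way of illustration only, the cost rules of paragraphs [0259] and [0261] and the assignment step of paragraph [0262] might be sketched as follows. This is a minimal sketch under stated assumptions, not the disclosed implementation: the function names, the 0-based position indexing, the `blocked` argument (standing in for the result of a linker compatibility check), and the choice to drop the candidate with the highest assigned cost when the total reaches 100,000 are all assumptions. The `munkres` function is a standard textbook formulation of the Munkres (Hungarian) assignment method.

```python
INCOMPATIBLE = 100_000  # cost the text gives for a position with an incompatible linker


def placement_costs(rank, length_aa, n_positions, blocked=frozenset()):
    """Cost array for one candidate, indexed by 0-based placement position."""
    costs = []
    for pos in range(n_positions):
        cost = rank * 50                                   # rank term
        if length_aa < 17 and pos in (2, 5):               # positions three and six
            cost += 1
        if length_aa > 27 and pos in (0, n_positions - 1):
            cost += length_aa - 27                         # one per extra amino acid
        if pos in blocked:                                 # hypothetical linker check result
            cost = INCOMPATIBLE
        costs.append(cost)
    return costs


def pad_with_duplicates(ranked, target=8):
    """Repeat ranked candidates in turn; the k-th repeat carries an extra k * 1000."""
    total = max(target, len(ranked))
    return [(ranked[i % len(ranked)], 1000 * (i // len(ranked))) for i in range(total)]


def munkres(cost):
    """Minimum-cost assignment; needs rows <= columns. Returns one column per row."""
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u, v = [0] * (n + 1), [0] * (m + 1)
    p, way = [0] * (m + 1), [0] * (m + 1)
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [inf] * (m + 1), [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:                                          # augment along the found path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    cols = [0] * n
    for j in range(1, m + 1):
        if p[j]:
            cols[p[j] - 1] = j - 1
    return cols


def place(candidates):
    """candidates: list of (name, cost_array). Returns {name: position} or None."""
    pool = list(candidates)
    while pool:
        cols = munkres([c for _, c in pool])
        assigned = [c[j] for (_, c), j in zip(pool, cols)]
        if sum(assigned) < INCOMPATIBLE:
            return {name: j for (name, _), j in zip(pool, cols)}
        pool.pop(assigned.index(max(assigned)))            # drop costliest, then retry
    return None
```

For example, under these assumptions a rank-1 candidate of 15 amino acids in an eight-position construct receives a cost of 50 everywhere except positions three and six, where it receives 51. The sketch does not enforce the minimum of eight candidates; a caller would check that before or after calling `place`.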
If optimization was unsuccessful, a candidate with a highest placement cost may be discarded and the optimization performed again (e.g., if the number of candidates allows).

B. Secretory Signals

[0263] In some embodiments, a polyepitopic vaccine construct described herein includes a secretory signal, e.g., that is functional in mammalian cells. In some embodiments, a secretory signal comprises or consists of a human secretory signal. In some embodiments, a secretory signal comprises or consists of a non-human secretory signal. In some embodiments, a heterologous secretory signal comprises or consists of a viral secretory signal. In some embodiments, a viral secretory signal comprises or consists of an HSV secretory signal (e.g., an HSV-1 or HSV-2 secretory signal). In some embodiments, an HSV secretory signal comprises or consists of an HSV glycoprotein D (gD) secretory signal. In some embodiments, a secretory signal comprises or consists of an Ebola virus secretory signal. In some embodiments, an Ebola virus secretory signal comprises or consists of an Ebola virus spike glycoprotein (SGP) secretory signal.

[0264] In some embodiments, a secretory signal is characterized by a length of about 15 to 30 amino acids.

[0265] In many embodiments, a secretory signal is positioned at the N-terminus of a polyepitopic vaccine construct described herein. In some embodiments, a secretory signal preferably allows transport of a polyepitopic vaccine construct with which it is associated into a defined cellular compartment, preferably a cell surface, endoplasmic reticulum (ER) or endosomal-lysosomal compartment.

[0266] In some embodiments, a secretory signal is selected from an S1S2 secretory signal (aa 1-19), an immunoglobulin secretory signal (aa 1-22), a human SPARC secretory signal, a human insulin isoform 1 secretory signal, a human albumin secretory signal, etc. Those skilled in the art will be aware of other secretory signals such as, for example, those disclosed in WO2017/081082 (e.g., SEQ ID NOs: 1-1115 and 1728, or fragments or variants thereof). In some embodiments, a secretory signal is a secretory signal as described in Kreiter, Sebastian, et al. "Increased antigen presentation efficiency by coupling antigens to MHC class I trafficking signals." The Journal of Immunology, 180.1 (2008): 309-318, the content of which is incorporated herein in its entirety, e.g., the mmsec secretory signal. In some embodiments, a polyepitopic vaccine construct described herein does not comprise a secretory signal.

[0267] In some embodiments, a secretory signal is one listed in Table 1, or a secretory signal having 1, 2, 3, 4, or 5 amino acid differences relative thereto.
In some embodiments, a signal sequence is selected from those included in Table 1 below and/or those encoded by the sequences in Table 2 below.
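By way of illustration only, the relationship between a nucleotide entry and an amino acid entry, and the "1, 2, 3, 4, or 5 amino acid differences" criterion of paragraph [0267], might be checked as sketched below. This is a minimal sketch under stated assumptions: the standard genetic code is used for translation, and "amino acid differences" is interpreted here as Levenshtein edit distance (substitutions, insertions, and deletions), which is an assumption of the example rather than a definition taken from the disclosure. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), _AA)}


def translate(dna):
    """Translate a coding sequence (DNA or RNA letters) codon by codon."""
    dna = dna.upper().replace("U", "T")
    usable = len(dna) - len(dna) % 3          # ignore a trailing partial codon
    return "".join(CODONS[dna[i:i + 3]] for i in range(0, usable, 3))


def edit_distance(a, b):
    """Levenshtein distance between two amino acid sequences."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def is_listed_or_variant(candidate, reference, max_diff=5):
    """True if candidate equals the reference or differs by at most max_diff residues."""
    return edit_distance(candidate, reference) <= max_diff
```

Under these assumptions, translating the SARS-CoV-2-S nucleotide entry of Table 2 yields the SARS-CoV-2-S amino acid entry of Table 1, and the two HSV-1 gD SP entries of Table 1 differ from each other by a single residue.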
Table 1: Exemplary secretory signals

Signal | Sequence (Amino Acid)
HSV-1 gD SP | MGGAAARLGAVILFVVIVGLHGVRSKY
HSV-2 gD SP | MGRLTSGVGTAALLVVAVGLRVVCA
HSV-2 | MGRLTSGVGTAALLVVAVGLRVVCAKYA
Csp (isolate 3D7) | MMRKLAILSVSSFLFVEA
HSV-1 gD SP | MGGAAARLGAVILFVVIVGLHGVRGKY
Ebola spike glycoprotein GP | MGVTGILQLPRDRFKRTSFFLWVIILFQRTFS
SARS-CoV-2-S | MFVFLVLLPLVSSQCVNLT
human Ig heavy chain signal peptide (huSec) | MDWIWRILFLVGAATGAHSQM
HuIgGk signal peptide | METPAQLLFLLLLWLPDTTG
IgE heavy chain epsilon-1 signal peptide | MDWTWILFLVAAATRVHS
Japanese encephalitis PRM signal sequence | MLGSNSGQRVVFTILLLLVAPAYS
VSVg protein signal sequence | MKCLLYLAFLFIGVNCA
TRIO | MCRGLSAVLILLVSLSAQLHVVVG
human Ig heavy chain signal peptide | MELGLSWIFLLAILKGVQC
human Ig heavy chain signal peptide | MELGLRWVFLVAILEGVQC
human Ig heavy chain signal peptide | MKHLWFFLLLVAAPRWVLS
human Ig heavy chain signal peptide | MDWTWRILFLVAAATGAHS
human Ig heavy chain signal peptide | MDWTWRFLFVVAAATGVQS
human Ig heavy chain signal peptide | MEFGLSWLFLVAILKGVQC
human Ig heavy chain signal peptide | MEFGLSWVFLVALFRGVQC
human Ig heavy chain signal peptide | MDLLHKNMKHLWFFLLLVAAPRWVLS
human Ig kappa chain signal peptide | MDMRVPAQLLGLLLLWLSGARC
human Ig kappa chain signal peptide | MKYLLPTAAAGLLLLAAQPAMA

Table 2: Exemplary polynucleotide sequences encoding secretory signals

Signal | Sequence (Nucleotide)
HSV-1 gD SP wild-type | ATGGGGGGGGCTGCCGCCAGGTTGGGGGCCGTGATTTTGTTTGTCGTCATAGTGGGCCTCCATGGGGTCCGCAGCAAATAT
HSV-1 gD SP Opt10 nt sequence | ATGGGAGGAGCCGCCGCCAGACTGGGAGCCGTGATCCTGTTCGTGGTGATCGTGGGACTGCATGGAGTGAGAAGCAAGTAC
SARS-CoV-2-S | ATGTTTGTGTTTCTTGTGCTGCTGCCTCTTGTGTCTTCTCAGTGTGTGAATTTGACA
human Ig heavy chain signal peptide (huSec) | ATGGATTGGATTTGGAGAATCCTGTTCCTCGTGGGAGCCGCTACAGGAGCCCACTCCCAGATG
human Ig heavy chain signal peptide | ATGGAGTTGGGACTGAGCTGGATTTTCCTTTTGGCTATTTTAAAAGGTGTCCAGTGT
human Ig heavy chain signal peptide | ATGGAACTGGGGCTCCGCTGGGTTTTCCTTGTTGCTATTTTAGAAGGTGTCCAGTGT
human Ig heavy chain signal peptide | ATGAAACACCTGTGGTTCTTCCTCCTGCTGGTGGCAGCTCCCAGATGGGTCCTGTCC
human Ig heavy chain signal peptide | ATGGACTGGACCTGGAGGATCCTCTTCTTGGTGGCAGCAGCAACAGGTGCCCACTCG
human Ig heavy chain signal peptide | ATGGACTGGACCTGGAGGTTCCTCTTTGTGGTGGCAGCAGCTACAGGTGTCCAGTCC
human Ig heavy chain signal peptide | ATGGAGTTTGGGCTGAGCTGGCTTTTTCTTGTGGCGATTCTAAAAGGTGTCCAGTGT
human Ig heavy chain signal peptide | ATGGAGTTTGGGCTGAGCTGGGTTTTCCTCGTTGCTCTTTTTAGAGGTGTCCAGTGT
human Ig heavy chain signal peptide | ATGGACCTCCTGCACAAGAACATGAAACACCTGTGGTTCTTCCTCCTCCTGGTGGCAGCTCCCAGATGGGTGCTGTCC
human Ig kappa chain signal peptide | ATGGACATGAGGGTCCCTGCTCAGCTCCTGGGGCTCCTGCTGCTCTGGCTCTCAGGTGCCAGATGT
human Ig kappa chain signal peptide | ATGAAATACCTATTGCCTACGGCAGCCGCTGGATTGTTATTACTCGCGGCCCAGCCGGCCATGGCC

C. Trafficking Signals

[0268] In some embodiments, a polyepitopic vaccine construct described herein includes a trafficking signal. In some embodiments, an MHC trafficking domain is or comprises a transmembrane region and a cytoplasmic region of a chain of an MHC molecule (e.g., an MHC Class I molecule), for example, in some embodiments as described in the International Patent Publication Number WO 2005/038030, the contents of which are incorporated herein by reference in their entireties for the purposes described herein. In some embodiments, an MHC trafficking domain is or comprises an MHC Class I trafficking domain. In some embodiments,
an MHC class I trafficking domain (MITD) comprises an amino acid sequence that is at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to the amino acid sequence of IVGIVAGLAVLAVVVIGAVVATVMCRRKSSGGKGGSYSQAASSDSAQGSDVSLTA. In some embodiments, an MHC class I trafficking domain (MITD) comprises an amino acid sequence that is identical to the amino acid sequence of IVGIVAGLAVLAVVVIGAVVATVMCRRKSSGGKGGSYSQAASSDSAQGSDVSLTA.

D. Linkers

[0269] In some embodiments, a polyepitopic vaccine construct described herein includes one or more linkers, e.g., between epitopes in the construct and/or before or after the last construct. In some embodiments, a linker is or comprises 2, 3, 4, 5, 6, 7, 8, 9, 10 or more amino acids. In some embodiments, a linker is or comprises no more than about 30, 25, 20, 15, 10 or fewer amino acids. A linker can include any amino acid sequence and is not limited to any particular amino acids. In some embodiments, a linker comprises one or more glycine (G) amino acids. In some embodiments, a linker comprises one or more serine (S) amino acids. In some embodiments, a linker includes amino acids selected based on a cleavage predictor to generate highly-cleavable linkers.

[0270] In some embodiments, a linker is or comprises S-G4-S-G4-S. In some embodiments, a linker is or comprises GSPGSGSGS. In some embodiments, a linker is or comprises GGSGGGGSGG. In some embodiments, a linker is one presented in Table 3. In some embodiments, a linker is or comprises a sequence as set forth in WO2017/081082, which is incorporated herein by reference in its entirety (see SEQ ID NOs: 1509-1565, or a fragment or variant thereof).

[0271] In some embodiments, one or more linker sequences may comprise cleavage sequences. In some embodiments, a linker may have a length of 2, 3, 4, 5, 6, 7, 8, 9, 10 or more amino acids. In some embodiments, a linker of not more than about 30, 25, 20, 15, 10 or fewer amino acids is used.
In general, any amino acid may be present in a linker sequence. In some embodiments, a linker or cleavage sequence contains a lysine (K). In some embodiments, a linker or cleavage sequence contains an arginine (R). In some embodiments, a linker or cleavage sequence contains a methionine (M). In some embodiments, a linker or cleavage sequence contains a tyrosine (Y). In some embodiments, a linker is designed to comprise amino acids based on a cleavage predictor to generate highly-cleavable peptide sequences, which is a novel and effective way of delivering immunogenic epitopes in a vaccine setting. In some embodiments, the epitope distribution and juxtaposition encoded in a polyepitopic vaccine construct are designed so that cleavage sequences are contributed by the amino acid sequences of the epitopes and/or the flanking or linking residues, thereby using minimal linker sequences. Some exemplary cleavage sequences may be one or more of FRAC, KRCF, KKRY, ARMA, RRSG, MRAC, KMCG, ARCA, KKQG, YRSY, SFMN, FKAA, KRNG, YNSF, KKNG, RRRG, KRYS, and ARYA.

[0272] In some embodiments, a polyepitopic vaccine construct described herein comprises a linker between each shared tumor antigen epitope and/or between each neoantigen epitope. In some embodiments, linkers included in a particular polyepitopic vaccine construct described herein are the same. In some embodiments, two or more linkers in a particular polyepitopic vaccine construct described herein are different.

[0273] Exemplary linker sequences are provided in the following Table 3.

Table 3: Exemplary linker sequences.

Linker Sequence (Amino Acid)
SGGGGSGGGGS
GSPGSGSGS
GGSGGGGSGG
GGS
GGGS
GGGGSGGGGSGGGGS
AGNRVRRSVG
GSGSGS
GGSGGGGSGG
GGSLGGGGSG
FRAC
KRCF
KKRY
ARMA
RRSG
MRAC
KMCG
ARCA
KKQG
YRSY
SFMN
FKAA
KRNG
YNSF
KKNG
RRRG
KRYS
ARYA

II. Polyribonucleotides

A. Exemplary Polyribonucleotides Features

[0274] Polyribonucleotides described herein encode one or more polyepitopic vaccine constructs described herein. In some embodiments, polyribonucleotides described herein can comprise a nucleotide sequence that encodes a 5’ UTR of interest and/or a 3’ UTR of interest. In some embodiments, polynucleotides described herein can comprise a nucleotide sequence that encodes a polyA tail. In some embodiments, polyribonucleotides described herein may comprise a 5’ cap, which may be incorporated during transcription, or joined to a polyribonucleotide post-transcription. In some embodiments, a first polyribonucleotide encoding a first polyepitopic vaccine and a second polyribonucleotide encoding a second polyepitopic vaccine comprise the same 5’ cap, cap proximal sequence(s), 5’ UTR, linker sequences, 3’ UTR, and poly(A) tail. In some embodiments, a first polyribonucleotide encoding a first polyepitopic vaccine and a second polyribonucleotide encoding a second polyepitopic vaccine comprise one or more different 5’ cap, cap proximal sequence(s), 5’ UTR, linker sequences, 3’ UTR, and/or poly(A) tail. In some
embodiments, the 5’ cap, cap proximal sequence(s), 5’ UTR, linker sequences, 3’ UTR, and poly(A) tail cause innate immune stimulation and/or are intrinsic TLR7/8 agonists.

1. 5' Cap

[0275] A structural feature of mRNAs is a cap structure at the five-prime (5’) end. Natural eukaryotic mRNA comprises a 7-methylguanosine cap linked to the mRNA via a 5´ to 5´-triphosphate bridge, resulting in a cap0 structure (m7GpppN). In most eukaryotic mRNA and some viral mRNA, further modifications can occur at the 2'-hydroxy group (2’-OH) (e.g., the 2'-hydroxyl group may be methylated to form 2'-O-Me) of the first and subsequent nucleotides, producing “cap1” and “cap2” five-prime ends, respectively. Diamond, et al., (2014) Cytokine & Growth Factor Reviews, 25:543–550, which is incorporated herein by reference in its entirety, reported that cap0-mRNA cannot be translated as efficiently as cap1-mRNA, in which the role of 2'-O-Me in the penultimate position at the mRNA 5’ end is determinant. Lack of the 2'-O-Me has been shown to trigger innate immunity and activate an IFN response. Daffis, et al. (2010) Nature, 468:452-456; and Züst et al. (2011) Nature Immunology, 12:137-143, each of which is incorporated herein by reference in its entirety.

[0276] RNA capping is well researched and is described, e.g., in Decroly E et al. (2012) Nature Reviews 10: 51-65; and in Ramanathan A. et al., (2016) Nucleic Acids Res; 44(16): 7511–7526, the entire contents of each of which are hereby incorporated by reference.
For example, in some embodiments, a 5’-cap structure which may be suitable in the context of the present invention is a cap0 (methylation of the first nucleobase, e.g., m7GpppN), cap1 (additional methylation of the ribose of the adjacent nucleotide of m7GpppN), cap2 (additional methylation of the ribose of the 2nd nucleotide downstream of the m7GpppN), cap3 (additional methylation of the ribose of the 3rd nucleotide downstream of the m7GpppN), cap4 (additional methylation of the ribose of the 4th nucleotide downstream of the m7GpppN), ARCA (“anti-reverse cap analogue”), modified ARCA (e.g., phosphorothioate modified ARCA), inosine, N1-methyl-guanosine, 2’-fluoro-guanosine, 7-deaza-guanosine, 8-oxo-guanosine, 2-amino-guanosine, LNA-guanosine, and 2-azido-guanosine.

[0277] The term “5'-cap” as used herein refers to a structure found on the 5'-end of an RNA, e.g., mRNA, and generally includes a guanosine nucleotide connected to an RNA, e.g., mRNA, via a 5'- to 5'-triphosphate linkage (also referred to as Gppp or G(5')ppp(5')). In some embodiments, a guanosine nucleoside included in a 5’ cap may be modified, for example, by methylation at one or more positions (e.g., at the 7-position) on a base (guanine), and/or by methylation at one or more positions of a ribose. In some embodiments, a guanosine nucleoside included in a 5’ cap comprises a 3’O methylation at a ribose (3’OMeG). In some embodiments, a guanosine nucleoside included in a 5’ cap comprises methylation at the 7-position of guanine (m7G). In some embodiments, a guanosine nucleoside included in a 5’ cap comprises methylation at the 7-position of guanine and a 3’ O methylation at a ribose (m7(3’OMeG)). It will be understood that the notation used in the above paragraph, e.g., “(m27,3’-O)G” or “m7(3’OMeG)”, applies to other structures described herein.

[0278] In some embodiments, providing an RNA with a 5'-cap disclosed herein may be achieved by in vitro transcription, in which a 5'-cap is co-transcriptionally incorporated into an RNA strand, or may be attached to an RNA post-transcriptionally using capping enzymes. In some embodiments, co-transcriptional capping with a cap disclosed herein improves the capping efficiency of an RNA compared to co-transcriptional capping with an appropriate reference comparator. In some embodiments, improving capping efficiency can increase a translation efficiency and/or translation rate of an RNA, and/or increase expression of an encoded polypeptide. In some embodiments, alterations to polynucleotides generate a non-hydrolyzable cap structure which can, for example, prevent decapping and increase RNA half-life.

[0279] In some embodiments, a utilized 5’ cap is a cap0, a cap1, or a cap2 structure. See, e.g., FIG. 1 of Ramanathan A et al., and FIG. 1 of Decroly E et al., each of which is incorporated herein by reference in its entirety. In some embodiments, an RNA described herein comprises a cap1 structure. In some embodiments, an RNA described herein comprises a cap2 structure.
[0280] In some embodiments, an RNA described herein comprises a cap0 structure. In some embodiments, a cap0 structure comprises a guanosine nucleoside methylated at the 7-position of guanine ((m7)G). In some embodiments, such a cap0 structure is connected to an RNA via a 5'- to 5'-triphosphate linkage and is also referred to herein as (m7)Gppp. In some embodiments, a cap0 structure comprises a guanosine nucleoside methylated at the 2’-position of the ribose of guanosine. In some embodiments, a cap0 structure comprises a guanosine nucleoside methylated at the 3’-position of the ribose of guanosine. In some embodiments, a guanosine nucleoside included in a 5’ cap comprises methylation at the 7-position of guanine and at the 2’-position of the ribose ((m27,2’-O)G). In some embodiments, a guanosine nucleoside included in a 5’ cap comprises methylation at the 7-position of guanine and at the 3’-position of the ribose ((m27,3’-O)G).

[0281] In some embodiments, a cap1 structure comprises a guanosine nucleoside methylated at the 7-position of guanine ((m7)G) and optionally methylated at the 2’ or 3’ position of the ribose, and a 2’O methylated first nucleotide in an RNA ((m2’-O)N1). In some embodiments, a cap1 structure comprises a guanosine nucleoside methylated at the 7-position of guanine ((m7)G) and the 3’ position of the ribose, and a 2’O methylated first nucleotide in an RNA ((m2’-O)N1). In some embodiments, a cap1 structure is connected to an RNA via a 5'- to 5'-triphosphate linkage and is also referred to herein as, e.g., ((m7)Gppp(2'-O)N1) or ((m27,3’-O)Gppp(2'-O)N1), wherein N1 is as defined and described herein. In some embodiments, a cap1 structure comprises a second nucleotide, N2, which is at position 2 and is chosen from A, G, C, or U, e.g., (m7)Gppp(2'-O)N1pN2 or (m27,3’-O)Gppp(2'-O)N1pN2, wherein each of N1 and N2 is as defined and described herein.

[0282] In some embodiments, a cap2 structure comprises a guanosine nucleoside methylated at the 7-position of guanine ((m7)G) and optionally methylated at the 2’ or 3’ position of the ribose, and 2’O methylated first and second nucleotides in an RNA ((m2’-O)N1p(m2’-O)N2). In some embodiments, a cap2 structure comprises a guanosine nucleoside methylated at the 7-position of guanine ((m7)G) and the 3’ position of the ribose, and 2’O methylated first and second nucleotides in an RNA. In some embodiments, a cap2 structure is connected to an RNA via a 5'- to 5'-triphosphate linkage and is also referred to herein as, e.g., ((m7)Gppp(2'-O)N1p(2'-O)N2) or ((m27,3’-O)Gppp(2'-O)N1p(2'-O)N2), wherein each of N1 and N2 is as defined and described herein.
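By way of illustration only, the naming pattern of paragraphs [0280] to [0282] (cap0, cap1, and cap2 distinguished by how many of the first transcribed nucleotides carry a 2’-O-methyl group) might be expressed as a small string builder. This is a minimal sketch: the function names and arguments are hypothetical, straight apostrophes stand in for the prime marks, and the flattened notation "m27,3'-O" is simply reproduced as it appears in this text.

```python
def cap_class(n_2o_methylated):
    """Return 'cap0', 'cap1' or 'cap2' from the count of 2'-O-methylated nucleotides."""
    if n_2o_methylated not in (0, 1, 2):
        raise ValueError("this sketch only covers cap0, cap1 and cap2")
    return f"cap{n_2o_methylated}"


def cap_notation(nucleotides, n_2o_methylated=0, g_ribose_methyl=None):
    """Build a cap label such as (m7)GpppN1 from its parts.

    nucleotides:      sequence of labels, e.g. ["N1", "N2"] or "AG"
    n_2o_methylated:  how many leading nucleotides carry 2'-O-methylation
    g_ribose_methyl:  None, "2'" or "3'" for methylation of the cap guanosine ribose
    """
    g = "(m7)G" if g_ribose_methyl is None else f"(m27,{g_ribose_methyl}-O)G"
    parts = [
        ("(m2'-O)" if i < n_2o_methylated else "") + nt
        for i, nt in enumerate(nucleotides)
    ]
    return g + "ppp" + "p".join(parts)
```

For instance, under these assumptions `cap_notation(["N1", "N2"], 2, "3'")` produces a cap2 label with both N1 and N2 marked as 2’-O-methylated, matching the pattern recited for cap2 structures above.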
[0283] In some embodiments, the 5’ cap is a dinucleotide cap structure. In some embodiments, the 5’ cap is a dinucleotide cap structure comprising N1, wherein N1 is as defined and described herein. In some embodiments, the 5’ cap is a dinucleotide cap G*N1, wherein N1 is as defined above and herein, and G* comprises a structure of formula (I):

[structure of formula (I) not reproduced]

or a salt thereof,
wherein each R2 and R3 is -OH or -OCH3; and X is O or S.

[0284] In some embodiments, R2 is -OH. In some embodiments, R2 is -OCH3. In some embodiments, R3 is -OH. In some embodiments, R3 is -OCH3. In some embodiments, R2 is -OH and R3 is -OH. In some embodiments, R2 is -OH and R3 is -OCH3. In some embodiments, R2 is -OCH3 and R3 is -OH. In some embodiments, R2 is -OCH3 and R3 is -OCH3.

[0285] In some embodiments, X is O. In some embodiments, X is S.

[0286] In some embodiments, the 5’ cap is a dinucleotide cap0 structure (e.g., (m7)GpppN1, (m27,2’-O)GpppN1, (m27,3’-O)GpppN1, (m7)GppSpN1, (m27,2’-O)GppSpN1, or (m27,3’-O)GppSpN1), wherein N1 is as defined and described herein. In some embodiments, the 5’ cap is a dinucleotide cap0 structure (e.g., (m7)GpppN1, (m27,2’-O)GpppN1, (m27,3’-O)GpppN1, (m7)GppSpN1, (m27,2’-O)GppSpN1, or (m27,3’-O)GppSpN1), wherein N1 is G. In some embodiments, the 5’ cap is a dinucleotide cap0 structure (e.g., (m7)GpppN1, (m27,2’-O)GpppN1, (m27,3’-O)GpppN1, (m7)GppSpN1, (m27,2’-O)GppSpN1, or (m27,3’-O)GppSpN1), wherein N1 is A, U, or C. In some embodiments, the 5’ cap is a dinucleotide cap1 structure (e.g., (m7)Gppp(m2’-O)N1, (m27,2’-O)Gppp(m2’-O)N1, (m27,3’-O)Gppp(m2’-O)N1, (m7)GppSp(m2’-O)N1, (m27,2’-O)GppSp(m2’-O)N1, or (m27,3’-O)GppSp(m2’-O)N1), wherein N1 is as defined and described herein. In some embodiments, the 5’ cap is selected from the group consisting of (m7)GpppG (“Ecap0”), (m7)Gppp(m2’-O)G (“Ecap1”), (m27,3’-O)GpppG (“ARCA” or “D1”), and (m27,2’-O)GppSpG (“beta-S-ARCA”). In some embodiments, the 5’ cap is (m7)GpppG (“Ecap0”), having a structure:

[structure not reproduced]

or a salt thereof.
[0287] In some embodiments, the 5’ cap is (m7)Gppp(m2’-O)G (“Ecap1”), having a structure:

[structure not reproduced]

or a salt thereof.

[0288] In some embodiments, the 5’ cap is (m27,3’-O)GpppG (“ARCA” or “D1”), having a structure:

[structure not reproduced]

or a salt thereof.

[0289] In some embodiments, the 5’ cap is (m27,2’-O)GppSpG (“beta-S-ARCA”), having a structure:

[structure not reproduced]

or a salt thereof.
[0290] In some embodiments, the 5’ cap is a trinucleotide cap structure. In some embodiments, the 5’ cap is a trinucleotide cap structure comprising N1pN2, wherein N1 and N2 are as defined and described herein. In some embodiments, the 5’ cap is a trinucleotide cap G*N1pN2, wherein N1 and N2 are as defined above and herein, and G* comprises a structure of formula (I):

[structure of formula (I) not reproduced]

or a salt thereof.

[0291] In some embodiments, the 5’ cap is a trinucleotide cap0 structure (e.g., (m7)GpppN1pN2, (m27,2’-O)GpppN1pN2, or (m27,3’-O)GpppN1pN2), wherein N1 and N2 are as defined and described herein. In some embodiments, the 5’ cap is a trinucleotide cap1 structure (e.g., (m7)Gppp(m2’-O)N1pN2, (m27,2’-O)Gppp(m2’-O)N1pN2, (m27,3’-O)Gppp(m2’-O)N1pN2), wherein N1 and N2 are as defined and described herein. In some embodiments, the 5’ cap is a trinucleotide cap2 structure (e.g., (m7)Gppp(m2’-O)N1p(m2’-O)N2, (m27,2’-O)Gppp(m2’-O)N1p(m2’-O)N2, (m27,3’-O)Gppp(m2’-O)N1p(m2’-O)N2), wherein N1 and N2 are as defined and described herein. In some embodiments, the 5’ cap is selected from the group consisting of (m27,3’-O)Gppp(m2’-O)ApG (“CleanCap AG”, “CC413”), (m27,3’-O)Gppp(m2’-O)GpG (“CleanCap GG”), (m7)Gppp(m2’-O)ApG, (m7)Gppp(m2’-O)GpG, (m27,3’-O)Gppp(m26,2’-O)ApG, and (m7)Gppp(m2’-O)ApU.
[0292] In some embodiments, the 5’ cap is (m27,3’-O)Gppp(m2’-O)ApG (“CleanCap AG”, “CC413”), having a structure:

[structure not reproduced]

or a salt thereof.

[0293] In some embodiments, the 5’ cap is (m27,3’-O)Gppp(m2’-O)GpG (“CleanCap GG”), having a structure:

[structure not reproduced]

or a salt thereof.

[0294] In some embodiments, the 5’ cap is (m7)Gppp(m2’-O)ApG, having a structure:

[structure not reproduced]

or a salt thereof.

[0295] In some embodiments, the 5’ cap is (m7)Gppp(m2’-O)GpG, having a structure:

[structure not reproduced]

or a salt thereof.

[0296] In some embodiments, the 5’ cap is (m27,3’-O)Gppp(m26,2’-O)ApG, having a structure:

[structure not reproduced]

or a salt thereof.

[0297] In some embodiments, the 5’ cap is (m7)Gppp(m2’-O)ApU, having a structure:

[structure not reproduced]

or a salt thereof.
[0298] In some embodiments, the 5’ cap is a tetranucleotide cap structure. In some embodiments, the 5’ cap is a tetranucleotide cap structure comprising N1pN2pN3, wherein N1, N2, and N3 are as defined and described herein. In some embodiments, the 5’ cap is a tetranucleotide cap G*N1pN2pN3, wherein N1, N2, and N3 are as defined above and herein, and G* comprises a structure of formula (I):

[structure of formula (I) not reproduced]

or a salt thereof.
[0299] In some embodiments, the 5’ cap is a tetranucleotide cap0 structure (e.g., (m7)GpppN1pN2pN3, (m27,2’-O)GpppN1pN2pN3, or (m27,3’-O)GpppN1pN2pN3), wherein N1, N2, and N3 are as defined and described herein. In some embodiments, the 5’ cap is a tetranucleotide cap1 structure (e.g., (m7)Gppp(m2’-O)N1pN2pN3, (m27,2’-O)Gppp(m2’-O)N1pN2pN3, (m27,3’-O)Gppp(m2’-O)N1pN2pN3), wherein N1, N2, and N3 are as defined and described herein. In some embodiments, the 5’ cap is a tetranucleotide cap2 structure (e.g., (m7)Gppp(m2’-O)N1p(m2’-O)N2pN3, (m27,2’-O)Gppp(m2’-O)N1p(m2’-O)N2pN3, (m27,3’-O)Gppp(m2’-O)N1p(m2’-O)N2pN3), wherein N1, N2, and N3 are as defined and described herein. In some embodiments, the 5’ cap is selected from the group consisting of (m27,3’-O)Gppp(m2’-O)Ap(m2’-O)GpG, (m27,3’-O)Gppp(m2’-O)Gp(m2’-O)GpC, (m7)Gppp(m2’-O)Ap(m2’-O)UpA, and (m7)Gppp(m2’-O)Ap(m2’-O)GpG.

[0300] In some embodiments, the 5’ cap is (m27,3’-O)Gppp(m2’-O)Ap(m2’-O)GpG, having a structure:

[structure not reproduced]

or a salt thereof.
[0301] In some embodiments, the 5’ cap is (m27,3’-O)Gppp(m2’-O)Gp(m2’-O)GpC, having a structure:

[structure not reproduced]

or a salt thereof.

[0302] In some embodiments, the 5’ cap is (m7)Gppp(m2’-O)Ap(m2’-O)UpA, having a structure:

[structure not reproduced]

or a salt thereof.

[0303] In some embodiments, the 5’ cap is (m7)Gppp(m2’-O)Ap(m2’-O)GpG, having a structure:

[structure not reproduced]

or a salt thereof.
2. Cap Proximal Sequences

[0304] In some embodiments, a 5’ UTR utilized in accordance with the present disclosure comprises a cap proximal sequence, e.g., as disclosed herein. In some embodiments, a cap proximal sequence comprises a sequence adjacent to a 5’ cap. In some embodiments, a cap proximal sequence comprises nucleotides in positions +1, +2, +3, +4, and/or +5 of an RNA polynucleotide.

[0305] In some embodiments, a cap structure comprises one or more nucleotides of a cap proximal sequence. In some embodiments, a cap structure comprises an m7 Guanosine cap and nucleotide +1 (N1) of an RNA polynucleotide. In some embodiments, a cap structure comprises an m7 Guanosine cap and nucleotide +2 (N2) of an RNA polynucleotide. In some embodiments, a cap structure comprises an m7 Guanosine cap and nucleotides +1 and +2 (N1 and N2) of an RNA polynucleotide. In some embodiments, a cap structure comprises an m7 Guanosine cap and nucleotides +1, +2, and +3 (N1, N2, and N3) of an RNA polynucleotide.

[0306] Those skilled in the art, reading the present disclosure, will appreciate that, in some embodiments, one or more residues of a cap proximal sequence (e.g., one or more of residues +1, +2, +3, +4, and/or +5) may be included in an RNA by virtue of having been included in a cap entity (e.g., a cap1 or cap2 structure, etc.); alternatively, in some embodiments, at least some of the residues in a cap proximal sequence may be enzymatically added (e.g., by a polymerase such as a T7 polymerase). For example, in certain exemplified embodiments where a (m27,3’-O)Gppp(m2’-O)ApG cap is utilized, +1 (i.e., N1) and +2 (i.e., N2) are the (m2’-O)A and G residues of the cap, and +3, +4, and +5 are added by polymerase (e.g., T7 polymerase).

[0307] In some embodiments, the 5’ cap is a dinucleotide cap structure, wherein the cap proximal sequence comprises N1 of the 5’ cap, where N1 is any nucleotide, e.g., A, C, G or U. In some embodiments, the 5’ cap is a trinucleotide cap structure (e.g., the trinucleotide cap structures described above and herein), wherein the cap proximal sequence comprises N1 and N2 of the 5’ cap, wherein N1 and N2 are independently any nucleotide, e.g., A, C, G or U. In some embodiments, the 5’ cap is a tetranucleotide cap structure (e.g., the tetranucleotide cap structures described above and herein), wherein the cap proximal sequence comprises N1, N2, and N3 of the 5’ cap, wherein N1, N2, and N3 are any nucleotide, e.g., A, C, G or U.
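By way of illustration only, the division of the cap proximal positions between the cap entity and the polymerase described in paragraphs [0306] and [0307] might be tabulated as sketched below. This is a minimal sketch: the function name, the `cap_type` keys, and the five-nucleotide example sequence "AGAAU" are hypothetical choices made for the example and are not taken from the disclosure.

```python
# Number of cap proximal nucleotides contributed by each kind of cap entity:
# a dinucleotide cap supplies N1, a trinucleotide cap N1-N2, a tetranucleotide cap N1-N3.
CAP_SUPPLIED = {"dinucleotide": 1, "trinucleotide": 2, "tetranucleotide": 3}


def cap_proximal_origin(cap_type, proximal):
    """Label positions +1..+5 as coming from the cap or from the polymerase.

    cap_type: one of the CAP_SUPPLIED keys
    proximal: the five cap proximal nucleotides, e.g. "AGAAU" (illustrative only)
    """
    if len(proximal) != 5:
        raise ValueError("expected exactly five nucleotides for positions +1 to +5")
    supplied = CAP_SUPPLIED[cap_type]
    return [
        (f"+{i}", nt, "cap" if i <= supplied else "polymerase")
        for i, nt in enumerate(proximal, start=1)
    ]
```

With a trinucleotide cap, the sketch labels +1 and +2 as cap-derived and +3 to +5 as polymerase-added, mirroring the exemplified case in paragraph [0306].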
[0308] In some embodiments, e.g., where the 5’ cap is a dinucleotide cap structure, a cap proximal sequence comprises N1 of the 5’ cap, and N2, N3, N4 and N5, wherein N1 to N5 correspond to positions +1, +2, +3, +4, and/or +5 of an RNA polynucleotide. In some embodiments, e.g., where the 5’ cap is a trinucleotide cap structure, a cap proximal sequence comprises N1 and N2 of the 5’ cap, and N3, N4 and N5, wherein N1 to N5 correspond to positions +1, +2, +3, +4, and/or +5 of an RNA polynucleotide. In some embodiments, e.g., where the 5’ cap is a tetranucleotide cap structure, a cap proximal sequence comprises N1, N2, and N3 of the 5’ cap, and N4 and N5, wherein N1 to N5 correspond to positions +1, +2, +3, +4, and/or +5 of an RNA polynucleotide.

[0309] In some embodiments, N1 is A. In some embodiments, N1 is C. In some embodiments, N1 is G. In some embodiments, N1 is U. In some embodiments, N2 is A. In some embodiments, N2 is C. In some embodiments, N2 is G. In some embodiments, N2 is U. In some embodiments, N3 is A. In some embodiments, N3 is C. In some embodiments, N3 is G. In some embodiments, N3 is U. In some embodiments, N4 is A. In some embodiments, N4 is C. In some embodiments, N4 is G. In some embodiments, N4 is U. In some embodiments, N5 is A. In some embodiments, N5 is C. In some embodiments, N5 is G. In some embodiments, N5 is U. It will be understood that each of the embodiments described above and herein (e.g., for N1 through N5) may be taken singly or in combination and/or may be combined with other embodiments of variables described above and herein (e.g., 5’ caps).

3. 5’ UTR

[0310] In some embodiments, a nucleic acid (e.g., DNA, RNA) utilized in accordance with the present disclosure comprises a 5'-UTR.
In some embodiments, a 5’-UTR may comprise a plurality of distinct sequence elements; in some embodiments, such plurality may be or comprise multiple copies of one or more particular sequence elements (e.g., as may be from a particular source or otherwise known as a functional or characteristic sequence element). In some embodiments, a 5’ UTR comprises multiple different sequence elements.

[0311] The term “untranslated region” or “UTR” is commonly used in the art to refer to a region in a DNA molecule which is transcribed but is not translated into an amino acid sequence, or to the corresponding region in an RNA polynucleotide, such as an mRNA molecule. An untranslated region (UTR) can be present 5' (upstream) of an open reading frame (5'-UTR) and/or 3' (downstream) of an open reading frame (3'-UTR). As used herein, the terms “five prime untranslated region” or “5' UTR” refer to a sequence of a polyribonucleotide between the 5' end of the polyribonucleotide (e.g., a transcription start site) and a start codon of a coding region of the polyribonucleotide. In some embodiments, “5' UTR” refers to a sequence of a polyribonucleotide that begins at the 5' end of the polyribonucleotide (e.g., a transcription start site) and ends one nucleotide (nt) before a start codon (usually AUG) of a coding region of the polyribonucleotide, e.g., in its natural context. In some embodiments, a 5' UTR comprises a Kozak sequence. A 5'-UTR is downstream of the 5'-cap (if present), e.g., directly adjacent to the 5'-cap. In some embodiments, a 5’ UTR disclosed herein comprises a cap proximal sequence, e.g., as defined and described herein. In some embodiments, a cap proximal sequence comprises a sequence adjacent to a 5’ cap.

[0312] Exemplary 5’ UTRs include a human alpha globin (hAg) 5’ UTR or a fragment thereof, a TEV 5’ UTR or a fragment thereof, a HSP70 5’ UTR or a fragment thereof, or a c-Jun 5’ UTR or a fragment thereof.

[0313] In some embodiments, an RNA disclosed herein comprises a hAg 5’ UTR or a fragment thereof.

[0314] In some embodiments, an RNA disclosed herein comprises a 5’ UTR having at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity to a 5’ UTR with the sequence AGAATAAACTAGTATTCTTCTGGTCCCCACAGACTCAGAGAGAACCCGCCACC. In some embodiments, an RNA disclosed herein comprises a 5’ UTR having the sequence AGAATAAACTAGTATTCTTCTGGTCCCCACAGACTCAGAGAGAACCCGCCACC.

4. PolyA Tail

[0315] In some embodiments, a polynucleotide (e.g., DNA, RNA) disclosed herein comprises a polyadenylate (polyA) sequence, e.g., as described herein. In some embodiments, a polyA sequence is situated downstream of a 3'-UTR, e.g., adjacent to a 3'-UTR.
[0316] As used herein, the term “poly(A) sequence” or “poly-A tail” refers to an uninterrupted or interrupted sequence of adenylate residues which is typically located at the 3'-end of an RNA polynucleotide. Poly(A) sequences are known to those of skill in the art and may follow the 3’-UTR in the RNAs described herein. An uninterrupted poly(A) sequence is characterized by consecutive adenylate residues. In nature, an uninterrupted poly(A) sequence is typical. In some embodiments, polynucleotides disclosed herein comprise an uninterrupted poly(A) sequence. In some embodiments, polynucleotides disclosed herein comprise an interrupted poly(A) sequence. In some embodiments, RNAs disclosed herein can have a poly(A) sequence attached to the free 3'-end of the RNA by a template-independent RNA polymerase after transcription or a poly(A) sequence encoded by DNA and transcribed by a template-dependent RNA polymerase.

[0317] It has been demonstrated that a poly(A) sequence of about 120 A nucleotides has a beneficial influence on the levels of RNA in transfected eukaryotic cells, as well as on the levels of protein that is translated from an open reading frame that is present upstream (5’) of the poly(A) sequence (Holtkamp et al., 2006, Blood, vol. 108, pp. 4009-4017, which is herein incorporated by reference).

[0318] In some embodiments, a poly(A) sequence in accordance with the present disclosure is not limited to a particular length; in some embodiments, a poly(A) sequence is any length. In some embodiments, a poly(A) sequence comprises, essentially consists of, or consists of at least 20, at least 30, at least 40, at least 80, or at least 100 and up to 500, up to 400, up to 300, up to 200, or up to 150 A nucleotides, and, in particular, about 120 A nucleotides.
In this context, "essentially consists of" means that most nucleotides in the poly(A) sequence, typically at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% by number of nucleotides in the poly(A) sequence are A nucleotides, but permits that remaining nucleotides are nucleotides other than A nucleotides, such as U nucleotides (uridylate), G nucleotides (guanylate), or C nucleotides (cytidylate). In this context, "consists of" means that all nucleotides in the poly(A) sequence, i.e., 100% by number of nucleotides in the poly(A) sequence, are A nucleotides. The term “A nucleotide” or “A” refers to adenylate.

[0319] In some embodiments, a poly(A) sequence is attached during RNA transcription, e.g., during preparation of in vitro transcribed RNA, based on a DNA template comprising repeated dT nucleotides (deoxythymidylate) in the strand complementary to the coding strand. The DNA sequence encoding a poly(A) sequence (coding strand) is referred to as a poly(A) cassette.
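By way of non-limiting illustration, the compositional definitions of paragraph [0318] can be sketched in Python as follows. The function names, the default threshold of 75% (the lowest of the values recited above), and the example sequences (including the 30 and 70 residue run lengths and the 10-nucleotide interruption) are assumptions made solely for this example and are not prescribed by the present disclosure.

```python
def a_fraction(seq: str) -> float:
    """Return the fraction of positions in seq that are A (adenylate)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return seq.count("A") / len(seq)


def classify_poly_a(seq: str, threshold: float = 0.75) -> str:
    """Classify a candidate poly(A) sequence by its A content.

    threshold is the minimum A fraction treated as "essentially consists of";
    0.75 is one of the recited values and is an assumption of this sketch.
    """
    frac = a_fraction(seq)
    if frac == 1.0:
        return "consists of A"
    if frac >= threshold:
        return "essentially consists of A"
    return "neither"


# Hypothetical sequences for illustration only.
uninterrupted = "A" * 120
interrupted = "A" * 30 + "GCAUAUGACU" + "A" * 70  # 10-nt interruption

print(classify_poly_a(uninterrupted))
print(classify_poly_a(interrupted), round(a_fraction(interrupted), 3))
```

In this sketch, an interrupted sequence is still classified as essentially consisting of A nucleotides so long as its A fraction meets the chosen threshold; a stricter threshold (e.g., 0.95) may be passed as the second argument.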
[0320] In some embodiments, the poly(A) cassette present in the coding strand of DNA essentially consists of dA nucleotides, but is interrupted by a random sequence of the four nucleotides (dA, dC, dG, and dT). Such random sequence may be 5 to 50, 10 to 30, or 10 to 20 nucleotides in length. Such a cassette is disclosed in WO 2016/005324 A1, hereby incorporated by reference. Any poly(A) cassette disclosed in WO 2016/005324 A1 may be used in accordance with the present disclosure. Also encompassed is a poly(A) cassette that essentially consists of dA nucleotides, but is interrupted by a random sequence having an equal distribution of the four nucleotides (dA, dC, dG, dT) and having a length of, e.g., 5 to 50 nucleotides, which shows, at the DNA level, constant propagation of plasmid DNA in E. coli and is still associated, at the RNA level, with the beneficial properties with respect to supporting RNA stability and translational efficiency. In some embodiments, the poly(A) sequence contained in an RNA polynucleotide described herein essentially consists of A nucleotides, but is interrupted by a random sequence of the four nucleotides (A, C, G, U). Such random sequence may be 5 to 50, 10 to 30, or 10 to 20 nucleotides in length.

[0321] In some embodiments, no nucleotides other than A nucleotides flank a poly(A) sequence at its 3'-end, i.e., the poly(A) sequence is not masked or followed at its 3'-end by a nucleotide other than A.

[0322] In some embodiments, the poly(A) sequence may comprise at least 20, at least 30, at least 40, at least 80, or at least 100 and up to 500, up to 400, up to 300, up to 200, or up to 150 nucleotides. In some embodiments, the poly(A) sequence may essentially consist of at least 20, at least 30, at least 40, at least 80, or at least 100 and up to 500, up to 400, up to 300, up to 200, or up to 150 nucleotides.
In some embodiments, the poly(A) sequence may consist of at least 20, at least 30, at least 40, at least 80, or at least 100 and up to 500, up to 400, up to 300, up to 200, or up to 150 nucleotides. In some embodiments, the poly(A) sequence comprises at least 100 nucleotides. In some embodiments, the poly(A) sequence comprises about 150 nucleotides. In some embodiments, the poly(A) sequence comprises about 120 nucleotides.

[0323] In some embodiments, a poly A tail comprises a specific number of adenosines, such as about 50 or more, about 60 or more, about 70 or more, about 80 or more, about 90 or more, about 100 or more, about 120, or about 150 or about 200. In some embodiments a poly A tail of a string construct may comprise 200 A residues or less. In some embodiments, a poly A tail of a
Attorney Docket No. 2013237-1122 string construct may comprise about 200 A residues. In some embodiments, a poly A tail of a string construct may comprise 180 A residues or less. In some embodiments, a poly A tail of a string construct may comprise about 180 A residues. In some embodiments, a poly A tail may comprise 150 residues or less. [0324] In some embodiments, RNA comprises a poly(A) sequence comprising the nucleotide sequence of AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGCATATGACTAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA A, or a nucleotide sequence having at least 99%, 98%, 97%, 96%, 95%, 90%, 85%, or 80% identity to the nucleotide sequence of AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGCATATGACTAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA A. In some embodiments, a poly(A) tail comprises a plurality of A residues interrupted by a linker. In some embodiments, a linker comprises the nucleotide sequence GCATATGAC. 5. 3' UTR [0325] In some embodiments, an RNA utilized in accordance with the present disclosure comprises a 3'-UTR. As used herein, the terms “three prime untranslated region,” “3' untranslated region,” or “3' UTR” refer to a sequence of an mRNA molecule that begins following a stop codon of a coding region of an open reading frame sequence. In some embodiments, the 3' UTR begins immediately after a stop codon of a coding region of an open reading frame sequence, e.g., in its natural context. In other embodiments, the 3' UTR does not begin immediately after stop codon of the coding region of an open reading frame sequence, e.g., in its natural context. The term “3'-UTR” does preferably not include the poly(A) sequence. Thus, the 3'-UTR is upstream of the poly(A) sequence (if present), e.g. directly adjacent to the poly(A) sequence. [0326] In some embodiments, an RNA disclosed herein comprises a 3’ UTR comprising an F element and/or an I element. 
In some embodiments, a 3’ UTR or a proximal sequence thereto comprises a restriction site. In some embodiments, a restriction site is a BamHI site. In some embodiments, a restriction site is a XhoI site.
Attorney Docket No. 2013237-1122 [0327] In some embodiments, an RNA construct comprises an F element. In some embodiments, a F element sequence is a 3’-UTR of amino-terminal enhancer of split (AES). [0328] In some embodiments, an RNA disclosed herein comprises a 3’ UTR having at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity to a 3’ UTR with the sequence of CTGGTACTGCATGCACGCAATGCTAGCTGCCCCTTTCCCGTCCTGGGTACCCCGAGTC TCCCCCGACCTCGGGTCCCAGGTATGCTCCCACCTCCACCTGCCCCACTCACCACCTC TGCTAGTTCCAGACACCTCCCAAGCACGCAGCAATGCAGCTCAAAACGCTTAGCCTA GCCACACCCCCACGGGAAACAGCAGTGATTAACCTTTAGCAATAAACGAAAGTTTAA CTAAGCTATACTAACCCCAGGGTTGGTCAATTTCGTGCCAGCCACACC. In some embodiments, an RNA disclosed herein comprises a 3’ UTR with the sequence of CTGGTACTGCATGCACGCAATGCTAGCTGCCCCTTTCCCGTCCTGGGTACCCCGAGTC TCCCCCGACCTCGGGTCCCAGGTATGCTCCCACCTCCACCTGCCCCACTCACCACCTC TGCTAGTTCCAGACACCTCCCAAGCACGCAGCAATGCAGCTCAAAACGCTTAGCCTA GCCACACCCCCACGGGAAACAGCAGTGATTAACCTTTAGCAATAAACGAAAGTTTAA CTAAGCTATACTAACCCCAGGGTTGGTCAATTTCGTGCCAGCCACACC. [0329] In some embodiments, a 3’UTR is an FI element as described in WO2017/060314, which is herein incorporated by reference in its entirety. B. RNA Formats [0330] At least three distinct formats useful for RNA compositions (e.g., pharmaceutical compositions) have been developed, namely non-modified uridine containing mRNA (uRNA), nucleoside-modified mRNA (modRNA), and self-amplifying mRNA (saRNA). Each of these platforms displays unique features. In general, in all three formats, RNA is capped, contains open reading frames (ORFs) flanked by untranslated regions (UTR), and have a polyA-tail at the 3' end. An ORF of an uRNA or modRNA encode a polyepitopic vaccine construct described herein. An saRNA has multiple ORFs. [0331] In some embodiments, the RNA described herein may have modified nucleosides. 
In some embodiments, the RNA comprises a modified nucleoside in place of at least one (e.g., every) uridine.
[0332] The term “uracil,” as used herein, describes one of the nucleobases that can occur in the nucleic acid of RNA. The structure of uracil is: [chemical structure omitted].

[0333] The term “uridine,” as used herein, describes one of the nucleosides that can occur in RNA. The structure of uridine is: [chemical structure omitted].

[0334] UTP (uridine 5’-triphosphate) has the following structure: [chemical structure omitted].

[0335] Pseudo-UTP (pseudouridine 5’-triphosphate) has the following structure: [chemical structure omitted].

[0336] “Pseudouridine” is one example of a modified nucleoside that is an isomer of uridine, where the uracil is attached to the pentose ring via a carbon-carbon bond instead of a nitrogen-carbon glycosidic bond.

[0337] Another exemplary modified nucleoside is N1-methyl-pseudouridine (m1Ψ), which has the structure: [chemical structure omitted].

[0338] N1-methyl-pseudo-UTP has the following structure: [chemical structure omitted].

[0339] Another exemplary modified nucleoside is 5-methyl-uridine (m5U), which has the structure: [chemical structure omitted].

[0340] In some embodiments, one or more uridines in the RNA described herein are replaced by a modified nucleoside. In some embodiments, the modified nucleoside is a modified uridine.
[0341] In some embodiments, RNA comprises a modified nucleoside in place of at least one uridine. In some embodiments, RNA comprises a modified nucleoside in place of each uridine.

[0342] In some embodiments, the modified nucleoside is independently selected from pseudouridine (ψ), N1-methyl-pseudouridine (m1ψ), and 5-methyl-uridine (m5U). In some embodiments, the modified nucleoside comprises pseudouridine (ψ). In some embodiments, the modified nucleoside comprises N1-methyl-pseudouridine (m1ψ). In some embodiments, the modified nucleoside comprises 5-methyl-uridine (m5U). In some embodiments, RNA may comprise more than one type of modified nucleoside, and the modified nucleosides are independently selected from pseudouridine (ψ), N1-methyl-pseudouridine (m1ψ), and 5-methyl-uridine (m5U). In some embodiments, the modified nucleosides comprise pseudouridine (ψ) and N1-methyl-pseudouridine (m1ψ). In some embodiments, the modified nucleosides comprise pseudouridine (ψ) and 5-methyl-uridine (m5U). In some embodiments, the modified nucleosides comprise N1-methyl-pseudouridine (m1ψ) and 5-methyl-uridine (m5U). In some embodiments, the modified nucleosides comprise pseudouridine (ψ), N1-methyl-pseudouridine (m1ψ), and 5-methyl-uridine (m5U).
[0343] In some embodiments, the modified nucleoside replacing one or more, e.g., all, uridine in the RNA may be any one or more of 3-methyl-uridine (m3U), 5-methoxy-uridine (mo5U), 5-aza- uridine, 6-aza-uridine, 2-thio-5-aza-uridine, 2-thio-uridine (s2U), 4-thio-uridine (s4U), 4-thio- pseudouridine, 2-thio-pseudouridine, 5-hydroxy-uridine (ho5U), 5-aminoallyl-uridine, 5-halo- uridine (e.g., 5-iodo-uridine or 5-bromo-uridine), uridine 5-oxyacetic acid (cmo5U), uridine 5- oxyacetic acid methyl ester (mcmo5U), 5-carboxymethyl-uridine (cm5U), 1-carboxymethyl- pseudouridine, 5-carboxyhydroxymethyl-uridine (chm5U), 5-carboxyhydroxymethyl-uridine methyl ester (mchm5U), 5-methoxycarbonylmethyl-uridine (mcm5U), 5- methoxycarbonylmethyl-2-thio-uridine (mcm5s2U), 5-aminomethyl-2-thio-uridine (nm5s2U), 5- methylaminomethyl-uridine (mnm5U), 1-ethyl-pseudouridine, 5-methylaminomethyl-2-thio- uridine (mnm5s2U), 5-methylaminomethyl-2-seleno-uridine (mnm5se2U), 5-carbamoylmethyl- uridine (ncm5U), 5-carboxymethylaminomethyl-uridine (cmnm5U), 5- carboxymethylaminomethyl-2-thio-uridine (cmnm5s2U), 5-propynyl-uridine, 1-propynyl- pseudouridine, 5-taurinomethyl-uridine (τm5U), 1-taurinomethyl-pseudouridine, 5- taurinomethyl-2-thio-uridine(τm5s2U), 1-taurinomethyl-4-thio-pseudouridine), 5-methyl-2-thio- Page 81 of 214 12608199v1
Attorney Docket No. 2013237-1122 uridine (m5s2U), 1-methyl-4-thio-pseudouridine (m1s4ψ), 4-thio-1-methyl-pseudouridine, 3- methyl-pseudouridine (m3ψ), 2-thio-1-methyl-pseudouridine, 1-methyl-1-deaza-pseudouridine, 2-thio-1-methyl-1-deaza-pseudouridine, dihydrouridine (D), dihydropseudouridine, 5,6- dihydrouridine, 5-methyl-dihydrouridine (m5D), 2-thio-dihydrouridine, 2-thio- dihydropseudouridine, 2-methoxy-uridine, 2-methoxy-4-thio-uridine, 4-methoxy-pseudouridine, 4-methoxy-2-thio-pseudouridine, N1-methyl-pseudouridine, 3-(3-amino-3- carboxypropyl)uridine (acp3U), 1-methyl-3-(3-amino-3-carboxypropyl)pseudouridine (acp3 ψ), 5-(isopentenylaminomethyl)uridine (inm5U), 5-(isopentenylaminomethyl)-2-thio-uridine (inm5s2U), α-thio-uridine, 2′-O-methyl-uridine (Um), 5,2′-O-dimethyl-uridine (m5Um), 2′-O- methyl-pseudouridine (ψm), 2-thio-2′-O-methyl-uridine (s2Um), 5-methoxycarbonylmethyl-2′- O-methyl-uridine (mcm5Um), 5-carbamoylmethyl-2′-O-methyl-uridine (ncm5Um), 5- carboxymethylaminomethyl-2′-O-methyl-uridine (cmnm5Um), 3,2′-O-dimethyl-uridine (m3Um), 5-(isopentenylaminomethyl)-2′-O-methyl-uridine (inm5Um), 1-thio-uridine, deoxythymidine, 2′-F-ara-uridine, 2′-F-uridine, 2′-OH-ara-uridine, 5-(2-carbomethoxyvinyl) uridine, 5-[3-(1-E-propenylamino)uridine, or any other modified uridine known in the art. [0344] In some embodiments, the RNA comprises other modified nucleosides or comprises further modified nucleosides, e.g., modified cytidine. For example, in some embodiments, in the RNA 5-methylcytidine is substituted partially or completely, preferably completely, for cytidine. In some embodiments, the RNA comprises 5-methylcytidine and one or more selected from pseudouridine (ψ), N1-methyl-pseudouridine (m1ψ), and 5-methyl-uridine (m5U). In some embodiments, the RNA comprises 5-methylcytidine and N1-methyl-pseudouridine (m1ψ). In some embodiments, the RNA comprises 5-methylcytidine in place of each cytidine and N1- methyl-pseudouridine (m1ψ) in place of each uridine. 
[0345] In some embodiments of the present disclosure, the RNA is “replicon RNA” or simply a “replicon,” in particular “self-replicating RNA” or “self-amplifying RNA.” In one particularly preferred embodiment, the replicon or self-replicating RNA is derived from or comprises elements derived from a single-stranded (ss) RNA virus, in particular a positive-stranded ssRNA virus, such as an alphavirus. Alphaviruses are typical representatives of positive-stranded RNA viruses. Alphaviruses replicate in the cytoplasm of infected cells (for review of the alphaviral life cycle see José et al., Future Microbiol., 2009, vol. 4, pp. 837–856, which is incorporated herein by reference in its entirety). The total genome length of many alphaviruses typically ranges between 11,000 and 12,000 nucleotides, and the genomic RNA typically has a 5’-cap, and a 3’ poly(A) tail. The genome of alphaviruses encodes non-structural proteins (involved in transcription, modification and replication of viral RNA and in protein modification) and structural proteins (forming the virus particle). There are typically two open reading frames (ORFs) in the genome. The four non-structural proteins (nsP1–nsP4) are typically encoded together by a first ORF beginning near the 5′ terminus of the genome, while alphavirus structural proteins are encoded together by a second ORF which is found downstream of the first ORF and extends near the 3’ terminus of the genome. Typically, the first ORF is larger than the second ORF, the ratio being roughly 2:1. In cells infected by an alphavirus, only the nucleic acid sequence encoding non-structural proteins is translated from the genomic RNA, while the genetic information encoding structural proteins is translatable from a subgenomic transcript, which is an RNA molecule that resembles eukaryotic messenger RNA (mRNA; Gould et al., 2010, Antiviral Res., vol. 87 pp. 111–124). Following infection, i.e. at early stages of the viral life cycle, the (+) stranded genomic RNA directly acts like a messenger RNA for the translation of the open reading frame encoding the non-structural poly-protein (nsP1234).

[0346] Alphavirus-derived vectors have been proposed for delivery of foreign genetic information into target cells or target organisms. In simple approaches, a first ORF encodes an alphavirus-derived RNA-dependent RNA polymerase (replicase), which upon translation mediates self-amplification of the RNA. A second ORF encoding alphaviral structural proteins is replaced by an open reading frame encoding a polyepitopic vaccine construct described herein.
Alphavirus-based trans-replication systems rely on alphavirus nucleotide sequence elements on two separate nucleic acid molecules: one nucleic acid molecule encodes a viral replicase, and the other nucleic acid molecule is capable of being replicated by said replicase in trans (hence the designation trans-replication system). Trans-replication requires the presence of both these nucleic acid molecules in a given host cell. The nucleic acid molecule capable of being replicated by the replicase in trans must comprise certain alphaviral sequence elements to allow recognition and RNA synthesis by the alphaviral replicase.

[0347] Features of a non-modified uridine platform may include, for example, one or more of intrinsic adjuvant effect, as well as good tolerability and safety. Features of a modified uridine
(e.g., pseudouridine) platform may include reduced adjuvant effect, blunted innate immune sensor activating capacity and thus good tolerability and safety. Features of a self-amplifying platform may include, for example, long duration of protein expression, good tolerability and safety, and higher likelihood for efficacy with very low vaccine dose.

[0348] The present disclosure provides particular RNA constructs optimized, for example, for improved manufacturability, encapsulation, expression level (and/or timing), etc. Certain components are discussed below, and certain preferred embodiments are exemplified herein.

C. Codon Optimization and GC Enrichment

[0349] As used herein, the term “codon-optimized” refers to alteration of codons in a coding region of a nucleic acid molecule (e.g., a polyribonucleotide) to reflect the typical codon usage of a host organism (e.g., a subject receiving a nucleic acid molecule (e.g., a polyribonucleotide)), preferably without altering the amino acid sequence encoded by the nucleic acid molecule. Within the context of the present disclosure, in some embodiments, coding regions are codon-optimized for optimal expression in a subject to be treated using the RNA molecules described herein. In some embodiments, codon-optimization may be performed such that codons for which frequently occurring tRNAs are available are inserted in place of “rare codons.” In some embodiments, codon-optimization may include increasing guanosine/cytosine (G/C) content of a coding region of RNA described herein as compared to the G/C content of the corresponding coding sequence of a wild type RNA, wherein the amino acid sequence encoded by the RNA is preferably not modified compared to the amino acid sequence encoded by the wild type RNA.
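By way of non-limiting illustration, G/C enrichment that leaves the encoded amino acid sequence unchanged can be sketched in Python as follows. The sketch simply substitutes, for each codon, a synonymous codon of the standard genetic code having the highest G/C count; this greedy rule, the function names, and the short example coding sequence are assumptions made solely for this example, and practical optimization strategies typically balance additional parameters.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; "*" marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def gc_count(seq: str) -> int:
    """Number of G or C positions in seq."""
    return sum(base in "GC" for base in seq)


def gc_content(seq: str) -> float:
    """Fraction of positions in seq that are G or C."""
    return gc_count(seq) / len(seq)


def translate(cds: str) -> str:
    """Translate a coding sequence (DNA alphabet, length divisible by 3)."""
    return "".join(CODON_TO_AA[cds[i:i + 3]] for i in range(0, len(cds), 3))


def gc_enrich(cds: str) -> str:
    """Replace each codon with a synonymous codon of maximal G/C count.

    The original codon is kept whenever it is already among the best choices.
    """
    out = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        synonyms = [c for c, aa in CODON_TO_AA.items() if aa == CODON_TO_AA[codon]]
        out.append(max(synonyms, key=lambda c: (gc_count(c), c == codon)))
    return "".join(out)


wild_type = "ATGGCATTTAAATAA"  # hypothetical coding sequence: Met-Ala-Phe-Lys-stop
enriched = gc_enrich(wild_type)
print(enriched, gc_content(wild_type), gc_content(enriched))
print(translate(wild_type) == translate(enriched))
```

The final comparison confirms that only synonymous substitutions were made, i.e., that the encoded amino acid sequence is not modified.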
[0350] In some embodiments, a coding sequence (also referred to as a “coding region”) is codon optimized for expression in the subject to whom a composition (e.g., a pharmaceutical composition) is to be administered (e.g., a human). Thus, in some embodiments, sequences in such a polynucleotide (e.g., a polyribonucleotide) may differ from wild type sequences encoding the relevant antigen or fragment or epitope thereof, even when the amino acid sequence of the antigen or fragment or epitope thereof is wild type.

[0351] In some embodiments, strategies for codon optimization are directed to expression in a relevant subject (e.g., a human), and even, in some cases, to expression in a particular cell or tissue.
[0352] Various species exhibit particular bias for certain codons of a particular amino acid. Without wishing to be bound by any one theory, codon bias (differences in codon usage between organisms) often correlates with the efficiency of translation of messenger RNA (mRNA), which is in turn believed to be dependent on, among other things, the properties of the codons being translated and the availability of particular transfer RNA (tRNA) molecules. The predominance of selected tRNAs in a cell may generally be a reflection of the codons used most frequently in peptide synthesis. Accordingly, genes may be tailored for optimal gene expression in a given organism based on codon optimization. Codon usage tables are available, for example, at the "Codon Usage Database" available at www.kazusa.or.jp/codon/ and these tables may be adapted in a number of ways. Computer algorithms for codon optimizing a particular sequence for expression in a particular subject or its cells, such as Gene Forge (Aptagen; Jacobus, PA), are also available.

[0353] In some embodiments, a polynucleotide (e.g., a polyribonucleotide) of the present disclosure is codon optimized, wherein the codons in the polynucleotide (e.g., the polyribonucleotide) are adapted to human codon usage (herein referred to as “human codon optimized polynucleotide”). Codons encoding the same amino acid occur at different frequencies in a subject, e.g., a human. Accordingly, in some embodiments, the coding sequence of a polynucleotide of the present disclosure is modified such that the frequency of the codons encoding the same amino acid corresponds to the naturally occurring frequency of that codon according to the human codon usage, e.g., as shown in Table 4.
For example, in the case of the amino acid Ala, the wild type coding sequence is preferably adapted in a way that the codon “GCC” is used with a frequency of 0.40, the codon “GCT” is used with a frequency of 0.28, the codon “GCA” is used with a frequency of 0.22 and the codon “GCG” is used with a frequency of 0.10 etc. (see Table 4). Accordingly, in some embodiments, such a procedure (as exemplified for Ala) is applied for each amino acid encoded by the coding sequence of a polynucleotide to obtain sequences adapted to human codon usage.

Table 4: Human codon usage table with frequencies indicated for each amino acid.

Amino acid   Codon   Frequency      Amino acid   Codon   Frequency
Ala          GCG     0.10           Pro          CCG     0.11
Ala          GCA     0.22           Pro          CCA     0.27
Ala          GCT     0.28           Pro          CCT     0.29
Ala          GCC*    0.40           Pro          CCC*    0.33
Cys          TGT     0.42           Gln          CAG*    0.73
Cys          TGC*    0.58           Gln          CAA     0.27
Asp          GAT     0.44           Arg          AGG     0.22
Asp          GAC*    0.56           Arg          AGA*    0.21
Glu          GAG*    0.59           Arg          CGG     0.19
Glu          GAA     0.41           Arg          CGA     0.10
Phe          TTT     0.43           Arg          CGT     0.09
Phe          TTC*    0.57           Arg          CGC     0.19
Gly          GGG     0.23           Ser          AGT     0.14
Gly          GGA     0.26           Ser          AGC*    0.25
Gly          GGT     0.18           Ser          TCG     0.06
Gly          GGC*    0.33           Ser          TCA     0.15
His          CAT     0.41           Ser          TCT     0.18
His          CAC*    0.59           Ser          TCC     0.23
Ile          ATA     0.14           Thr          ACG     0.12
Ile          ATT     0.35           Thr          ACA     0.27
Ile          ATC*    0.52           Thr          ACT     0.23
Lys          AAG*    0.60           Thr          ACC*    0.38
Lys          AAA     0.40           Val          GTG*    0.48
Leu          TTG     0.12           Val          GTA     0.10
Leu          TTA     0.06           Val          GTT     0.17
Leu          CTG*    0.43           Val          GTC     0.25
Leu          CTA     0.07           Trp          TGG*    1
Leu          CTT     0.12           Tyr          TAT     0.42
Leu          CTC     0.20           Tyr          TAC*    0.58
Met          ATG*    1              Stop         TGA*    0.61
Asn          AAT     0.44           Stop         TAG     0.17
Asn          AAC*    0.56           Stop         TAA     0.22

[0354] Certain strategies for codon optimization and/or G/C enrichment for human expression are described in WO2002/098443, which is incorporated by reference herein in its entirety. In some embodiments, a coding sequence may be optimized using a multiparametric optimization strategy. In some embodiments, optimization parameters may include parameters that influence protein expression, which can be, for example, impacted on a transcription level, an mRNA level, and/or a translational level. In some embodiments, exemplary optimization parameters include, but are not limited to transcription-level parameters (including, e.g., GC content, consensus splice sites, cryptic splice sites, SD sequences, TATA boxes, termination signals, artificial recombination sites, and combinations thereof); mRNA-level parameters (including,
e.g., RNA instability motifs, ribosomal entry sites, repetitive sequences, and combinations thereof); translation-level parameters (including, e.g., codon usage, premature poly(A) sites, ribosomal entry sites, secondary structures, and combinations thereof); or combinations thereof. In some embodiments, a coding sequence may be optimized by a GeneOptimizer algorithm as described in Fath et al. “Multiparameter RNA and Codon Optimization: A Standardized Tool to Assess and Enhance Autologous Mammalian Gene Expression” PLoS ONE 6(3): e17596; Raab et al., “The GeneOptimizer Algorithm: using a sliding window approach to cope with the vast sequence space in multiparameter DNA sequence optimization” Systems and Synthetic Biology (2010) 4:215-225; and Graf et al. “Codon-optimized genes that enable increased heterologous expression in mammalian cells and elicit efficient immune responses in mice after vaccination of naked DNA” Methods Mol Med (2004) 94:197-210, the entire content of each of which is incorporated herein for the purposes described herein. In some embodiments, a coding sequence may be optimized by Eurofins’ adaption and optimization algorithm “GENEius” as described in Eurofins’ Application Notes: Eurofins’ adaption and optimization software “GENEius” in comparison to other optimization algorithms, the entire content of which is incorporated by reference for the purposes described herein.

[0355] In some embodiments, a coding sequence utilized in accordance with the present disclosure has G/C content that is increased compared to a wild type coding sequence for a polyepitopic vaccine construct described herein, or a portion thereof. In some embodiments, guanosine/cytidine (G/C) content of a coding region is modified relative to a wild type coding sequence for a polyepitopic vaccine construct described herein, but the amino acid sequence encoded by the polyribonucleotide is not modified.
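By way of non-limiting illustration, the frequency-based adaptation exemplified for Ala in paragraph [0353] can be sketched in Python as follows. The sketch selects, for each amino acid residue, a codon at random with a probability equal to its frequency in Table 4. Only a subset of Table 4 is reproduced; the function name, the use of weighted random selection, and the fixed seed are assumptions made solely for this example and do not represent any particular optimization algorithm referenced herein.

```python
import random
from collections import Counter

# Subset of Table 4 (human codon usage); other amino acids would be added the same way.
HUMAN_CODON_FREQ = {
    "Ala": {"GCG": 0.10, "GCA": 0.22, "GCT": 0.28, "GCC": 0.40},
    "Cys": {"TGT": 0.42, "TGC": 0.58},
    "Lys": {"AAG": 0.60, "AAA": 0.40},
    "Met": {"ATG": 1.0},
}


def adapt_codons(residues, rng):
    """Pick one codon per residue, weighted by its Table 4 frequency."""
    codons = []
    for aa in residues:
        table = HUMAN_CODON_FREQ[aa]
        choice = rng.choices(list(table), weights=list(table.values()), k=1)[0]
        codons.append(choice)
    return codons


rng = random.Random(0)  # fixed seed so that the sketch is reproducible
codons = adapt_codons(["Ala"] * 10000, rng)
usage = {codon: n / len(codons) for codon, n in Counter(codons).items()}
print(usage)  # observed Ala codon usage, expected to be close to the Table 4 values
```

Over a sufficiently long coding sequence, the observed usage of each codon approaches the tabulated frequency, which is the property described for Ala above.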
[0356] Without wishing to be bound by any particular theory, it is proposed that GC enrichment may improve translation of a payload sequence. Typically, sequences having an increased G (guanosine)/C (cytidine) content are more stable than sequences having an increased A (adenosine)/U (uridine) content. In view of the fact that several codons code for one and the same amino acid (so-called degeneracy of the genetic code), the most favorable codons for the stability can be determined (so-called alternative codon usage). Depending on the amino acid to be encoded by a polyribonucleotide, there are various possibilities for modification of the ribonucleic acid sequence, compared to its wild type sequence. In particular, codons which
contain A and/or U nucleosides can be modified by substituting these codons by other codons, which code for the same amino acids but contain no A and/or U or contain a lower content of A and/or U nucleosides.

[0357] In some embodiments, G/C content of a coding region of a polyribonucleotide described herein is increased by at least 1%, at least 2%, at least 3%, at least 4%, at least 5%, at least 6%, or even more compared to the G/C content of the coding region prior to codon optimization, e.g., of the wild type RNA. In some embodiments, G/C content of a coding region of a polyribonucleotide described herein is decreased by at least 1%, at least 2%, at least 3%, at least 4%, at least 5%, at least 6%, or even more compared to the G/C content of the coding region prior to codon optimization, e.g., of the wild type RNA.

[0358] In some embodiments, a polyribonucleotide may incorporate one or more elements established to contribute to stability and/or translation efficiency of the polyribonucleotide; exemplary such elements are described, for example, in PCT/EP2006/009448, incorporated herein by reference. In some embodiments, to increase expression of a polyribonucleotide used according to the present disclosure, a polyribonucleotide may be modified within the coding region, i.e., the sequence encoding the expressed peptide or protein, without altering the sequence of the expressed peptide or protein, for example so as to increase the GC-content to increase mRNA stability and/or to perform a codon optimization and, thus, enhance translation in cells.

D. Exemplary Polyribonucleotides Encoding Polyepitopic Vaccine Constructs

[0359] Exemplary polyribonucleotides are depicted schematically in FIG. 4, FIG. 5, and FIG. 8. An exemplary polyribonucleotide encoding a polyepitopic vaccine construct comprising a plurality of shared tumor antigen epitopes is shown schematically in FIG. 4.
The exemplary polyribonucleotide depicted in FIG. 4 includes a 5’ cap analogue, a 5’ UTR, a coding sequence for a secretory signal (SEC), coding sequences for 8 shared tumor antigen epitopes, a coding sequence for a control epitope, a coding sequence for an MITD, a 3’ UTR, and a polyA tail.
[0360] An exemplary polyribonucleotide encoding a polyepitopic vaccine construct comprising a plurality of neoantigen epitopes is shown schematically in FIG. 5. The exemplary polyribonucleotide depicted in FIG. 5 includes a 5’ cap analogue, a 5’ UTR, a coding sequence for a secretory signal (SEC), coding sequences for 10 neoantigen epitopes, a coding sequence for a control epitope, a coding sequence for an MITD, a 3’ UTR, and a polyA tail.
III. RNA Delivery Technologies
[0361] Provided polyribonucleotides may be delivered for therapeutic applications described herein using any appropriate methods known in the art, including, e.g., delivery as naked RNAs, or delivery mediated by viral and/or non-viral vectors, polymer-based vectors, lipid compositions, nanoparticles (e.g., lipid nanoparticles, polymeric nanoparticles, lipid-polymer hybrid nanoparticles, etc.), and/or peptide-based vectors. See, e.g., Wadhwa et al. “Opportunities and Challenges in the Delivery of mRNA-Based Vaccines” Pharmaceutics (2020) 102 (27 pages), the content of which is incorporated herein by reference, for information on various approaches that may be useful for delivering polyribonucleotides described herein.
[0362] In some embodiments, one or more polyribonucleotides can be formulated with lipid nanoparticles for delivery (e.g., administration).
[0363] In some embodiments, lipid nanoparticles can be designed to protect polyribonucleotides from extracellular RNases and/or engineered for systemic delivery of the RNA to target cells (e.g., liver cells). In some embodiments, such lipid nanoparticles may be particularly useful to deliver polyribonucleotides when polyribonucleotides are intravenously or intramuscularly administered to a subject.
A. Lipid Compositions
1. Lipids and Lipid-Like Materials
[0364] The terms "lipid" and "lipid-like material" are broadly defined herein as molecules which comprise one or more hydrophobic moieties or groups and optionally also one or more hydrophilic moieties or groups. Molecules comprising hydrophobic moieties and hydrophilic moieties are also frequently denoted as amphiphiles. Lipids are usually poorly soluble in water.
In an aqueous environment, the amphiphilic nature allows the molecules to self-assemble into organized structures and different phases. One of those phases consists of lipid bilayers, as they are present in vesicles, multilamellar/unilamellar liposomes, or membranes in an aqueous environment. Hydrophobicity can be conferred by the inclusion of apolar groups that include, but are not limited to, long-chain saturated and unsaturated aliphatic hydrocarbon groups and such groups substituted by one or more aromatic, cycloaliphatic, or heterocyclic group(s). The hydrophilic groups may comprise polar and/or charged groups and include carbohydrates, phosphate, carboxylic, sulfate, amino, sulfhydryl, nitro, hydroxyl, and other like groups.
[0365] Often, an amphiphilic compound has a polar head attached to a long hydrophobic tail. In some embodiments, the polar portion is soluble in water, while the non-polar portion is insoluble in water. In addition, the polar portion may have either a formal positive charge, or a formal negative charge. Alternatively, the polar portion may have both a formal positive and a negative charge, and be a zwitterion or inner salt. For purposes of the disclosure, the amphiphilic compound can be, but is not limited to, one or a plurality of natural or non-natural lipids and lipid-like compounds.
[0366] A "lipid-like material" is a substance that is structurally and/or functionally related to a lipid but may not be considered a lipid in a strict sense. For example, the term includes compounds that are able to form amphiphilic layers as they are present in vesicles, multilamellar/unilamellar liposomes, or membranes in an aqueous environment and includes surfactants, or synthesized compounds with both hydrophilic and hydrophobic moieties. Generally speaking, the term refers to molecules, which comprise hydrophilic and hydrophobic moieties with different structural organization, which may or may not be similar to that of lipids.
[0367] Specific examples of amphiphilic compounds that may be included in an amphiphilic layer include, but are not limited to, phospholipids, aminolipids and sphingolipids.
[0368] Generally, lipids may be divided into eight categories: fatty acids, glycerolipids, glycerophospholipids, sphingolipids, saccharolipids, polyketides (derived from condensation of ketoacyl subunits), sterols and prenol lipids (derived from condensation of isoprene subunits).
Although the term "lipid" is sometimes used as a synonym for fats, fats are a subgroup of lipids called triglycerides. Lipids also encompass molecules such as fatty acids and their derivatives (including tri-, di-, monoglycerides, and phospholipids), as well as sterol-containing metabolites such as cholesterol.
[0369] Fatty acids are a diverse group of molecules made of a hydrocarbon chain that terminates with a carboxylic acid group; this arrangement confers the molecule with a polar, hydrophilic end, and a nonpolar, hydrophobic end that is insoluble in water. The carbon chain, typically between four and 24 carbons long, may be saturated or unsaturated, and may be attached to functional groups containing oxygen, halogens, nitrogen, and sulfur. If a fatty acid contains a double bond, there is the possibility of either a cis or trans geometric isomerism, which significantly affects the molecule's configuration. Cis-double bonds cause the fatty acid chain to bend, an effect that is compounded with more double bonds in the chain. Other major lipid classes in the fatty acid category are the fatty esters and fatty amides.
[0370] Glycerolipids are composed of mono-, di-, and tri-substituted glycerols, the best-known being the fatty acid triesters of glycerol, called triglycerides. The word "triacylglycerol" is sometimes used synonymously with "triglyceride". In these compounds, the three hydroxyl groups of glycerol are each esterified, typically by different fatty acids. Additional subclasses of glycerolipids are represented by glycosylglycerols, which are characterized by the presence of one or more sugar residues attached to glycerol via a glycosidic linkage.
[0371] Glycerophospholipids are amphipathic molecules (containing both hydrophobic and hydrophilic regions) that contain a glycerol core linked to two fatty acid-derived "tails" by ester linkages and to one "head" group by a phosphate ester linkage. Examples of glycerophospholipids, usually referred to as phospholipids (though sphingomyelins are also classified as phospholipids) are phosphatidylcholine (also known as PC, GPCho or lecithin), phosphatidylethanolamine (PE or GPEtn) and phosphatidylserine (PS or GPSer).
[0372] Sphingolipids are members of a complex family of compounds that share a common structural feature, a sphingoid base backbone. The major sphingoid base in mammals is commonly referred to as sphingosine. Ceramides (N-acyl-sphingoid bases) are a major subclass of sphingoid base derivatives with an amide-linked fatty acid. The fatty acids are typically saturated or mono-unsaturated with chain lengths from 16 to 26 carbon atoms.
The major phosphosphingolipids of mammals are sphingomyelins (ceramide phosphocholines), whereas insects contain mainly ceramide phosphoethanolamines and fungi have phytoceramide phosphoinositols and mannose-containing headgroups. The glycosphingolipids are a diverse family of molecules composed of one or more sugar residues linked via a glycosidic bond to the sphingoid base. Examples of these are the simple and complex glycosphingolipids such as cerebrosides and gangliosides.
[0373] Sterols, such as cholesterol and its derivatives, or tocopherol and its derivatives, are important components of membrane lipids, along with the glycerophospholipids and sphingomyelins.
[0374] Saccharolipids are compounds in which fatty acids are linked directly to a sugar backbone, forming structures that are compatible with membrane bilayers. In the saccharolipids, a monosaccharide substitutes for the glycerol backbone present in glycerolipids and glycerophospholipids. The most familiar saccharolipids are the acylated glucosamine precursors of the Lipid A component of the lipopolysaccharides in Gram-negative bacteria. Typical lipid A molecules are disaccharides of glucosamine, which are derivatized with as many as seven fatty-acyl chains. The minimal lipopolysaccharide required for growth in E. coli is Kdo2-Lipid A, a hexa-acylated disaccharide of glucosamine that is glycosylated with two 3-deoxy-D-manno-octulosonic acid (Kdo) residues.
[0375] Polyketides are synthesized by polymerization of acetyl and propionyl subunits by classic enzymes as well as iterative and multimodular enzymes that share mechanistic features with the fatty acid synthases. They comprise a large number of secondary metabolites and natural products from animal, plant, bacterial, fungal and marine sources, and have great structural diversity. Many polyketides are cyclic molecules whose backbones are often further modified by glycosylation, methylation, hydroxylation, oxidation, or other processes.
[0376] Lipids and lipid-like materials may be cationic, anionic or neutral. Neutral lipids or lipid-like materials exist in an uncharged or neutral zwitterionic form at a selected pH.
[0377] In some embodiments, suitable lipids or lipid-like materials for use in the present disclosure include those described in WO2020/128031 and US20200163878, the entire contents of each of which are incorporated herein by reference for the purposes described herein.
2. Cationic or cationically ionizable lipids or lipid-like materials
[0378] In some embodiments cationic or cationically ionizable lipids or lipid-like materials contemplated for use herein include any cationic or cationically ionizable lipids or lipid-like materials which are able to electrostatically bind nucleic acid. In one embodiment, cationic or cationically ionizable lipids or lipid-like materials contemplated for use herein can be associated with nucleic acid, e.g. by forming complexes with the nucleic acid or forming vesicles in which the nucleic acid is enclosed or encapsulated.
[0379] Cationic lipids or lipid-like materials are characterized in that they have a net positive charge (e.g., at a relevant pH). Cationic lipids or lipid-like materials bind negatively charged nucleic acid by electrostatic interaction. Generally, cationic lipids possess a lipophilic moiety, such as a sterol, an acyl chain, a diacyl or more acyl chains, and the head group of the lipid typically carries the positive charge.
[0380] In certain embodiments, a cationic lipid or lipid-like material has a net positive charge only at certain pH, in particular acidic pH, while it has preferably no net positive charge, preferably has no charge, i.e., it is neutral, at a different, preferably higher pH such as physiological pH. This ionizable behavior is thought to enhance efficacy through helping with endosomal escape and reducing toxicity as compared with particles that remain cationic at physiological pH.
[0381] In some embodiments, a cationic or cationically ionizable lipid or lipid-like material comprises a head group which includes at least one nitrogen atom (N) which is positively charged or capable of being protonated.
[0382] Examples of cationic lipids include, but are not limited to 1,2-dioleoyl-3-trimethylammonium propane (DOTAP); N,N-dimethyl-2,3-dioleyloxypropylamine (DODMA), 1,2-di-O-octadecenyl-3-trimethylammonium propane (DOTMA), 3-(N-(N′,N′-dimethylaminoethane)-carbamoyl)cholesterol (DC-Chol), dimethyldioctadecylammonium (DDAB); 1,2-dioleoyl-3-dimethylammonium-propane (DODAP); 1,2-diacyloxy-3-dimethylammonium propanes; 1,2-dialkyloxy-3-dimethylammonium propanes; dioctadecyldimethyl ammonium chloride (DODAC), 1,2-distearyloxy-N,N-dimethyl-3-aminopropane (DSDMA), 2,3-di(tetradecoxy)propyl-(2-hydroxyethyl)-dimethylazanium (DMRIE), 1,2-dimyristoyl-sn-glycero-3-ethylphosphocholine (DMEPC), 1,2-dimyristoyl-3-trimethylammonium propane (DMTAP), 1,2-dioleyloxypropyl-3-dimethyl-hydroxyethyl ammonium bromide (DORIE), and 2,3-dioleoyloxy-N-[2(spermine carboxamide)ethyl]-N,N-dimethyl-1-propanaminium trifluoroacetate (DOSPA), 1,2-dilinoleyloxy-N,N-dimethylaminopropane (DLinDMA), 1,2-dilinolenyloxy-N,N-dimethylaminopropane (DLenDMA), dioctadecylamidoglycyl spermine (DOGS), 3-dimethylamino-2-(cholest-5-en-3-beta-oxybutan-4-oxy)-1-(cis,cis-9,12-octadecadienoxy)propane (CLinDMA), 2-[5′-(cholest-5-en-3-beta-oxy)-3′-oxapentoxy)-3-dimethyl-1-(cis,cis-9′,12′-octadecadienoxy)propane (CpLinDMA), N,N-dimethyl-3,4-dioleyloxybenzylamine (DMOBA), 1,2-N,N′-dioleylcarbamyl-3-dimethylaminopropane (DOcarbDAP), 2,3-Dilinoleoyloxy-N,N-dimethylpropylamine (DLinDAP), 1,2-N,N′-Dilinoleylcarbamyl-3-dimethylaminopropane (DLincarbDAP), 1,2-Dilinoleoylcarbamyl-3-dimethylaminopropane (DLinCDAP), 2,2-dilinoleyl-4-dimethylaminomethyl-[1,3]-dioxolane (DLin-K-DMA), 2,2-dilinoleyl-4-dimethylaminoethyl-[1,3]-dioxolane (DLin-K-XTC2-DMA), 2,2-dilinoleyl-4-(2-dimethylaminoethyl)-[1,3]-dioxolane (DLin-KC2-DMA), heptatriaconta-6,9,28,31-tetraen-19-yl-4-(dimethylamino)butanoate (DLin-MC3-DMA), N-(2-Hydroxyethyl)-N,N-dimethyl-2,3-bis(tetradecyloxy)-1-propanaminium bromide (DMRIE), (±)-N-(3-aminopropyl)-N,N-dimethyl-2,3-bis(cis-9-tetradecenyloxy)-1-propanaminium bromide (GAP-DMORIE), (±)-N-(3-aminopropyl)-N,N-dimethyl-2,3-bis(dodecyloxy)-1-propanaminium bromide (GAP-DLRIE), (±)-N-(3-aminopropyl)-N,N-dimethyl-2,3-bis(tetradecyloxy)-1-propanaminium bromide (GAP-DMRIE), N-(2-Aminoethyl)-N,N-dimethyl-2,3-bis(tetradecyloxy)-1-propanaminium bromide (βAE-DMRIE), N-(4-carboxybenzyl)-N,N-dimethyl-2,3-bis(oleoyloxy)propan-1-aminium (DOBAQ), 2-({8-[(3β)-cholest-5-en-3-yloxy]octyl}oxy)-N,N-dimethyl-3-[(9Z,12Z)-octadeca-9,12-dien-1-yloxy]propan-1-amine (Octyl-CLinDMA), 1,2-dimyristoyl-3-dimethylammonium-propane (DMDAP), 1,2-dipalmitoyl-3-dimethylammonium-propane (DPDAP), N1-[2-((1S)-1-[(3-aminopropyl)amino]-4-[di(3-amino-propyl)amino]butylcarboxamido)ethyl]-3,4-di[oleyloxy]-benzamide (MVL5), 1,2-dioleoyl-sn-glycero-3-ethylphosphocholine (DOEPC), 2,3-bis(dodecyloxy)-N-(2-hydroxyethyl)-N,N-dimethylpropan-1-ammonium bromide (DLRIE), N-(2-aminoethyl)-N,N-dimethyl-2,3-bis(tetradecyloxy)propan-1-aminium bromide (DMORIE), di((Z)-non-2-en-1-yl) 8,8'-((((2(dimethylamino)ethyl)thio)carbonyl)azanediyl)dioctanoate (ATX), N,N-dimethyl-2,3-bis(dodecyloxy)propan-1-amine (DLDMA), N,N-dimethyl-2,3-bis(tetradecyloxy)propan-1-amine (DMDMA), Di((Z)-non-2-en-1-yl)-9-((4-(dimethylaminobutanoyl)oxy)heptadecanedioate (L319), N-Dodecyl-3-((2-dodecylcarbamoyl-ethyl)-{2-[(2-dodecylcarbamoyl-ethyl)-2-{(2-dodecylcarbamoyl-ethyl)-[2-(2-dodecylcarbamoyl-ethylamino)-ethyl]-amino}-ethylamino)propionamide (lipidoid 98N12-5), 1-[2-[bis(2-hydroxydodecyl)amino]ethyl-[2-[4-[2-[bis(2-hydroxydodecyl)amino]ethyl]piperazin-1-yl]ethyl]amino]dodecan-2-ol (lipidoid C12-200), LIPOFECTIN® (commercially available cationic liposomes comprising DOTMA and 1,2-dioleoyl-sn-3-phosphoethanolamine (DOPE), from GIBCO/BRL, Grand Island, N.Y.); LIPOFECTAMINE® (commercially available cationic liposomes comprising N-(1-(2,3-dioleyloxy)propyl)-N-(2-(sperminecarboxamido)ethyl)-N,N-dimethylammonium trifluoroacetate (DOSPA) and (DOPE), from GIBCO/BRL); and TRANSFECTAM® (commercially available cationic lipids comprising dioctadecylamidoglycyl carboxyspermine (DOGS) in ethanol from Promega Corp., Madison, Wis.) or any combination of any of the foregoing. Further suitable cationic lipids for use in the present disclosure include those described in WO2020/128031 and US20200163878, the entire contents of each of which are incorporated herein by reference for the purposes described herein. Further suitable cationic lipids for use in the present disclosure include those described in WO2010/053572 (including C12-200 described at paragraph [00225]) and WO2012/170930, both of which are incorporated herein by reference for the purposes described herein. Additional suitable cationic lipids for use in the present disclosure include HGT4003, HGT5000, HGTS001, HGT5001, HGT5002 (see US20150140070A1).
[0383] In some embodiments, formulations that are useful for pharmaceutical compositions (e.g., immunogenic compositions, e.g., vaccines) as described herein can comprise at least one cationic lipid.
Representative cationic lipids include, but are not limited to, 1,2-dilinoleyoxy-3-(dimethylamino)acetoxypropane (DLin-DAC), 1,2-dilinoleyoxy-3-morpholinopropane (DLin-MA), 1,2-dilinoleoyl-3-dimethylaminopropane (DLinDAP), 1,2-dilinoleylthio-3-dimethylaminopropane (DLin-S-DMA), 1-linoleoyl-2-linoleyloxy-3-dimethylaminopropane (DLin-2-DMAP), 1,2-dilinoleyloxy-3-trimethylaminopropane chloride salt (DLin-TMA.Cl), 1,2-dilinoleoyl-3-trimethylaminopropane chloride salt (DLin-TAP.Cl), 1,2-dilinoleyloxy-3-(N-methylpiperazino)propane (DLin-MPZ), 3-(N,N-dilinoleylamino)-1,2-propanediol (DLinAP), 3-(N,N-dioleylamino)-1,2-propanediol (DOAP), 1,2-dilinoleyloxo-3-(2-N,N-dimethylamino)ethoxypropane (DLin-EG-DMA), and 2,2-dilinoleyl-4-dimethylaminomethyl-[1,3]-dioxolane (DLin-K-DMA), 2,2-dilinoleyl-4-(2-dimethylaminoethyl)-[1,3]-dioxolane (DLin-KC2-DMA); dilinoleyl-methyl-4-dimethylaminobutyrate (DLin-MC3-DMA); MC3 (US20100324120).
[0384] In some embodiments, amino or cationic lipids useful in accordance with the present disclosure have at least one protonatable or deprotonatable group, such that the lipid is positively charged at a pH at or below physiological pH (e.g. pH 7.4), and neutral at a second pH, preferably at or above physiological pH. It will, of course, be understood that the addition or removal of protons as a function of pH is an equilibrium process, and that the reference to a charged or a neutral lipid refers to the nature of the predominant species and does not require that all of the lipids have to be present in the charged or neutral form. Lipids having more than one protonatable or deprotonatable group, or which are zwitterionic, are not excluded and may likewise be suitable in the context of the present invention.
[0385] In some embodiments, a protonatable lipid has a pKa of the protonatable group in the range of about 4 to about 11, e.g., a pKa of about 5 to about 7.
[0386] In some embodiments, a cationic lipid may comprise from about 10 mol % to about 100 mol %, about 20 mol % to about 100 mol %, about 30 mol % to about 100 mol %, about 40 mol % to about 100 mol %, or about 50 mol % to about 100 mol % of total lipid present in a lipid composition utilized in accordance with the present disclosure.
3. Additional lipids or lipid-like materials
[0387] In some embodiments, formulations utilized in accordance with the present disclosure may comprise lipids or lipid-like materials other than cationic or cationically ionizable lipids or lipid-like materials, i.e., non-cationic lipids or lipid-like materials (including non-cationically ionizable lipids or lipid-like materials). Collectively, anionic and neutral lipids or lipid-like materials are referred to herein as non-cationic lipids or lipid-like materials. In some embodiments, optimizing a formulation of nucleic acid particles by addition of other hydrophobic moieties, such as cholesterol and lipids, in addition to an ionizable/cationic lipid or lipid-like material may, for example, enhance particle stability and efficacy of nucleic acid delivery.
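By way of illustration only, the pH-dependent charge behavior described in paragraphs [0384] and [0385] can be estimated with the Henderson-Hasselbalch relationship. The following Python sketch is not part of the disclosed methods; it assumes a lipid with a single protonatable group that behaves ideally, and the function name `protonated_fraction` and the example pH of 5.5 (chosen here as a representative acidic pH) are hypothetical choices made for this example.

```python
def protonated_fraction(ph: float, pka: float) -> float:
    """Estimate the fraction of a single protonatable group that is protonated.

    Assumption: ideal Henderson-Hasselbalch behavior for one ionizable group,
    i.e. fraction = 1 / (1 + 10**(pH - pKa)).
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


if __name__ == "__main__":
    # pKa values spanning the "about 5 to about 7" range recited in [0385].
    for pka in (5.0, 6.0, 7.0):
        acidic = protonated_fraction(5.5, pka)   # illustrative acidic pH (assumption)
        physio = protonated_fraction(7.4, pka)   # physiological pH per [0384]
        print(f"pKa {pka:.1f}: {acidic:.1%} protonated at pH 5.5, "
              f"{physio:.1%} protonated at pH 7.4")
```

Consistent with the equilibrium language of paragraph [0384], the calculation returns a fraction of the predominant species rather than an all-or-nothing charge state.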
[0388] In some embodiments, a lipid or lipid-like material may be incorporated which may or may not affect the overall charge of particles. In certain embodiments, such lipid or lipid-like material is a non-cationic lipid or lipid-like material.
[0389] In some embodiments, a non-cationic lipid may comprise, e.g., one or more anionic lipids and/or neutral lipids. An "anionic lipid" is negatively charged (e.g., at a selected pH).
[0390] A "neutral lipid" exists either in an uncharged or neutral zwitterionic form (e.g., at a selected pH). In some embodiments, a formulation comprises one of the following neutral lipid components: (1) a phospholipid, (2) cholesterol or a derivative thereof; or (3) a mixture of a phospholipid and cholesterol or a derivative thereof. Examples of cholesterol derivatives include, but are not limited to, cholestanol, cholestanone, cholestenone, coprostanol, cholesteryl-2'-hydroxyethyl ether, cholesteryl-4'-hydroxybutyl ether, tocopherol and derivatives thereof, and mixtures thereof.
[0391] Specific exemplary phospholipids that can be used include, but are not limited to, phosphatidylcholines, phosphatidylethanolamines, phosphatidylglycerols, phosphatidic acids, phosphatidylserines or sphingomyelin. Such phospholipids include in particular diacylphosphatidylcholines, such as distearoylphosphatidylcholine (DSPC), dioleoylphosphatidylcholine (DOPC), dimyristoylphosphatidylcholine (DMPC), dipentadecanoylphosphatidylcholine, dilauroylphosphatidylcholine, dipalmitoylphosphatidylcholine (DPPC), diarachidoylphosphatidylcholine (DAPC), dibehenoylphosphatidylcholine (DBPC), ditricosanoylphosphatidylcholine (DTPC), dilignoceroylphosphatidylcholine (DLPC), palmitoyloleoyl-phosphatidylcholine (POPC), 1,2-di-O-octadecenyl-sn-glycero-3-phosphocholine (18:0 Diether PC), 1-oleoyl-2-cholesterylhemisuccinoyl-sn-glycero-3-phosphocholine (OChemsPC), 1-hexadecyl-sn-glycero-3-phosphocholine (C16 Lyso PC) and phosphatidylethanolamines, in particular diacylphosphatidylethanolamines, such as dioleoylphosphatidylethanolamine (DOPE), distearoyl-phosphatidylethanolamine (DSPE), dipalmitoyl-phosphatidylethanolamine (DPPE), dimyristoyl-phosphatidylethanolamine (DMPE), dilauroyl-phosphatidylethanolamine (DLPE), diphytanoyl-phosphatidylethanolamine (DPyPE), and further phosphatidylethanolamine lipids with different hydrophobic chains.
[0392] In certain embodiments, a formulation utilized in accordance with the present disclosure includes DSPC or DSPC and cholesterol.
[0393] In certain embodiments, formulations utilized in accordance with the present disclosure include both a cationic lipid and an additional (non-cationic) lipid.
[0394] In some embodiments, formulations herein include a polymer conjugated lipid such as a pegylated lipid. "Pegylated lipids" comprise both a lipid portion and a polyethylene glycol portion. Pegylated lipids are known in the art.
[0395] Without wishing to be bound by theory, the amount of (total) cationic lipid compared to the amount of other lipid(s) in a formulation may affect important characteristics, such as charge, particle size, stability, tissue selectivity, and bioactivity of the nucleic acid. In some embodiments, the molar ratio of the at least one cationic lipid to the at least one additional lipid is from about 10:0 to about 1:9, about 4:1 to about 1:2, or about 3:1 to about 1:1.
[0396] In some embodiments, a non-cationic lipid, in particular a neutral lipid, (e.g., one or more phospholipids and/or cholesterol) may comprise from about 0 mol % to about 90 mol %, from about 0 mol % to about 80 mol %, from about 0 mol % to about 70 mol %, from about 0 mol % to about 60 mol %, or from about 0 mol % to about 50 mol %, of the total lipid present in a formulation.
4. Lipoplex Particles
[0397] In certain embodiments of the present disclosure, the RNA described herein may be present in RNA lipoplex particles.
[0398] An "RNA lipoplex particle" contains lipid, in particular cationic lipid, and RNA. Electrostatic interactions between positively charged liposomes and negatively charged RNA result in complexation and spontaneous formation of RNA lipoplex particles. Positively charged liposomes may be generally synthesized using a cationic lipid, such as DOTMA, and additional lipids, such as DOPE. In one embodiment, an RNA lipoplex particle is a nanoparticle.
[0399] In certain embodiments, RNA lipoplex particles include both a cationic lipid and an additional lipid. In an exemplary embodiment, the cationic lipid is DOTMA and the additional lipid is DOPE.
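For illustration, the molar ratios recited in paragraph [0395] can be converted to the mol % values used in paragraph [0396] by simple arithmetic. The Python sketch below is an assumption-laden example rather than a disclosed procedure: it treats the formulation as a two-component mixture (cationic lipid plus additional lipid only), and the helper name `mol_percent` is hypothetical.

```python
from typing import Tuple


def mol_percent(cationic_parts: float, additional_parts: float) -> Tuple[float, float]:
    """Convert a cationic:additional molar ratio into (cationic mol %, additional mol %).

    Assumption: the two lipid classes together make up 100% of total lipid.
    """
    total = cationic_parts + additional_parts
    if total <= 0:
        raise ValueError("at least one component must be present")
    return 100.0 * cationic_parts / total, 100.0 * additional_parts / total


if __name__ == "__main__":
    # End points of the ranges recited in paragraph [0395].
    for ratio in ((10, 0), (4, 1), (3, 1), (1, 1), (1, 2), (1, 9)):
        cationic, additional = mol_percent(*ratio)
        print(f"{ratio[0]}:{ratio[1]} -> {cationic:.1f} mol % cationic, "
              f"{additional:.1f} mol % additional")
```

Under this two-component assumption, the broadest ratio range (about 10:0 to about 1:9) corresponds to about 100 mol % down to about 10 mol % cationic lipid, which lines up with the about 0 mol % to about 90 mol % range given for the non-cationic lipid.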
[0400] In some embodiments, the molar ratio of the at least one cationic lipid to the at least one additional lipid is from about 10:0 to about 1:9, about 4:1 to about 1:2, or about 3:1 to about 1:1. In specific embodiments, the molar ratio may be about 3:1, about 2.75:1, about 2.5:1, about 2.25:1, about 2:1, about 1.75:1, about 1.5:1, about 1.25:1, or about 1:1. In an exemplary embodiment, the molar ratio of the at least one cationic lipid to the at least one additional lipid is about 2:1.
[0401] In some embodiments, RNA lipoplex particles have an average diameter that ranges from about 200 nm to about 1000 nm, from about 200 nm to about 800 nm, from about 250 to about 700 nm, from about 400 to about 600 nm, from about 300 nm to about 500 nm, or from about 350 nm to about 400 nm. In specific embodiments, the RNA lipoplex particles have an average diameter of about 200 nm, about 225 nm, about 250 nm, about 275 nm, about 300 nm, about 325 nm, about 350 nm, about 375 nm, about 400 nm, about 425 nm, about 450 nm, about 475 nm, about 500 nm, about 525 nm, about 550 nm, about 575 nm, about 600 nm, about 625 nm, about 650 nm, about 700 nm, about 725 nm, about 750 nm, about 775 nm, about 800 nm, about 825 nm, about 850 nm, about 875 nm, about 900 nm, about 925 nm, about 950 nm, about 975 nm, or about 1000 nm. In an embodiment, the RNA lipoplex particles have an average diameter that ranges from about 250 nm to about 700 nm. In another embodiment, the RNA lipoplex particles have an average diameter that ranges from about 300 nm to about 500 nm. In an exemplary embodiment, the RNA lipoplex particles have an average diameter of about 400 nm.
[0402] RNA lipoplex particles and compositions comprising RNA lipoplex particles described herein are useful for delivery of RNA to a target tissue after parenteral administration, in particular after intravenous administration. The RNA lipoplex particles may be prepared using liposomes that may be obtained by injecting a solution of the lipids in ethanol into water or a suitable aqueous phase. In one embodiment, the aqueous phase has an acidic pH. In one embodiment, the aqueous phase comprises acetic acid, e.g., in an amount of about 5 mM. Liposomes may be used for preparing RNA lipoplex particles by mixing the liposomes with RNA.
In one embodiment, the liposomes and RNA lipoplex particles comprise at least one cationic lipid and at least one additional lipid. In one embodiment, the at least one cationic lipid comprises 1,2-di-O-octadecenyl-3-trimethylammonium propane (DOTMA) and/or 1,2-dioleoyl-3-trimethylammonium-propane (DOTAP). In one embodiment, the at least one additional lipid comprises 1,2-di-(9Z-octadecenoyl)-sn-glycero-3-phosphoethanolamine (DOPE), cholesterol (Chol) and/or 1,2-dioleoyl-sn-glycero-3-phosphocholine (DOPC). In one embodiment, the at least one cationic lipid comprises 1,2-di-O-octadecenyl-3-trimethylammonium propane (DOTMA) and the at least one additional lipid comprises 1,2-di-(9Z-octadecenoyl)-sn-glycero-3-phosphoethanolamine (DOPE). In one embodiment, the liposomes and RNA lipoplex particles comprise 1,2-di-O-octadecenyl-3-trimethylammonium propane (DOTMA) and 1,2-di-(9Z-octadecenoyl)-sn-glycero-3-phosphoethanolamine (DOPE).
[0403] Spleen targeting RNA lipoplex particles are described in WO 2013/143683, herein incorporated by reference. It has been found that RNA lipoplex particles having a net negative charge may be used to preferentially target spleen tissue or spleen cells such as antigen-presenting cells, in particular dendritic cells. Accordingly, following administration of the RNA lipoplex particles, RNA accumulation and/or RNA expression in the spleen occurs. Thus, RNA lipoplex particles of the disclosure may be used for expressing RNA in the spleen. In an embodiment, after administration of the RNA lipoplex particles, no or essentially no RNA accumulation and/or RNA expression in the lung and/or liver occurs. In one embodiment, after administration of the RNA lipoplex particles, RNA accumulation and/or RNA expression in antigen presenting cells, such as professional antigen presenting cells in the spleen occurs. Thus, RNA lipoplex particles of the disclosure may be used for expressing RNA in such antigen presenting cells. In one embodiment, the antigen presenting cells are dendritic cells and/or macrophages.
5. Lipid Nanoparticles (LNPs)
[0404] In some embodiments, nucleic acid such as RNA described herein is administered in the form of lipid nanoparticles (LNPs). In some embodiments, LNPs may comprise any lipid capable of forming a particle to which the one or more nucleic acid molecules are attached, or in which the one or more nucleic acid molecules are encapsulated.
[0405] In some embodiments, an LNP comprises one or more cationic lipids, and one or more stabilizing lipids.
Stabilizing lipids include neutral lipids and pegylated lipids.
[0406] In some embodiments, an LNP comprises a cationic lipid, a neutral lipid, a sterol, a polymer conjugated lipid; and an RNA, encapsulated within or associated with the lipid nanoparticle.
[0407] In some embodiments, a neutral lipid is selected from the group consisting of DSPC, DPPC, DMPC, DOPC, POPC, DOPE, DOPG, DPPG, POPE, DPPE, DMPE, DSPE, and SM. In some embodiments, the neutral lipid is selected from the group consisting of DSPC, DPPC, DMPC, DOPC, POPC, DOPE and SM. In some embodiments, the neutral lipid is DSPC.
[0408] In some embodiments, a sterol is cholesterol.
[0409] In some embodiments, a polymer conjugated lipid is a pegylated lipid. In some embodiments, a pegylated lipid has the following structure: or a pharmaceutically
wherein: R12 and R13 are each independently a straight or branched, saturated or unsaturated alkyl chain containing from 10 to 30 carbon atoms, wherein the alkyl chain is optionally interrupted by one or more ester bonds; and w has a mean value ranging from 30 to 60. In some embodiments, R12 and R13 are each independently straight, saturated alkyl chains containing from 12 to 16 carbon atoms. In some embodiments, w has a mean value ranging from 40 to 55. In some embodiments, the average w is about 45. In some embodiments, R12 and R13 are each independently a straight, saturated alkyl chain containing about 14 carbon atoms, and w has a mean value of about 45. [0410] In some embodiments, a pegylated lipid is DMG-PEG 2000, e.g., having the following structure: .
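As a rough worked example, the mean repeat number w recited in paragraph [0409] can be related to an approximate molar mass of the polyethylene glycol chain. The Python sketch below rests on stated assumptions that are not taken from the disclosure: each repeat is an ethylene oxide unit (C2H4O, about 44.05 g/mol), end groups and the lipid portion are ignored, and the names `ETHYLENE_OXIDE_G_PER_MOL` and `peg_chain_mass` are hypothetical.

```python
# Approximate molar mass of one -CH2CH2O- repeat unit (assumption for this sketch).
ETHYLENE_OXIDE_G_PER_MOL = 44.05


def peg_chain_mass(w: float) -> float:
    """Approximate molar mass (g/mol) of a PEG chain with mean repeat number w.

    End groups and the attached lipid portion are deliberately ignored.
    """
    if w < 0:
        raise ValueError("w must be non-negative")
    return w * ETHYLENE_OXIDE_G_PER_MOL


if __name__ == "__main__":
    # Mean values of w recited in paragraph [0409]: 30 to 60, 40 to 55, about 45.
    for w in (30, 40, 45, 55, 60):
        print(f"w = {w}: PEG chain of roughly {peg_chain_mass(w):.0f} g/mol")
```

Under these assumptions, a mean w of about 45 corresponds to a PEG chain of roughly 2000 g/mol, which is in line with the "2000" designation in DMG-PEG 2000.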
[0411] In some embodiments, a cationic lipid has a structure of Formula (III): or a pharmaceutically acceptable thereof, wherein:
one of L1 or L2 is –O(C=O)-, -(C=O)O-, -C(=O)-, -O-, -S(O)x-, -S-S-, -C(=O)S-, SC(=O)-, - NRaC(=O)-, -C(=O)NRa-, NRaC(=O)NRa-, -OC(=O)NRa- or -NRaC(=O)O-, and the other of L1 or L2 is –O(C=O)-, -(C=O)O-, -C(=O)-, -O-, -S(O)x-, -S-S-, -C(=O)S-, SC(=O)-, -NRaC(=O)-, - C(=O)NRa-, NRaC(=O)NRa-, -OC(=O)NRa- or -NRaC(=O)O- or a direct bond; G1 and G2 are each independently unsubstituted C1-C12 alkylene or C1-C12 alkenylene; G3 is C1-C24 alkylene, C1-C24 alkenylene, C3-C8 cycloalkylene, C3-C8 cycloalkenylene; Ra is H or C1-C12 alkyl; R1 and R2 are each independently C6-C24 alkyl or C6-C24 alkenyl; R3 is H, OR5, CN, -C(=O)OR4, -OC(=O)R4 or –NR5C(=O)R4; R4 is C1-C12 alkyl; R5 is H or C1-C6 alkyl; and x is 0, 1 or 2. [0412] In some of the foregoing embodiments of Formula (III), the lipid has one of the following structures (IIIA) or (IIIB):
wherein: A is a 3 to 8-membered cycloalkyl or cycloalkylene ring; R6 is, at each occurrence, independently H, OH or C1-C24 alkyl; n is an integer ranging from 1 to 15. [0413] In some of the foregoing embodiments of Formula (III), the lipid has structure (IIIA), and in other embodiments, the lipid has structure (IIIB). [0414] In other embodiments of Formula (III), the lipid has one of the following structures (IIIC) or (IIID):
wherein y and z are each independently integers ranging from 1 to 12. [0415] In any of the foregoing embodiments of Formula (III), one of L1 or L2 is -O(C=O)-. For example, in some embodiments each of L1 and L2 is -O(C=O)-. In some different embodiments of any of the foregoing, L1 and L2 are each independently -(C=O)O- or -O(C=O)-. For example, in some embodiments each of L1 and L2 is -(C=O)O-. [0416] In some different embodiments of Formula (III), the lipid has one of the following structures (IIIE) or (IIIF):
[0417] In some of the foregoing embodiments of Formula (III), the lipid has one of the following structures (IIIG), (IIIH), (IIII), or (IIIJ):
[0418] In some embodiments, n is an integer ranging from 2 to 12, for example from 2 to 8 or from 2 to 4. For example, in some embodiments, n is 3, 4, 5 or 6. In some embodiments, n is 3. In some embodiments, n is 4. In some embodiments, n is 5. In some embodiments, n is 6. [0419] In some other of the foregoing embodiments of Formula (III), y and z are each independently an integer ranging from 2 to 10. For example, in some embodiments, y and z are each independently an integer ranging from 4 to 9 or from 4 to 6. [0420] In some of the foregoing embodiments of Formula (III), R6 is H. In other of the foregoing embodiments, R6 is C1-C24 alkyl. In other embodiments, R6 is OH. [0421] In some embodiments of Formula (III), G3 is unsubstituted. In other embodiments, G3 is substituted. In various different embodiments, G3 is linear C1-C24 alkylene or linear C1-C24 alkenylene. [0422] In some other foregoing embodiments of Formula (III), R1 or R2, or both, is C6-C24 alkenyl. For example, in some embodiments, R1 and R2 each independently have the following structure:
wherein: R7a and R7b are, at each occurrence, independently H or C1-C12 alkyl; and a is an integer from 2 to 12, wherein R7a, R7b and a are each selected such that R1 and R2 each independently comprise from 6 to 20 carbon atoms. For example, in some embodiments a is an integer ranging from 5 to 9 or from 8 to 12. [0423] In some of the foregoing embodiments of Formula (III), at least one occurrence of R7a is H. For example, in some embodiments, R7a is H at each occurrence. In other different embodiments of the foregoing, at least one occurrence of R7b is C1-C8 alkyl. For example, in some embodiments, C1-C8 alkyl is methyl, ethyl, n-propyl, iso-propyl, n-butyl, iso-butyl, tert-butyl, n-hexyl or n-octyl. [0424] In different embodiments of Formula (III), R1 or R2, or both, has one of the following structures:
, CN, -C(=O)OR4, -OC(=O)R4 or -NHC(=O)R4. In some embodiments, R4 is methyl or ethyl. [0426] In various different embodiments, the cationic lipid of Formula (III) has one of the structures set forth in Table 5 below.
Table 5: Exemplary Compounds of Formula (III).
No. Structure
III-1 to III-36 (chemical structures not reproduced)
[0427] In various different embodiments, a cationic lipid has one of the structures set forth in Table 6 below.
Table 6: Exemplary Cationic Lipid Structures.
No. Structure
A to F (chemical structures not reproduced)
[0428] In some embodiments, an LNP comprises a cationic lipid that is an ionizable lipid-like material (lipidoid). In some embodiments, a cationic lipid has the following structure:
[0429] In some embodiments, lipid nanoparticles can have an average size (e.g., mean diameter) of about 30 nm to about 150 nm, about 40 nm to about 150 nm, about 50 nm to about 150 nm, about 60 nm to about 130 nm, about 70 nm to about 110 nm, about 70 nm to about 100 nm, about 70 nm to about 90 nm, or about 70 nm to about 80 nm. In some embodiments, lipid nanoparticles in accordance with the present disclosure can have an average size (e.g., mean diameter) of about 50 nm to about 100 nm. In some embodiments, lipid nanoparticles may have an average size (e.g., mean diameter) of about 50 nm to about 150 nm. In some embodiments, lipid nanoparticles may have an average size (e.g., mean diameter) of about 60 nm to about 120 nm. In some embodiments, lipid nanoparticles in accordance with the present disclosure can have an average size (e.g., mean diameter) of about 30 nm, 35 nm, 40 nm, 45 nm, 50 nm, 55 nm, 60 nm, 65 nm, 70 nm, 75 nm, 80 nm, 85 nm, 90 nm, 95 nm, 100 nm, 105 nm, 110 nm, 115 nm, 120 nm, 125 nm, 130 nm, 135 nm, 140 nm, 145 nm, or 150 nm. The term “average diameter” or “mean diameter” refers to the mean hydrodynamic diameter of particles as measured by dynamic laser light scattering (DLS) with data analysis using the so-called cumulant algorithm, which provides as results the so-called Z-average with the dimension of a length, and the polydispersity index (PI), which is dimensionless (Koppel, D., J. Chem. Phys. 57, 1972, pp 4814-4820, ISO 13321, which is herein incorporated by reference). Here “average diameter,” “mean diameter,” “diameter,” or “size” for particles is used synonymously with this value of the Z-average. [0430] In some embodiments, lipid nanoparticles described herein may exhibit a polydispersity index less than about 0.5, less than about 0.4, less than about 0.3, or about 0.2 or less.
By way of example, lipid nanoparticles can exhibit a polydispersity index in a range of about 0.1 to about 0.3 or about 0.2 to about 0.3. The “polydispersity index” is preferably calculated based on dynamic light scattering measurements by the so-called cumulant analysis as mentioned in the definition of the “average diameter.” Under certain prerequisites, it can be taken as a measure of the size distribution of an ensemble of ribonucleic acid nanoparticles. [0431] Lipid nanoparticles described herein can be characterized by an “N/P ratio,” which is the molar ratio of cationic (nitrogen) groups (the “N” in N/P) in the cationic polymer to the anionic (phosphate) groups (the “P” in N/P) in RNA. It is understood that a cationic group is one that is either in cationic form (e.g., N+), or one that is ionizable to become cationic. Use of a single number in an N/P ratio (e.g., an N/P ratio of about 5) is intended to refer to that number over 1, e.g., an N/P ratio of about 5 is intended to mean 5:1. In some embodiments, a lipid nanoparticle described herein has an N/P ratio greater than or equal to 5. In some embodiments, a lipid nanoparticle described herein has an N/P ratio that is about 5, 6, 7, 8, 9, or 10. In some embodiments, an N/P ratio for a lipid nanoparticle described herein is from about 10 to about 50. In some embodiments, an N/P ratio for a lipid nanoparticle described herein is from about 10 to about 70. In some embodiments, an N/P ratio for a lipid nanoparticle described herein is from about 10 to about 120. B. Exemplary Methods of Making Lipid Nanoparticles [0432] Lipids and lipid nanoparticles comprising nucleic acids and their method of preparation are known in the art, including, e.g., as described in U.S. Patent Nos. 8,569,256, 5,965,542 and U.S. Patent Publication Nos. 2016/0199485, 2016/0009637, 2015/0273068, 2015/0265708, 2015/0203446, 2015/0005363, 2014/0308304, 2014/0200257, 2013/086373, 2013/0338210, 2013/0323269, 2013/0245107, 2013/0195920, 2013/0123338, 2013/0022649, 2013/0017223, 2012/0295832, 2012/0183581, 2012/0172411, 2012/0027803, 2012/0058188, 2011/0311583, 2011/0311582, 2011/0262527, 2011/0216622, 2011/0117125, 2011/0091525, 2011/0076335, 2011/0060032, 2010/0130588, 2007/0042031, 2006/0240093, 2006/0083780, 2006/0008910, 2005/0175682, 2005/017054, 2005/0118253, 2005/0064595, 2004/0142025, 1999/009076 and PCT Pub. Nos. WO 99/39741, WO 2018/081480, WO 2017/004143, WO 2017/075531, WO 2015/199952, WO 2014/008334, WO 2013/086373, WO 2013/086322, WO 2013/016058, WO 2011/141705, and WO 2001/07548, the full disclosures of which are herein incorporated by reference in their entirety for the purposes described herein.
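By way of non-limiting illustration only, the N/P ratio defined in paragraph [0431] can be sketched as a short calculation. The sketch below assumes one ionizable nitrogen per lipid molecule by default, one phosphate group per nucleotide, and a placeholder average nucleotide mass of 330 g/mol (the actual value depends on sequence and on any nucleotide modifications). The function names `np_ratio` and `lipid_nmol_for_target` are hypothetical and are not taken from the present disclosure.

```python
def np_ratio(lipid_nmol: float,
             rna_mass_ug: float,
             nitrogens_per_lipid: int = 1,
             avg_nt_mass_g_per_mol: float = 330.0) -> float:
    """Molar ratio of ionizable nitrogen groups (N) to RNA phosphate groups (P).

    Assumptions (illustrative only): one phosphate per nucleotide and a
    placeholder average nucleotide mass of 330 g/mol.
    """
    if rna_mass_ug <= 0 or lipid_nmol < 0:
        raise ValueError("amounts must be positive")
    nitrogen_nmol = lipid_nmol * nitrogens_per_lipid
    # ug / (g/mol) gives umol; multiply by 1000 to obtain nmol of phosphate.
    phosphate_nmol = rna_mass_ug / avg_nt_mass_g_per_mol * 1000.0
    return nitrogen_nmol / phosphate_nmol


def lipid_nmol_for_target(target_np: float,
                          rna_mass_ug: float,
                          nitrogens_per_lipid: int = 1,
                          avg_nt_mass_g_per_mol: float = 330.0) -> float:
    """Amount of ionizable lipid (nmol) needed to reach a target N/P ratio."""
    phosphate_nmol = rna_mass_ug / avg_nt_mass_g_per_mol * 1000.0
    return target_np * phosphate_nmol / nitrogens_per_lipid


if __name__ == "__main__":
    # 33 ug RNA is about 100 nmol phosphate under the stated assumptions.
    print(round(np_ratio(500.0, 33.0), 3))
    print(round(lipid_nmol_for_target(6.0, 10.0), 1))
```

Under these assumptions, 33 µg of RNA corresponds to about 100 nmol of phosphate, so 500 nmol of a lipid bearing a single ionizable nitrogen gives an N/P ratio of about 5, i.e., 5:1.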
[0433] For example, in some embodiments, cationic lipids, neutral lipids (e.g., DSPC, and/or cholesterol) and polymer-conjugated lipids can be solubilized in ethanol at a pre-determined molar ratio (e.g., ones described herein). In some embodiments, lipid nanoparticles (LNPs) are prepared at a total lipid to polyribonucleotides weight ratio of approximately 10:1 to 30:1. In some embodiments, such polyribonucleotides can be diluted to 0.2 mg/mL in acetate buffer. In some embodiments, preformed lipid nanoparticles (pre-LNPs) are prepared by mixing (i) lipids in an organic solvent with (ii) an acidified aqueous phase. In some embodiments, the resulting pre-LNPs are subjected to buffer exchange (e.g., to remove the organic solvent) prior to mixing with polyribonucleotide solutions, thereby forming pre-LNPs. In some embodiments, pre-LNPs are mixed with a polyribonucleotide solution to form polyribonucleotide lipid nanoparticles (RNA-LNPs). [0434] In some embodiments, using an ethanol injection technique, a colloidal lipid dispersion comprising polyribonucleotides can be formed as follows: an ethanol solution comprising lipids, such as cationic lipids, neutral lipids, and polymer-conjugated lipids, is injected into an aqueous solution comprising polyribonucleotides (e.g., ones described herein). [0435] In some embodiments, lipid and polyribonucleotide solutions can be mixed at room temperature by pumping each solution at controlled flow rates into a mixing unit, for example, using piston pumps. In some embodiments, the flow rates of a lipid solution and an RNA solution into a mixing unit are maintained at a ratio of 1:3. Upon mixing, nucleic acid-lipid particles are formed as the ethanolic lipid solution is diluted with aqueous polyribonucleotides. The lipid solubility is decreased, while cationic lipids bearing a positive charge interact with the negatively charged RNA. [0436] In some embodiments, a solution comprising RNA-encapsulated lipid nanoparticles can be processed by one or more of concentration adjustment, buffer exchange, formulation, and/or filtration. [0437] In some embodiments, RNA-encapsulated lipid nanoparticles can be processed through filtration. [0438] In some embodiments, particle size and/or internal structure of lipid nanoparticles (with or without RNAs) may be monitored by appropriate techniques such as, e.g., small-angle X-ray scattering (SAXS) and/or transmission electron cryomicroscopy (CryoTEM). IV. Pharmaceutical Compositions [0439] The present disclosure provides compositions, e.g., pharmaceutical compositions comprising one or more polyribonucleotides described herein.
Pharmaceutical formulations may additionally comprise a pharmaceutically acceptable excipient, which, as used herein, includes any and all solvents, dispersion media, diluents, or other liquid vehicles, dispersion or suspension aids, surface active agents, isotonic agents, thickening or emulsifying agents, preservatives, solid binders, lubricants and the like, as suited to the particular dosage form desired. Remington's The Science and Practice of Pharmacy, 21st Edition, A. R. Gennaro (Lippincott, Williams & Wilkins, Baltimore, MD, 2006; incorporated herein by reference) discloses various excipients used in formulating pharmaceutical compositions and known techniques for the preparation thereof. Except insofar as any conventional excipient medium is incompatible with a substance or its derivatives, such as by producing any undesirable biological effect or otherwise interacting in a deleterious manner with any other component(s) of the pharmaceutical composition, its use is contemplated to be within the scope of this disclosure. [0440] In some embodiments, an excipient is approved for use in humans and for veterinary use. In some embodiments, an excipient is approved by the United States Food and Drug Administration. In some embodiments, an excipient is pharmaceutical grade. In some embodiments, an excipient meets the standards of the United States Pharmacopoeia (USP), the European Pharmacopoeia (EP), the British Pharmacopoeia, and/or the International Pharmacopoeia. [0441] Pharmaceutically acceptable excipients used in the manufacture of pharmaceutical compositions include, but are not limited to, inert diluents, dispersing and/or granulating agents, surface active agents and/or emulsifiers, disintegrating agents, binding agents, preservatives, buffering agents, lubricating agents, and/or oils. Such excipients may optionally be included in pharmaceutical formulations. Excipients such as cocoa butter and suppository waxes, coloring agents, coating agents, sweetening, flavoring, and/or perfuming agents can be present in the composition, according to the judgment of the formulator. [0442] General considerations in the formulation and/or manufacture of pharmaceutical agents may be found, for example, in Remington: The Science and Practice of Pharmacy 21st ed., Lippincott Williams & Wilkins, 2005 (incorporated herein by reference).
[0443] In some embodiments, pharmaceutical compositions provided herein may be formulated with one or more pharmaceutically acceptable carriers or diluents as well as any other known adjuvants and excipients in accordance with conventional techniques such as those disclosed in Remington: The Science and Practice of Pharmacy 21st ed., Lippincott Williams & Wilkins, 2005 (incorporated herein by reference). [0444] Pharmaceutical compositions described herein can be administered by appropriate methods known in the art. As will be appreciated by a skilled artisan, the route and/or mode of administration may depend on a number of factors, including, e.g., but not limited to stability and/or pharmacokinetics and/or pharmacodynamics of pharmaceutical compositions described herein. [0445] In some embodiments, pharmaceutical compositions described herein are formulated for parenteral administration, which includes modes of administration other than enteral and topical administration, usually by injection, and includes, without limitation, intravenous, intramuscular, intraarterial, intradermal, subcutaneous, subcuticular, or intraarticular injection and infusion. In preferred embodiments, pharmaceutical compositions described herein are formulated for intravenous, intramuscular, or subcutaneous administration. [0446] In some embodiments, pharmaceutical compositions described herein are formulated for intravenous administration. In some embodiments, pharmaceutically acceptable excipients that may be useful for intravenous administration include sterile aqueous solutions or dispersions and sterile powders for preparation of sterile injectable solutions or dispersions. [0447] Therapeutic compositions typically must be sterile and stable under the conditions of manufacture and storage. The composition can be formulated as a solution, microemulsion, lipid nanoparticles, or other ordered structure suitable to high drug concentration. The carrier can be a solvent or dispersion medium containing, for example, water, ethanol, polyol (for example, glycerol, propylene glycol, and liquid polyethylene glycol, and the like), and suitable mixtures thereof. Proper fluidity can be maintained, for example, by the use of surfactants. In many cases, it will be preferable to include isotonic agents, for example, sugars, polyalcohols such as mannitol, sorbitol, or sodium chloride in the composition.
In some embodiments, prolonged absorption of the injectable compositions can be brought about by including in the composition an agent that delays absorption, for example, monostearate salts and gelatin. [0448] Sterile injectable solutions can be prepared by incorporating the active compound in the required amount in an appropriate solvent with one or a combination of ingredients enumerated above, as required, followed by sterilization and/or microfiltration. In some embodiments, pharmaceutical compositions can be prepared as described herein and/or by methods known in the art. [0449] These compositions may also contain adjuvants such as preservatives, wetting agents, emulsifying agents and dispersing agents. Prevention of the presence of microorganisms may be ensured both by sterilization procedures, and by the inclusion of various antibacterial and antifungal agents, for example, paraben, chlorobutanol, phenol, sorbic acid, and the like. It may also be desirable to include isotonic agents, such as sugars, sodium chloride, and the like into pharmaceutical compositions described herein. In addition, prolonged absorption of the injectable pharmaceutical form may be brought about by the inclusion of agents which delay absorption such as aluminum monostearate and gelatin. [0450] Formulations of pharmaceutical compositions described herein may be prepared by any method known or hereafter developed in the art of pharmacology. In general, such preparatory methods include the step of bringing active ingredient(s) into association with a diluent or another excipient and/or one or more other accessory ingredients, and then, if necessary and/or desirable, shaping and/or packaging the product into a desired single- or multi-dose unit. [0451] A pharmaceutical composition in accordance with the present disclosure may be prepared, packaged, and/or sold in bulk, as a single unit dose, and/or as a plurality of single unit doses. As used herein, a “unit dose” is a discrete amount of the pharmaceutical composition comprising a predetermined amount of at least one RNA product produced using a system and/or method described herein. [0452] Relative amounts of polyribonucleotides encapsulated in lipid nanoparticles, a pharmaceutically acceptable excipient, and/or any additional ingredients in a pharmaceutical composition can vary, depending upon the subject to be treated, target cells, diseases or disorders, and may also further depend upon the route by which the composition is to be administered. [0453] In some embodiments, pharmaceutical compositions described herein are formulated into pharmaceutically acceptable dosage forms by conventional methods known to those of skill in the art.
Actual dosage levels of the active ingredients (e.g., polyribonucleotides encapsulated in lipid nanoparticles) in the pharmaceutical compositions described herein may be varied so as to obtain an amount of the active ingredient which is effective to achieve the desired therapeutic response for a particular patient, composition, and mode of administration, without being toxic to the patient. The selected dosage level will depend upon a variety of pharmacokinetic factors including the activity of the particular compositions of the present disclosure employed, the route of administration, the time of administration, the rate of excretion of the particular compound being employed, the duration of the treatment, other drugs, compounds and/or materials used in combination with the particular compositions employed, the age, sex, weight, condition, general health and prior medical history of the patient being treated, and like factors well known in the medical arts. [0454] A physician having ordinary skill in the art can readily determine and prescribe the effective amount of the pharmaceutical composition required. For example, a physician could start doses of active ingredients (e.g., polyribonucleotides encapsulated in lipid nanoparticles) employed in the pharmaceutical composition at levels lower than that required in order to achieve the desired therapeutic effect and gradually increase the dosage until the desired effect is achieved. [0455] In some embodiments, a pharmaceutical composition is formulated (e.g., but not limited to, for intravenous, intramuscular, or subcutaneous administration) to deliver a dose of about 5 mg RNA/kg. [0456] In some embodiments, a pharmaceutical composition described herein may further comprise one or more additives, for example, in some embodiments that may enhance stability of such a composition under certain conditions. Examples of additives may include but are not limited to salts, buffer substances, preservatives, and carriers. For example, in some embodiments, a pharmaceutical composition may further comprise a cryoprotectant (e.g., sucrose) and/or an aqueous buffered solution, which may in some embodiments include one or more salts, including, e.g., alkali metal salts or alkaline earth metal salts such as, e.g., sodium salts, potassium salts, and/or calcium salts. [0457] In some embodiments, a pharmaceutical composition provided herein is a preservative-free, sterile RNA-lipid nanoparticle dispersion in an aqueous buffer for intravenous or intramuscular administration.
[0458] Although the descriptions of pharmaceutical compositions provided herein are principally directed to pharmaceutical compositions that are suitable for administration to humans, it will be understood by the skilled artisan that such compositions are generally suitable for administration to animals of all sorts. Modification of pharmaceutical compositions suitable for administration to humans in order to render the compositions suitable for administration to various animals is well understood, and the ordinarily skilled veterinary pharmacologist can design and/or perform such modification with merely ordinary, if any, experimentation. V. Treatment Methods [0459] In some embodiments, one or more pharmaceutical compositions comprising one or more polyribonucleotides described herein can be taken up by target cells (e.g., dendritic cells) for translation of polyepitopic-encoding RNA(s), thereby inducing CD4+ and/or CD8+ T cell immunity against one or more of the epitopes. Accordingly, another aspect of the present disclosure relates to methods of using pharmaceutical compositions described herein. For example, one aspect provided herein is a method comprising administering a provided pharmaceutical composition to a subject suffering from cancer. In some embodiments, a provided pharmaceutical composition is administered by intravenous injection or infusion. [0460] In some embodiments, technologies of the present disclosure may be administered to subjects according to a particular dosing regimen. In some embodiments, a dosing regimen may involve a single administration; in some embodiments, a dosing regimen may comprise one or more “booster” administrations after the initial administration. In some embodiments, initial and boost doses are the same amount; in some embodiments they differ. In some embodiments, two or more booster doses are administered. In some embodiments, a plurality of doses are administered at regular intervals. In some embodiments, periods of time between doses become longer. [0461] In some embodiments, administered pharmaceutical compositions (e.g., immunogenic compositions, e.g., vaccines) comprising RNA constructs that encode polyepitopic vaccine constructs are administered in RNA doses of from about 0.1 µg to about 300 µg, about 0.5 µg to about 200 µg, or about 1 µg to about 100 µg, such as about 1 µg, about 3 µg, about 10 µg, about 30 µg, about 50 µg, or about 100 µg.
In some embodiments, an saRNA construct is administered at a lower dose (e.g., 2, 4, 5, 10 fold or more lower) than a modRNA or uRNA construct. [0462] In some embodiments, a first booster dose is administered within about six months of the initial dose, and preferably within about 5, 4, 3, 2, or 1 months. In some embodiments, a first booster dose is administered in a time period that begins about 1, 2, 3, or 4 weeks after the first dose, and ends about 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, or 12 weeks after the first dose (e.g., between about 1 and about 12 weeks after the first dose, or between about 2 or 3 weeks and about 5 or 6 weeks after the first dose, or about 3 weeks or about 4 weeks after the first dose). [0463] In some embodiments, a plurality of booster doses (e.g., 2, 3, or 4) are administered within 6 months of the first dose, or within 12 months of the first dose. [0464] In some embodiments, an RNA dose is about 60 µg or lower, 50 µg or lower, 40 µg or lower, 30 µg or lower, 20 µg or lower, 10 µg or lower, 5 µg or lower, 2.5 µg or lower, or 1 µg or lower. In some embodiments, an RNA dose is about 0.25 µg, at least 0.5 µg, at least 1 µg, at least 2 µg, at least 3 µg, at least 4 µg, at least 5 µg, at least 10 µg, at least 20 µg, at least 30 µg, or at least 40 µg. In some embodiments, an RNA dose of about 0.25 µg to 60 µg, 0.5 µg to 55 µg, 1 µg to 50 µg, 5 µg to 40 µg, or 10 µg to 30 µg may be administered per dose. In some embodiments, an RNA dose is about 30 µg. In some embodiments, at least two such doses are administered. For example, a second dose may be administered about 21 days following administration of the first dose. In some embodiments, a first booster dose is administered about one month after an initial dose. In some such embodiments, at least one further booster is administered at one-month interval(s). In some embodiments, after 2 or 3 boosters, a longer interval is introduced and no further booster is administered for at least 6, 9, 12, 18, 24, or more months. In some embodiments, a single further booster is administered after about 18 months. [0465] In some embodiments, a pharmaceutical composition comprises (i) a first polyribonucleotide encoding a first polyepitopic vaccine construct that includes one or more shared tumor antigen epitopes, and (ii) a second polyribonucleotide encoding a second polyepitopic vaccine construct that includes one or more neoantigen epitopes.
In some embodiments, (i) a first polyribonucleotide encoding a first polyepitopic vaccine construct that includes one or more shared tumor antigen epitopes, and (ii) a second polyribonucleotide encoding a second polyepitopic vaccine construct that includes one or more neoantigen epitopes, are combined (e.g., mixed) and stored in a single vial. In some such embodiments, the first polyribonucleotide and the second polyribonucleotide are administered simultaneously with each dose. [0466] In some embodiments, (i) a first pharmaceutical composition comprises a first polyribonucleotide encoding a first polyepitopic vaccine construct that includes one or more shared tumor antigen epitopes, and (ii) a second pharmaceutical composition comprises a second polyribonucleotide encoding a second polyepitopic vaccine construct that includes one or more neoantigen epitopes. In some embodiments, the first pharmaceutical composition and the second pharmaceutical composition are combined (e.g., mixed) and administered simultaneously with each dose. In some embodiments, the first pharmaceutical composition and the second pharmaceutical composition are not combined (e.g., mixed) and are individually administered as a single dose or as separate doses. For example, in some embodiments, the first pharmaceutical composition and the second pharmaceutical composition are not combined (e.g., mixed) and a single dose comprises administration of the first and the second pharmaceutical composition concurrently or within about 1, 2, 3, 4, 5, 10, 15, or 30 minutes of each other. In other embodiments, the first pharmaceutical composition and the second pharmaceutical composition are not combined (e.g., mixed) and the first pharmaceutical composition is administered as a first dose, and the second pharmaceutical composition is administered as a separate dose. For example, in some embodiments, the first pharmaceutical composition and the second pharmaceutical composition are not combined (e.g., mixed) and the first pharmaceutical composition is administered as a first dose, and the second pharmaceutical composition is administered as a separate dose at least 1 hour, 2 hours, 3 hours, 4 hours, 6 hours, 12 hours, 24 hours, 48 hours, 1 week, 2 weeks, 4 weeks, 6 weeks, 8 weeks, 12 weeks, 16 weeks, 20 weeks, 24 weeks, 36 weeks, 48 weeks, 52 weeks, or longer, following administration of the first dose.
In some embodiments, the first pharmaceutical composition and the second pharmaceutical composition are not combined (e.g., mixed) and the first pharmaceutical composition is administered as a first administration (e.g., as described herein), and the second pharmaceutical composition is administered as a “booster” administration (e.g., as described herein). In some embodiments, the first pharmaceutical composition and the second pharmaceutical composition are not combined (e.g., mixed) and the second pharmaceutical composition is administered as a first dose, and the first pharmaceutical composition is administered as a separate dose. For example, in some embodiments, the first pharmaceutical composition and the second pharmaceutical composition are not combined (e.g., mixed) and the second pharmaceutical composition is administered as a first dose, and the first pharmaceutical composition is administered as a separate dose at least 1, 2, 3, 4, 6, 12, 24, 48 hours, or longer, following administration of the first dose. In some embodiments, the first pharmaceutical composition and the second pharmaceutical composition are not combined (e.g., mixed) and the second pharmaceutical composition is administered as a first administration (e.g., as described herein), and the first pharmaceutical composition is administered as a “booster” administration (e.g., as described herein). [0467] In some embodiments, provided treatment methods are administered to subjects in combination with one or more other therapies (e.g., surgery, radiation, chemotherapy, other biologic therapy, etc.). VI. Methods of Manufacture [0468] Individual polyribonucleotides can be produced by methods known in the art. For example, in some embodiments, polyribonucleotides can be produced by in vitro transcription, for example, using a DNA template. A plasmid DNA used as a template for in vitro transcription to generate a polyribonucleotide described herein is also within the scope of the present disclosure. In some embodiments, two or more different polyribonucleotides described herein are produced together by in vitro transcription. In some embodiments, two or more different polyribonucleotides described herein are produced separately by in vitro transcription and are subsequently formulated together. [0469] A DNA template is used for in vitro RNA synthesis in the presence of an appropriate RNA polymerase (e.g., a recombinant RNA-polymerase such as a T7 RNA-polymerase) with ribonucleotide triphosphates (e.g., ATP, CTP, GTP, UTP). In some embodiments, polyribonucleotides (e.g., ones described herein) can be synthesized in the presence of modified ribonucleotide triphosphates. By way of example only, in some embodiments, pseudouridine (ψ), N1-methyl-pseudouridine (m1ψ), or 5-methyl-uridine (m5U) can be used to replace uridine triphosphate (UTP). In some embodiments, pseudouridine (ψ) can be used to replace uridine triphosphate (UTP). In some embodiments, N1-methyl-pseudouridine (m1ψ) can be used to replace uridine triphosphate (UTP). In some embodiments, 5-methyl-uridine (m5U) can be used to replace uridine triphosphate (UTP).
[0470] As will be clear to those skilled in the art, during in vitro transcription, an RNA polymerase (e.g., as described and/or utilized herein) typically traverses at least a portion of a single-stranded DNA template in the 3'→ 5' direction to produce a single-stranded complementary RNA in the 5'→ 3' direction.
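By way of non-limiting illustration, the directionality described above can be sketched in a few lines of Python. The function names and the example template are hypothetical and are not taken from the present disclosure; the sketch assumes an unmodified A/C/G/U alphabet and ignores promoter and terminator sequences.

```python
# Illustrative sketch only: names and example sequences are hypothetical.
# Watson-Crick pairing from a DNA template base to the RNA base added opposite it.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe_template_3_to_5(template_3_to_5: str) -> str:
    """Return the RNA, written 5'->3', for a template strand written 3'->5'.

    The polymerase reads the template 3'->5' and extends the RNA at its
    3' end, so the first template base read pairs with the 5'-most RNA base.
    """
    rna = []
    for base in template_3_to_5.upper():
        if base not in DNA_TO_RNA:
            raise ValueError(f"unexpected template base: {base!r}")
        rna.append(DNA_TO_RNA[base])
    return "".join(rna)


def transcribe_template_5_to_3(template_5_to_3: str) -> str:
    """Same as above, for a template strand written in the usual 5'->3' order."""
    # Reversing the written sequence gives the order in which it is read.
    return transcribe_template_3_to_5(template_5_to_3[::-1])


print(transcribe_template_3_to_5("TACGGT"))  # AUGCCA
print(transcribe_template_5_to_3("TGGCAT"))  # AUGCCA
```

Both calls describe the same template strand in opposite written orientations, which is why they yield the same RNA.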
Attorney Docket No. 2013237-1122 [0471] In some embodiments where a polyribonucleotide comprises a polyA tail, one of those skill in the art will appreciate that such a polyA tail may be encoded in a DNA template, e.g., by using an appropriately tailed PCR primer, or it can be added to a polyribonucleotide after in vitro transcription, e.g., by enzymatic treatment (e.g., using a poly(A) polymerase such as an E. coli Poly(A) polymerase). Suitable poly(A) tails are described herein above. For example, in some embodiments, a poly(A) tail comprises a nucleotide sequence of AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGCATATGACTAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA A. In some embodiments, a poly(A) tail comprises a plurality of A residues interrupted by a linker. In some embodiments, a linker comprises the nucleotide sequence GCATATGAC. [0472] In some embodiments, those skilled in the art will appreciate that addition of a 5' cap to an RNA (e.g., mRNA) can facilitate recognition and attachment of the RNA to a ribosome to initiate translation and enhances translation efficiency. Those skilled in the art will also appreciate that a 5' cap can also protect an RNA product from 5' exonuclease mediated degradation and thus increases half-life. Methods for capping are known in the art; one of ordinary skill in the art will appreciate that in some embodiments, capping may be performed after in vitro transcription in the presence of a capping system (e.g., an enzyme-based capping system such as, e.g., capping enzymes of vaccinia virus). In some embodiments, a cap may be introduced during in vitro transcription, along with a plurality of ribonucleotide triphosphates such that a cap is incorporated into a polyribonucleotide during transcription (also known as co- transcriptional capping). 
In some embodiments, a GTP fed-batch procedure with multiple additions in the course of the reaction may be used to maintain a low concentration of GTP in order to effectively cap the RNA. In some embodiments, in vitro transcription is performed in a single reaction (i.e., “one pot”) non-fed-batch procedure. Suitable 5' caps are described herein above. For example, in some embodiments, a 5' cap comprises m7(3'OMeG)(5')ppp(5')(2'OMeA)pG. [0473] Following RNA transcription, a DNA template is digested. In some embodiments, digestion can be achieved with the use of DNase I under appropriate conditions. [0474] In some embodiments, in vitro transcribed polyribonucleotides may be provided in a buffered solution, for example, in a buffer such as HEPES, a phosphate buffer solution, a citrate
buffer solution, or an acetate buffer solution; in some embodiments, such solution may be buffered to a pH within a range of, for example, about 6.5 to about 7.5; in some embodiments approximately 7.0. In some embodiments, production of polyribonucleotides may further include one or more of the following steps: purification, mixing, filtration, and/or filling. [0475] In some embodiments, polyribonucleotides can be purified (e.g., in some embodiments after in vitro transcription reaction), for example, to remove components utilized or formed in the course of the production, like, e.g., proteins, DNA fragments, and/or nucleotides. Various nucleic acid purification methods that are known in the art can be used in accordance with the present disclosure. Certain purification steps may be or include, for example, one or more of precipitation, column chromatography (including, but not limited to, anionic, cationic, and hydrophobic interaction chromatography (HIC)), and/or solid substrate-based purification (e.g., magnetic bead-based purification). In some embodiments, polyribonucleotides may be purified using magnetic bead-based purification, which in some embodiments may be or comprise magnetic bead-based chromatography. In some embodiments, polyribonucleotides may be purified using hydrophobic interaction chromatography (HIC) and/or diafiltration. In some embodiments, polyribonucleotides may be purified using HIC followed by diafiltration. [0476] In some embodiments, dsRNA may be obtained as a side product during in vitro transcription. In some such embodiments, a second purification step may be performed to remove dsRNA contamination. For example, in some embodiments, cellulose materials (e.g., microcrystalline cellulose) may be used to remove dsRNA contamination, for example, in some embodiments, in a chromatographic format.
In some embodiments, cellulose materials (e.g., microcrystalline cellulose) can be pretreated to inactivate potential RNase contamination, for example in some embodiments by autoclaving followed by incubation with aqueous basic solution, e.g., NaOH. In some embodiments, cellulose materials may be used to purify polyribonucleotides according to methods described in WO 2017/182524, the entire content of which is incorporated herein by reference. [0477] In some embodiments, a batch of polyribonucleotides may be further processed by one or more steps of filtration and/or concentration. For example, in some embodiments, polyribonucleotide(s), for example, after removal of dsRNA contamination, may be further subjected to diafiltration (e.g., in some embodiments by tangential flow filtration), for example, to
Attorney Docket No. 2013237-1122 adjust the concentration of polyribonucleotides to a desirable RNA concentration and/or to exchange buffer to a drug substance buffer. [0478] In some embodiments, polyribonucleotides may be processed through 0.2 μm filtration before they are filled into appropriate containers. [0479] In some embodiments, polyribonucleotides and compositions thereof may be manufactured in accordance with a process as described herein, or as otherwise known in the art. [0480] In some embodiments, polyribonucleotides and compositions thereof may be manufactured at a large scale. For example, in some embodiments, a batch of polyribonucleotides can be manufactured at a scale of greater than 1 g, greater than 2 g, greater than 3 g, greater than 4 g, greater than 5 g, greater than 6 g, greater than 7 g, greater than 8 g, greater than 9 g, greater than 10 g, greater than 15 g, greater than 20 g, or higher. [0481] In some embodiments, RNA quality control may be performed and/or monitored at any time during production process of polyribonucleotides and/or compositions comprising the same. For example, in some embodiments, RNA quality control parameters, including one or more of RNA identity (e.g., sequence, length, and/or RNA natures), RNA integrity, RNA concentration, residual DNA template, and residual dsRNA, may be assessed and/or monitored after each or certain steps of a polyribonucleotide manufacturing process, e.g., after in vitro transcription, and/or each purification step. [0482] In some embodiments, the stability of polyribonucleotides (e.g., produced by in vitro transcription) and/or compositions comprising polyribonucleotides can be assessed under various test storage conditions, for example, at room temperatures vs. fridge or sub-zero temperatures over a period of time (e.g., at least 3 months, at least 6 months, at least 9 months, at least 12 months, or longer). 
In some embodiments, polyribonucleotides (e.g., ones described herein) and/or compositions thereof may be stored stable at a fridge temperature (e.g., about 4°C to about 10°C) for at least 1 month or longer including, at least 2 months, at least 3 months, at least 4 months, at least 5 months, at least 6 months, at least 7 months, at least 8 months, at least 9 months, at least 10 months, at least 11 months, or at least 12 months or longer. In some embodiments, polyribonucleotides (e.g., ones described herein) and/or compositions thereof may be stored stable at a sub-zero temperature (e.g., -20°C or below) for at least 1 month or longer
Attorney Docket No. 2013237-1122 including, at least 2 months, at least 3 months, at least 4 months, at least 5 months, at least 6 months, at least 7 months, at least 8 months, at least 9 months, at least 10 months, at least 11 months, or at least 12 months or longer. In some embodiments, polyribonucleotides (e.g., ones described herein) and/or compositions thereof may be stored stable at room temperature (e.g., at about 25°C) for at least 1 month or longer. [0483] In some embodiments, one or more assessments may be utilized during manufacture, or other preparation or use of polyribonucleotides (e.g., as a release test). [0484] In some embodiments, one or more quality control parameters may be assessed to determine whether polyribonucleotides described herein meet or exceed acceptance criteria (e.g., for subsequent formulation and/or release for distribution). In some embodiments, such quality control parameters may include, but are not limited to RNA integrity, RNA concentration, residual DNA template and/or residual dsRNA. Certain methods for assessing RNA quality are known in the art; for example, one of skill in the art will recognize that in some embodiments, one or more analytical tests can be used for RNA quality assessment. Examples of such certain analytical tests may include but are not limited to gel electrophoresis, UV absorption, and/or PCR assay. [0485] In some embodiments, a batch of polyribonucleotides may be assessed for one or more features as described herein to determine next action step(s). For example, a batch of polyribonucleotides can be designated for one or more further steps of manufacturing and/or formulation and/or distribution if RNA quality assessment indicates that such a batch of polyribonucleotides meet or exceed the relevant acceptance criteria. Otherwise, an alternative action can be taken (e.g., discarding the batch) if such a batch of polyribonucleotides does not meet or exceed the acceptance criteria. 
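By way of non-limiting illustration, the batch assessment described above (comparing quality control parameters against acceptance criteria and then choosing a next action) might be sketched as follows. All class names, field names, units, and numeric limits in this sketch are hypothetical placeholders; the present disclosure does not specify numeric acceptance criteria.

```python
# Illustrative sketch only: every name, unit, and limit below is a placeholder.
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class QcResult:
    """Measured quality control parameters for one batch."""
    rna_integrity_pct: float
    rna_concentration_mg_ml: float
    residual_dna_ng_per_mg: float
    residual_dsrna_ng_per_mg: float


@dataclass(frozen=True)
class AcceptanceCriteria:
    """Acceptance limits against which a batch is assessed."""
    min_integrity_pct: float
    min_concentration_mg_ml: float
    max_residual_dna_ng_per_mg: float
    max_residual_dsrna_ng_per_mg: float


def failed_parameters(result: QcResult, criteria: AcceptanceCriteria) -> List[str]:
    """List the parameters that do not meet the acceptance criteria."""
    failures = []
    if result.rna_integrity_pct < criteria.min_integrity_pct:
        failures.append("RNA integrity")
    if result.rna_concentration_mg_ml < criteria.min_concentration_mg_ml:
        failures.append("RNA concentration")
    if result.residual_dna_ng_per_mg > criteria.max_residual_dna_ng_per_mg:
        failures.append("residual DNA template")
    if result.residual_dsrna_ng_per_mg > criteria.max_residual_dsrna_ng_per_mg:
        failures.append("residual dsRNA")
    return failures


def next_action(result: QcResult, criteria: AcceptanceCriteria) -> str:
    """Designate the batch for further steps, or flag an alternative action."""
    return "proceed" if not failed_parameters(result, criteria) else "alternative action"


# Placeholder limits chosen only to exercise the logic.
EXAMPLE_CRITERIA = AcceptanceCriteria(80.0, 1.0, 10.0, 5.0)
passing_batch = QcResult(92.0, 2.1, 3.0, 1.0)
failing_batch = QcResult(70.0, 2.1, 3.0, 9.0)

print(next_action(passing_batch, EXAMPLE_CRITERIA))        # proceed
print(failed_parameters(failing_batch, EXAMPLE_CRITERIA))  # ['RNA integrity', 'residual dsRNA']
```

Returning the list of failed parameters, rather than a bare pass or fail, keeps the reason for any alternative action (e.g., discarding the batch) available for the batch record.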
[0486] In some embodiments, a batch of polyribonucleotides that satisfy assessment results can be utilized for one or more further steps of manufacturing and/or formulation and/or distribution. VII. DNA Constructs [0487] Among other things, the present disclosure provides DNA constructs, for example that may encode one or more polyepitopic vaccine constructs as described herein, or components
Attorney Docket No. 2013237-1122 thereof. In some embodiments, DNA constructs provided by and/or utilized in accordance with the present disclosure are comprised in a vector. [0488] Non-limiting examples of a vector include plasmid vectors, cosmid vectors, phage vectors such as lambda phage, viral vectors such as retroviral, adenoviral or baculoviral vectors, or artificial chromosome vectors such as bacterial artificial chromosomes (BAC), yeast artificial chromosomes (YAC), or P1 artificial chromosomes (PAC). In some embodiments, a vector is an expression vector. In some embodiments, a vector is a cloning vector. In general, a vector is a nucleic acid construct that can receive or otherwise become linked to a nucleic acid element of interest (e.g., a construct that is or encodes a payload, or that imparts a particular functionality, etc.). [0489] Expression vectors, which may be plasmid or viral or other vectors, can include an expressible sequence of interest (e.g., a coding sequence) that is functionally linked with one or more control elements (e.g., promoters, enhancers, transcription terminators, etc.). Typically, such control elements are selected for expression in a system of interest. In some embodiments, a system is ex vivo (e.g., an in vitro transcription system); in some embodiments, a system is in vivo (e.g., a bacterial, yeast, plant, insect, fish, vertebrate, mammalian cell or tissue, etc.). [0490] Cloning vectors are generally used to modify, engineer, and/or duplicate (e.g., by replication in vivo, for example in a simple system such as bacteria or yeast, or in vitro, such as by amplification such as polymerase chain reaction or other amplification process). In some embodiments, a cloning vector may lack expression signals. [0491] In many embodiments, a vector may include replication elements such as primer binding site(s) and/or origin(s) of replication. 
In many embodiments, a vector may include insertion or modification sites such as restriction endonuclease recognition sites and/or guide RNA binding sites, etc. [0492] In some embodiments, a vector is a viral vector (e.g., an AAV vector). In some embodiments, a vector is a non-viral vector. In some embodiments, a vector is a plasmid. [0493] Those skilled in the art are aware of a variety of technologies useful for the production of recombinant polynucleotides (e.g., DNA or RNA) as described herein. For example, restriction digestion, reverse transcription, amplification (e.g., by polymerase chain reaction), Gibson
assembly, etc., are well established and useful tools and technologies. Alternatively or additionally, certain nucleic acids may be prepared or assembled by chemical and/or enzymatic synthesis. In some embodiments, a combination of known methods is utilized to prepare a recombinant polynucleotide. [0494] In some embodiments, polynucleotide(s) of the present disclosure are included in a DNA construct (e.g., a vector) amenable to transcription and/or translation. [0495] In some embodiments, an expression vector comprises a polynucleotide that encodes a polyepitopic vaccine construct of the present disclosure operatively linked to a sequence or sequences that control expression (e.g., promoters, start signals, stop signals, polyadenylation signals, activators, repressors, etc.). In some embodiments, a sequence or sequences that control expression are selected to achieve a desired level of expression. In some embodiments, more than one sequence that controls expression (e.g., promoters) is utilized. In some embodiments, more than one sequence that controls expression (e.g., promoters) is utilized to achieve a desired level of expression of a plurality of polynucleotides that encode a plurality of proteins and/or polypeptides. In some embodiments, a plurality of polyepitopic vaccine constructs are expressed from the same vector (e.g., a bi-cistronic vector, a tri-cistronic vector, or a multi-cistronic vector). In some embodiments, a plurality of polyepitopic vaccine constructs are expressed, each of which is expressed from a separate vector. [0496] In some embodiments, an expression vector comprising a polynucleotide of the present disclosure is used to produce an RNA and/or protein and/or polypeptide in a host cell. In some embodiments, a host cell may be in vitro (e.g., a cell line) – for example a cell or cell line (e.g., Human Embryonic Kidney (HEK) cells, Chinese Hamster Ovary cells, etc.)
suitable for producing polynucleotides of the present disclosure and proteins and/or polypeptides encoded by said polynucleotides. [0497] In some embodiments, an expression vector is an RNA expression vector. In some embodiments, an RNA expression vector comprises a polynucleotide template used to produce an RNA in a cell-free enzymatic mix. In some embodiments, an RNA expression vector comprising a polynucleotide template is enzymatically linearized prior to in vitro transcription. In some embodiments, a polynucleotide template is generated through PCR as a linear polynucleotide template. In some embodiments, a linearized polynucleotide is mixed with enzymes suitable for
RNA synthesis, RNA capping and/or purification. In some embodiments, the resulting RNA is suitable for producing polyepitopic vaccine constructs encoded by the RNA. [0498] A variety of methods are known in the art to introduce an expression vector into host cells. In some embodiments, a vector may be introduced into host cells using transfection. In some embodiments, transfection is completed, for example, using calcium phosphate transfection, lipofection, or polyethylenimine-mediated transfection. In some embodiments, a vector may be introduced into a host cell using transduction. [0499] In some embodiments, transformed host cells are cultured following introduction of a vector into a host cell to allow for expression of said recombinant polynucleotides. In some embodiments, transformed host cells are cultured for at least 12 hours, 16 hours, 20 hours, 24 hours, 28 hours, 32 hours, 36 hours, 40 hours, 44 hours, 48 hours, 52 hours, 56 hours, 60 hours, 64 hours, 68 hours, 72 hours or longer. Transformed host cells are cultured in growth conditions (e.g., temperature, carbon-dioxide levels, growth medium) in accordance with the requirements of a host cell selected. A skilled artisan would recognize that culture conditions for selected host cells are well known in the art. VIII. Computer System and Network Environment [0500] An implementation of a network environment 700 for use in providing systems and methods as described herein is shown in FIG.58. In brief overview, FIG.58 is a block diagram of an exemplary cloud computing environment 700. The cloud computing environment 700 may include one or more resource providers 702a, 702b, 702c (collectively, 702). Each resource provider 702 may include computing resources. In some implementations, computing resources may include any hardware and/or software used to process data.
For example, computing resources may include hardware and/or software capable of executing algorithms, computer programs, and/or computer applications. In some implementations, exemplary computing resources may include application servers and/or databases with storage and retrieval capabilities. Each resource provider 702 may be connected to any other resource provider 702 in the cloud computing environment 700. In some implementations, the resource providers 702 may be connected over a computer network 708. Each resource provider 702 may be connected to one or more computing devices 704a, 704b, 704c (collectively, 704), over the computer network 708.
Attorney Docket No. 2013237-1122 [0501] The cloud computing environment 700 may include a resource manager 706. The resource manager 706 may be connected to the resource providers 702 and the computing devices 704 over the computer network 708. In some implementations, the resource manager 706 may facilitate the provision of computing resources by one or more resource providers 702 to one or more computing devices 704. The resource manager 706 may receive a request for a computing resource from a particular computing device 704. The resource manager 706 may identify one or more resource providers 702 capable of providing the computing resource requested by the computing device 704. The resource manager 706 may select a resource provider 702 to provide the computing resource. The resource manager 706 may facilitate a connection between the resource provider 702 and a particular computing device 704. In some implementations, the resource manager 706 may establish a connection between a particular resource provider 702 and a particular computing device 704. In some implementations, the resource manager 706 may redirect a particular computing device 704 to a particular resource provider 702 with the requested computing resource. [0502] FIG.59 shows an example of a computing device 800 and a mobile computing device 850 that can be used to implement the techniques described in this disclosure. The computing device 800 is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The mobile computing device 850 is intended to represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smart-phones, and other similar computing devices. The components shown here, their connections and relationships, and their functions, are meant to be examples only, and are not meant to be limiting. 
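By way of non-limiting illustration, the request handling attributed above to the resource manager 706 (receiving a request from a computing device 704, identifying capable resource providers 702, selecting one, and facilitating a connection) might be sketched as follows. The class names, method names, and the first-match selection policy are assumptions made for this sketch only; the present disclosure does not prescribe a particular selection policy or data model.

```python
# Illustrative sketch only: names and the selection policy are hypothetical.
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class ResourceProvider:
    """Stands in for a resource provider 702 and the resources it offers."""
    name: str
    resources: Set[str]


@dataclass
class ResourceManager:
    """Stands in for the resource manager 706."""
    providers: List[ResourceProvider]
    # Maps a computing device identifier to the provider it was connected to.
    connections: Dict[str, str] = field(default_factory=dict)

    def capable_providers(self, resource: str) -> List[ResourceProvider]:
        """Identify the providers able to supply the requested resource."""
        return [p for p in self.providers if resource in p.resources]

    def handle_request(self, device_id: str, resource: str) -> Optional[ResourceProvider]:
        """Select a capable provider and record the connection, if any exists."""
        candidates = self.capable_providers(resource)
        if not candidates:
            return None
        selected = candidates[0]  # assumed policy: first capable provider
        self.connections[device_id] = selected.name
        return selected


manager = ResourceManager([
    ResourceProvider("702a", {"application server"}),
    ResourceProvider("702b", {"database", "application server"}),
])
chosen = manager.handle_request("704a", "database")
print(chosen.name)          # 702b
print(manager.connections)  # {'704a': '702b'}
```

A request for a resource that no provider offers returns `None` and records no connection; a fuller implementation might instead queue or redirect such a request.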
[0503] The computing device 800 includes a processor 802, a memory 804, a storage device 806, a high-speed interface 808 connecting to the memory 804 and multiple high-speed expansion ports 810, and a low-speed interface 812 connecting to a low-speed expansion port 814 and the storage device 806. Each of the processor 802, the memory 804, the storage device 806, the high-speed interface 808, the high-speed expansion ports 810, and the low-speed interface 812 is interconnected with the others using various buses, and may be mounted on a common motherboard or in other manners as appropriate. The processor 802 can process instructions for
Attorney Docket No. 2013237-1122 execution within the computing device 800, including instructions stored in the memory 804 or on the storage device 806 to display graphical information for a GUI on an external input/output device, such as a display 816 coupled to the high-speed interface 808. In other implementations, multiple processors and/or multiple buses may be used, as appropriate, along with multiple memories and types of memory. Also, multiple computing devices may be connected, with each device providing portions of the necessary operations (e.g., as a server bank, a group of blade servers, or a multi-processor system). Thus, as the term is used herein, where a plurality of functions are described as being performed by “a processor”, this encompasses embodiments wherein the plurality of functions are performed by any number of processors (one or more) of any number of computing devices (one or more). Furthermore, where a function is described as being performed by “a processor”, this encompasses embodiments wherein the function is performed by any number of processors (one or more) of any number of computing devices (one or more) (e.g., in a distributed computing system). [0504] The memory 804 stores information within the computing device 800. In some implementations, the memory 804 is a volatile memory unit or units. In some implementations, the memory 804 is a non-volatile memory unit or units. The memory 804 may also be another form of computer-readable medium, such as a magnetic or optical disk. [0505] The storage device 806 is capable of providing mass storage for the computing device 800. In some implementations, the storage device 806 may be or contain a computer-readable medium, such as a floppy disk device, a hard disk device, an optical disk device, or a tape device, a flash memory or other similar solid state memory device, or an array of devices, including devices in a storage area network or other configurations. 
Instructions can be stored in an information carrier. The instructions, when executed by one or more processing devices (for example, processor 802), perform one or more methods, such as those described above. The instructions can also be stored by one or more storage devices such as computer- or machine-readable mediums (for example, the memory 804, the storage device 806, or memory on the processor 802). [0506] The high-speed interface 808 manages bandwidth-intensive operations for the computing device 800, while the low-speed interface 812 manages lower bandwidth-intensive operations. Such allocation of functions is an example only. In some implementations, the high-speed
Attorney Docket No. 2013237-1122 interface 808 is coupled to the memory 804, the display 816 (e.g., through a graphics processor or accelerator), and to the high-speed expansion ports 810, which may accept various expansion cards (not shown). In the implementation, the low-speed interface 812 is coupled to the storage device 806 and the low-speed expansion port 814. The low-speed expansion port 814, which may include various communication ports (e.g., USB, Bluetooth®, Ethernet, wireless Ethernet) may be coupled to one or more input/output devices, such as a keyboard, a pointing device, a scanner, or a networking device such as a switch or router, e.g., through a network adapter. [0507] The computing device 800 may be implemented in a number of different forms, as shown in the figure. For example, it may be implemented as a standard server 820, or multiple times in a group of such servers. In addition, it may be implemented in a personal computer such as a laptop computer 822. It may also be implemented as part of a rack server system 824. Alternatively, components from the computing device 800 may be combined with other components in a mobile device (not shown), such as a mobile computing device 850. Each of such devices may contain one or more of the computing device 800 and the mobile computing device 850, and an entire system may be made up of multiple computing devices communicating with each other. [0508] The mobile computing device 850 includes a processor 852, a memory 864, an input/output device such as a display 854, a communication interface 866, and a transceiver 868, among other components. The mobile computing device 850 may also be provided with a storage device, such as a micro-drive or other device, to provide additional storage. 
Each of the processor 852, the memory 864, the display 854, the communication interface 866, and the transceiver 868, are interconnected using various buses, and several of the components may be mounted on a common motherboard or in other manners as appropriate. [0509] The processor 852 can execute instructions within the mobile computing device 850, including instructions stored in the memory 864. The processor 852 may be implemented as a chipset of chips that include separate and multiple analog and digital processors. The processor 852 may provide, for example, for coordination of the other components of the mobile computing device 850, such as control of user interfaces, applications run by the mobile computing device 850, and wireless communication by the mobile computing device 850.
Attorney Docket No. 2013237-1122 [0510] The processor 852 may communicate with a user through a control interface 858 and a display interface 856 coupled to the display 854. The display 854 may be, for example, a TFT (Thin-Film-Transistor Liquid Crystal Display) display or an OLED (Organic Light Emitting Diode) display, or other appropriate display technology. The display interface 856 may comprise appropriate circuitry for driving the display 854 to present graphical and other information to a user. The control interface 858 may receive commands from a user and convert them for submission to the processor 852. In addition, an external interface 862 may provide communication with the processor 852, so as to enable near area communication of the mobile computing device 850 with other devices. The external interface 862 may provide, for example, for wired communication in some implementations, or for wireless communication in other implementations, and multiple interfaces may also be used. [0511] The memory 864 stores information within the mobile computing device 850. The memory 864 can be implemented as one or more of a computer-readable medium or media, a volatile memory unit or units, or a non-volatile memory unit or units. An expansion memory 874 may also be provided and connected to the mobile computing device 850 through an expansion interface 872, which may include, for example, a SIMM (Single In Line Memory Module) card interface. The expansion memory 874 may provide extra storage space for the mobile computing device 850, or may also store applications or other information for the mobile computing device 850. Specifically, the expansion memory 874 may include instructions to carry out or supplement the processes described above, and may include secure information also. 
Thus, for example, the expansion memory 874 may be provided as a security module for the mobile computing device 850, and may be programmed with instructions that permit secure use of the mobile computing device 850. In addition, secure applications may be provided via the SIMM cards, along with additional information, such as placing identifying information on the SIMM card in a non-hackable manner. [0512] The memory may include, for example, flash memory and/or NVRAM memory (non-volatile random access memory), as discussed below. In some implementations, instructions are stored in an information carrier. The instructions, when executed by one or more processing devices (for example, processor 852), perform one or more methods, such as those described above. The instructions can also be stored by one or more storage devices, such as one or more
computer- or machine-readable mediums (for example, the memory 864, the expansion memory 874, or memory on the processor 852). In some implementations, the instructions can be received in a propagated signal, for example, over the transceiver 868 or the external interface 862. [0513] The mobile computing device 850 may communicate wirelessly through the communication interface 866, which may include digital signal processing circuitry where necessary. The communication interface 866 may provide for communications under various modes or protocols, such as GSM voice calls (Global System for Mobile communications), SMS (Short Message Service), EMS (Enhanced Messaging Service), or MMS messaging (Multimedia Messaging Service), CDMA (code division multiple access), TDMA (time division multiple access), PDC (Personal Digital Cellular), WCDMA (Wideband Code Division Multiple Access), CDMA2000, or GPRS (General Packet Radio Service), among others. Such communication may occur, for example, through the transceiver 868 using a radio-frequency. In addition, short-range communication may occur, such as using a Bluetooth®, Wi-Fi™, or other such transceiver (not shown). In addition, a GPS (Global Positioning System) receiver module 870 may provide additional navigation- and location-related wireless data to the mobile computing device 850, which may be used as appropriate by applications running on the mobile computing device 850. [0514] The mobile computing device 850 may also communicate audibly using an audio codec 860, which may receive spoken information from a user and convert it to usable digital information. The audio codec 860 may likewise generate audible sound for a user, such as through a speaker, e.g., in a handset of the mobile computing device 850. Such sound may include sound from voice telephone calls, may include recorded sound (e.g., voice messages, music files, etc.)
and may also include sound generated by applications operating on the mobile computing device 850. [0515] The mobile computing device 850 may be implemented in a number of different forms, as shown in the figure. For example, it may be implemented as a cellular telephone 880. It may also be implemented as part of a smart-phone 882, personal digital assistant, or other similar mobile device. [0516] Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, specially designed ASICs (application specific
Attorney Docket No. 2013237-1122 integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various implementations can include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device. [0517] These computer programs (also known as programs, software, software applications or code) include machine instructions for a programmable processor, and can be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms machine-readable medium and computer-readable medium refer to any computer program product, apparatus and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term machine-readable signal refers to any signal used to provide machine instructions and/or data to a programmable processor. [0518] To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user and a keyboard and a pointing device (e.g., a mouse or a trackball) by which the user can provide input to the computer. 
Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form, including acoustic, speech, or tactile input. [0519] The systems and techniques described here can be implemented in a computing system that includes a back end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front end component (e.g., a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back end, middleware, or front end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication
network). Examples of communication networks include a local area network (LAN), a wide area network (WAN), and the Internet. [0520] The computing system can include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. [0521] All publications, patent applications, patents, and other references mentioned herein, including GenBank Accession Numbers, are incorporated by reference in their entirety. In addition, the materials, methods, and examples are illustrative only and not intended to be limiting. Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, suitable methods and materials are described herein. EXEMPLARY EMBODIMENTS [0522] Exemplary embodiments as described below are also within the scope of the present disclosure: [0523] Embodiment 1. A method of producing a cancer vaccine for a subject, the method comprising combining: (a) a first pharmaceutical composition comprising a first polyribonucleotide encoding a first polyepitopic vaccine construct comprising a plurality of shared tumor antigen epitopes; and (b) a second pharmaceutical composition comprising a second polyribonucleotide encoding a second polyepitopic vaccine construct comprising a plurality of neoantigen epitopes. [0524] Embodiment 2. The method of Embodiment 1, wherein the plurality of shared tumor antigen epitopes comprises at least one shared tumor antigen epitope expressed by at least 15% of subjects having the same cancer type. 
[0525] Embodiment 3. The method of Embodiment 1 or 2, further comprising producing the second pharmaceutical composition.
[0526] Embodiment 4. The method of Embodiment 3, further comprising: (a) producing the second pharmaceutical composition by a method comprising: (i) identifying cancer-specific somatic mutations in a tumor specimen of a subject to provide a cancer mutation signature of the subject; and (ii) producing the second pharmaceutical composition comprising the second polyribonucleotide encoding the second polyepitopic vaccine construct, wherein the plurality of neoantigen epitopes comprises a plurality of cancer-specific somatic mutations of the cancer mutation signature identified in step (i); and/or (b) producing the first pharmaceutical composition by a method comprising: (iii) identifying one or more shared tumor antigen epitopes expressed in a tumor specimen of the subject; and (iv) producing the first pharmaceutical composition comprising the first polyribonucleotide encoding the first polyepitopic vaccine construct, wherein the plurality of individualized shared tumor antigen epitopes comprises a plurality of expressed shared tumor antigen epitopes identified in step (iii). [0527] Embodiment 5. The method of Embodiment 4, wherein the step of identifying cancer-specific somatic mutations comprises using next generation sequencing (NGS). [0528] Embodiment 6. The method of Embodiment 3, further comprising producing the second pharmaceutical composition by a method comprising: (i) providing a tumor specimen from the subject and providing a non-tumor specimen; (ii) identifying sequence differences between (1) the genome, exome and/or transcriptome of the tumor specimen and (2) the genome, exome and/or transcriptome of the non-tumor specimen; and (iii) producing the second pharmaceutical composition comprising the second polyribonucleotide, wherein the neoantigen epitopes comprise one or more of the sequence differences identified in step (ii). [0529] Embodiment 7. 
The method of Embodiment 6, wherein the step of identifying sequence differences comprises using NGS. [0530] Embodiment 8. The method of Embodiment 6 or Embodiment 7, wherein the non-tumor specimen is from the subject. [0531] Embodiment 9. The method of any one of Embodiments 6-8, wherein the step of identifying sequence differences comprises identifying sequence differences between the exome of the tumor specimen and the exome of the non-tumor specimen.
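The comparison recited in Embodiments 6-9 (identifying sequence differences between a tumor specimen and a non-tumor specimen) can be illustrated with a deliberately simplified sketch. It assumes two already-aligned sequences of equal length and reports single-position mismatches only; the names `Difference` and `sequence_differences` and the example sequences are hypothetical and do not come from the disclosure. Actual NGS-based analysis involves read alignment, variant calling, and quality filtering that this sketch does not attempt.

```python
from typing import List, NamedTuple


class Difference(NamedTuple):
    """One mismatching position between two aligned sequences (hypothetical type)."""
    position: int   # 0-based index into the aligned sequences
    non_tumor: str  # residue observed in the non-tumor specimen
    tumor: str      # residue observed in the tumor specimen


def sequence_differences(non_tumor: str, tumor: str) -> List[Difference]:
    """Return positions at which the tumor sequence differs from the non-tumor one.

    Simplifying assumption: both sequences are pre-aligned and of equal length,
    so insertions and deletions are out of scope for this sketch.
    """
    if len(non_tumor) != len(tumor):
        raise ValueError("sequences must be aligned to the same length")
    return [
        Difference(i, n, t)
        for i, (n, t) in enumerate(zip(non_tumor, tumor))
        if n != t
    ]


if __name__ == "__main__":
    # Placeholder sequences, not real specimen data.
    for diff in sequence_differences("ACGTACGTAC", "ACGTTCGTAC"):
        print(diff)
```

Each reported difference is only a candidate; whether it yields a usable neoantigen epitope depends on further steps described elsewhere in the disclosure.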
[0532] Embodiment 10. The method of any one of Embodiments 1-9, wherein the first polyepitopic vaccine construct comprises about 2 to about 50, or about 2 to about 40, or about 2 to about 30, or about 2 to about 20 shared tumor antigen epitopes (e.g., individualized shared tumor antigen epitopes). [0533] Embodiment 11. The method of any one of Embodiments 1-10, wherein the first polyepitopic vaccine construct comprises shared tumor antigen epitopes from one shared tumor antigen. [0534] Embodiment 12. The method of any one of Embodiments 1-10, wherein the first polyepitopic vaccine construct comprises at least one shared tumor antigen epitope from each of about 2, 4, 6, 8, 10, 12, 14, 16, 18, or 20 different shared tumor antigens. [0535] Embodiment 13. The method of any one of Embodiments 1-12, wherein the second polyepitopic vaccine construct comprises about 2 to about 50, or about 2 to about 40, or about 2 to about 30, or about 2 to about 20 neoantigen epitopes. [0536] Embodiment 14. The method of any one of Embodiments 1-13, wherein the second polyepitopic vaccine construct comprises neoantigen epitopes from one neoantigen. [0537] Embodiment 15. The method of any one of Embodiments 1-13, wherein the second polyepitopic vaccine construct comprises at least one neoantigen epitope from each of about 2, 4, 6, 8, 10, 12, 14, 16, 18, or 20 different neoantigens. [0538] Embodiment 16. The method of any one of Embodiments 1-15, wherein each shared tumor antigen epitope (e.g., individualized shared tumor antigen epitope) and each neoantigen epitope comprises about 8 to about 50 amino acids. [0539] Embodiment 17. The method of any one of Embodiments 1-16, wherein each shared tumor antigen epitope (e.g., individualized shared tumor antigen epitope) and each neoantigen epitope comprises about 30 amino acids. [0540] Embodiment 18. 
The method of any one of Embodiments 1-17, wherein each shared tumor antigen epitope (e.g., individualized shared tumor antigen epitope) and each neoantigen epitope comprises about 9 amino acids.
[0541] Embodiment 19. The method of any one of Embodiments 1-18, wherein the first polyepitopic construct comprises shared tumor antigen epitopes (e.g., individualized shared tumor antigen epitopes) arranged in a head-to-tail configuration. [0542] Embodiment 20. The method of any one of Embodiments 1-19, wherein the first polyepitopic construct comprises a linker between each shared tumor antigen epitope (e.g., individualized shared tumor antigen epitope). [0543] Embodiment 21. The method of any one of Embodiments 1-20, wherein the second polyepitopic construct comprises neoantigen epitopes arranged in a head-to-tail configuration. [0544] Embodiment 22. The method of any one of Embodiments 1-21, wherein the second polyepitopic construct comprises a linker between each neoantigen epitope. [0545] Embodiment 23. The method of any one of Embodiments 20-22, wherein the linker comprises about 3 to about 50 amino acids. [0546] Embodiment 24. The method of any one of Embodiments 20-23, wherein the linker comprises about 6 to about 30 amino acids. [0547] Embodiment 25. The method of any one of Embodiments 20-24, wherein the linker comprises (i) glycine amino acids or (ii) serine and glycine amino acids. [0548] Embodiment 26. The method of any one of Embodiments 1-25, wherein the first and the second polyribonucleotides comprise a 5’ cap, a 5’ UTR, a 3’ UTR, and a polyA tail. [0549] Embodiment 27. 
A pharmaceutical composition comprising: (1) a first polyribonucleotide encoding a first polyepitopic vaccine construct comprising a plurality of shared tumor antigen epitopes (e.g., individualized shared tumor antigen epitopes); and a second polyribonucleotide encoding a second polyepitopic vaccine construct comprising a plurality of neoantigen epitopes; or (2) at least one polyribonucleotide encoding a first polyepitopic vaccine construct comprising a plurality of shared tumor antigen epitopes (e.g., individualized shared tumor antigen epitopes); and a second polyepitopic vaccine construct comprising a plurality of neoantigen epitopes; or (3) at least one polyribonucleotide encoding a first polyepitopic vaccine construct comprising a plurality of shared tumor antigen epitopes (e.g., individualized shared tumor antigen epitopes) and a plurality of neoantigen epitopes.
[0550] Embodiment 28. The pharmaceutical composition of Embodiment 27, wherein the plurality of shared tumor antigen epitopes (e.g., individualized shared tumor antigen epitopes) comprises at least one shared tumor antigen epitope expressed by at least 15% of subjects having the same cancer type. [0551] Embodiment 29. The pharmaceutical composition of Embodiment 27 or 28, wherein the first polyepitopic vaccine construct comprises about 2 to about 50, or about 2 to about 40, or about 2 to about 30, or about 2 to about 20 shared tumor antigen epitopes (e.g., individualized shared tumor antigen epitopes). [0552] Embodiment 30. The pharmaceutical composition of any one of Embodiments 27-29, wherein the first polyepitopic vaccine construct comprises shared tumor antigen epitopes from one shared tumor antigen. [0553] Embodiment 31. The pharmaceutical composition of any one of Embodiments 27-29, wherein the first polyepitopic vaccine construct comprises at least one shared tumor antigen epitope from each of about 2, 4, 6, 8, 10, 12, 14, 16, 18, or 20 different shared tumor antigens. [0554] Embodiment 32. The pharmaceutical composition of any one of Embodiments 27-31, wherein the second polyepitopic vaccine construct comprises about 2 to about 50, or about 2 to about 40, or about 2 to about 30, or about 2 to about 20 neoantigen epitopes. [0555] Embodiment 33. The pharmaceutical composition of any one of Embodiments 27-32, wherein the second polyepitopic vaccine construct comprises neoantigen epitopes from one neoantigen. [0556] Embodiment 34. The pharmaceutical composition of any one of Embodiments 27-32, wherein the second polyepitopic vaccine construct comprises at least one neoantigen epitope from each of about 2, 4, 6, 8, 10, 12, 14, 16, 18, or 20 different neoantigens. [0557] Embodiment 35. 
The pharmaceutical composition of any one of Embodiments 27-34, wherein each shared tumor antigen epitope (e.g., individualized shared tumor antigen epitope) and each neoantigen epitope comprises about 8 to about 50 amino acids. [0558] Embodiment 36. The pharmaceutical composition of any one of Embodiments 27-35, wherein each shared tumor antigen epitope (e.g., individualized shared tumor antigen epitope) and each neoantigen epitope comprises about 30 amino acids.
[0559] Embodiment 37. The pharmaceutical composition of any one of Embodiments 27-36, wherein each shared tumor antigen epitope (e.g., individualized shared tumor antigen epitope) and each neoantigen epitope comprises about 9 amino acids. [0560] Embodiment 38. The pharmaceutical composition of any one of Embodiments 27-37, wherein the first polyepitopic construct comprises shared tumor antigen epitopes (e.g., individualized shared tumor antigen epitopes) arranged in a head-to-tail configuration. [0561] Embodiment 39. The pharmaceutical composition of any one of Embodiments 27-38, wherein the first polyepitopic construct comprises a linker between each shared tumor antigen epitope (e.g., individualized shared tumor antigen epitope). [0562] Embodiment 40. The pharmaceutical composition of any one of Embodiments 27-39, wherein the second polyepitopic construct comprises neoantigen epitopes arranged in a head-to-tail configuration. [0563] Embodiment 41. The pharmaceutical composition of any one of Embodiments 27-40, wherein the second polyepitopic construct comprises a linker between each neoantigen epitope. [0564] Embodiment 42. The pharmaceutical composition of any one of Embodiments 39-41, wherein the linker comprises about 3 to about 50 amino acids. [0565] Embodiment 43. The pharmaceutical composition of any one of Embodiments 39-42, wherein the linker comprises about 6 to about 30 amino acids. [0566] Embodiment 44. The pharmaceutical composition of any one of Embodiments 39-43, wherein the linker comprises (i) glycine amino acids or (ii) serine and glycine amino acids. [0567] Embodiment 45. The pharmaceutical composition of any one of Embodiments 27-44, wherein the first and the second polyribonucleotides comprise a 5’ cap, a 5’ UTR, a 3’ UTR, and a polyA tail. [0568] Embodiment 46. 
A method of treating a subject having cancer, the method comprising: (a) administering to the subject a first pharmaceutical composition comprising a first polyribonucleotide encoding a first polyepitopic vaccine construct comprising a plurality of shared tumor antigen epitopes (e.g., individualized shared tumor antigen epitopes); and (b) administering to the subject a second pharmaceutical composition comprising a second
polyribonucleotide construct encoding a second polyepitopic vaccine comprising a plurality of neoantigen epitopes. [0569] Embodiment 47. The method of Embodiment 46, further comprising before the administering steps, mixing the first pharmaceutical composition and the second composition to form an admixture comprising the first pharmaceutical composition and the second pharmaceutical composition, and administering the admixture to the subject. [0570] Embodiment 48. The method of Embodiment 46, comprising administering the first pharmaceutical composition concurrently with the second pharmaceutical composition. [0571] Embodiment 49. The method of Embodiment 46, comprising administering the second pharmaceutical composition about 1, 2, 3, 4, 5, 10, 15, or 30 minutes before or after administering the first pharmaceutical composition. [0572] Embodiment 50. The method of Embodiment 46, comprising administering the second pharmaceutical composition at least 1 hour, 2 hours, 3 hours, 4 hours, 6 hours, 12 hours, 24 hours, 48 hours, 1 week, 2 weeks, 4 weeks, 6 weeks, 8 weeks, 12 weeks, 16 weeks, 20 weeks, 24 weeks, 36 weeks, 48 weeks, or 52 weeks before or after administering the first pharmaceutical composition. [0573] Embodiment 51. The method of any one of Embodiments 46-50, wherein the plurality of shared tumor antigen epitopes (e.g., individualized shared tumor antigen epitopes) comprises at least one shared tumor antigen epitope expressed by at least 15% of subjects having the same cancer type. [0574] Embodiment 52. The method of any one of Embodiments 46-51, further comprising producing the second pharmaceutical composition. [0575] Embodiment 53. 
The method of Embodiment 52, further comprising producing the second pharmaceutical composition by a method comprising: (i) identifying cancer-specific somatic mutations in a tumor specimen of a subject to provide a cancer mutation signature of the subject; and (ii) producing the second pharmaceutical composition comprising the second polyribonucleotide encoding the second polyepitopic vaccine construct, wherein the plurality of neoantigen epitopes comprises a plurality of cancer-specific somatic mutations of the cancer mutation signature identified in step (i).
[0576] Embodiment 54. The method of Embodiment 53, wherein the step of identifying cancer-specific somatic mutations comprises using next generation sequencing (NGS). [0577] Embodiment 55. The method of Embodiment 52, further comprising producing the second pharmaceutical composition by a method comprising: (i) providing a tumor specimen from the subject and providing a non-tumor specimen; (ii) identifying sequence differences between (1) the genome, exome and/or transcriptome of the tumor specimen and (2) the genome, exome and/or transcriptome of the non-tumor specimen; and (iii) producing the second pharmaceutical composition comprising the second polyribonucleotide, wherein the neoantigen epitopes comprise one or more of the sequence differences identified in step (ii). [0578] Embodiment 56. The method of Embodiment 55, wherein the step of identifying sequence differences comprises using NGS. [0579] Embodiment 57. The method of Embodiment 55 or Embodiment 56, wherein the non-tumor specimen is from the subject. [0580] Embodiment 58. The method of any one of Embodiments 55-57, wherein the step of identifying sequence differences comprises identifying sequence differences between the exome of the tumor specimen and the exome of the non-tumor specimen. [0581] Embodiment 59. The method of any one of Embodiments 46-58, wherein the first polyepitopic vaccine construct comprises about 2 to about 50, or about 2 to about 40, or about 2 to about 30, or about 2 to about 20 shared tumor antigen epitopes (e.g., individualized shared tumor antigen epitopes). [0582] Embodiment 60. The method of any one of Embodiments 46-59, wherein the first polyepitopic vaccine construct comprises shared tumor antigen epitopes from one shared tumor antigen. [0583] Embodiment 61. 
The method of any one of Embodiments 46-59, wherein the first polyepitopic vaccine construct comprises at least one shared tumor antigen epitope from each of about 2, 4, 6, 8, 10, 12, 14, 16, 18, or 20 different shared tumor antigens. [0584] Embodiment 62. The method of any one of Embodiments 46-61, wherein the second polyepitopic vaccine construct comprises about 2 to about 50, or about 2 to about 40, or about 2 to about 30, or about 2 to about 20 neoantigen epitopes.
[0585] Embodiment 63. The method of any one of Embodiments 46-62, wherein the second polyepitopic vaccine construct comprises neoantigen epitopes from one neoantigen. [0586] Embodiment 64. The method of any one of Embodiments 46-62, wherein the second polyepitopic vaccine construct comprises at least one neoantigen epitope from each of about 2, 4, 6, 8, 10, 12, 14, 16, 18, or 20 different neoantigens. [0587] Embodiment 65. The method of any one of Embodiments 46-64, wherein each shared tumor antigen epitope (e.g., individualized shared tumor antigen epitope) and each neoantigen epitope comprises about 8 to about 50 amino acids. [0588] Embodiment 66. The method of any one of Embodiments 46-65, wherein each shared tumor antigen epitope (e.g., individualized shared tumor antigen epitope) and each neoantigen epitope comprises about 30 amino acids. [0589] Embodiment 67. The method of any one of Embodiments 46-66, wherein each shared tumor antigen epitope (e.g., individualized shared tumor antigen epitope) and each neoantigen epitope comprises about 9 amino acids. [0590] Embodiment 68. The method of any one of Embodiments 46-67, wherein the first polyepitopic construct comprises shared tumor antigen epitopes (e.g., individualized shared tumor antigen epitopes) arranged in a head-to-tail configuration. [0591] Embodiment 69. The method of any one of Embodiments 46-68, wherein the first polyepitopic construct comprises a linker between each shared tumor antigen epitope (e.g., individualized shared tumor antigen epitope). [0592] Embodiment 70. The method of any one of Embodiments 46-69, wherein the second polyepitopic construct comprises neoantigen epitopes arranged in a head-to-tail configuration. [0593] Embodiment 71. The method of any one of Embodiments 46-70, wherein the second polyepitopic construct comprises a linker between each neoantigen epitope. [0594] Embodiment 72. 
The method of any one of Embodiments 69-71, wherein the linker comprises about 3 to about 50 amino acids. [0595] Embodiment 73. The method of any one of Embodiments 69-72, wherein the linker comprises about 6 to about 30 amino acids.
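The head-to-tail arrangement with a linker between each epitope (Embodiments 68-73) can be sketched at the amino acid level as follows. This is an illustration under stated assumptions, not the disclosed construct design: the function name `assemble_construct`, the example linker `GGSGGS`, and the placeholder epitope strings are hypothetical, and the recited "about" ranges (about 8 to about 50 amino acids per epitope, about 3 to about 50 per linker) are treated as hard bounds purely for simplicity.

```python
from typing import Sequence

# Bounds taken from the recited ranges; treating "about" as exact is an assumption.
EPITOPE_MIN, EPITOPE_MAX = 8, 50
LINKER_MIN, LINKER_MAX = 3, 50


def assemble_construct(epitopes: Sequence[str], linker: str = "GGSGGS") -> str:
    """Concatenate epitopes head-to-tail with one linker between each adjacent pair.

    The default glycine/serine linker is a placeholder, not a sequence
    from the disclosure.
    """
    if not epitopes:
        raise ValueError("at least one epitope is required")
    if not LINKER_MIN <= len(linker) <= LINKER_MAX:
        raise ValueError(f"linker length {len(linker)} is outside the assumed bounds")
    for epitope in epitopes:
        if not EPITOPE_MIN <= len(epitope) <= EPITOPE_MAX:
            raise ValueError(f"epitope {epitope!r} is outside the assumed length bounds")
    # str.join places the linker only between epitopes, never at either end.
    return linker.join(epitopes)


if __name__ == "__main__":
    # Placeholder 9-mers built from one-letter amino acid codes, not real epitopes.
    print(assemble_construct(["ACDEFGHIK", "LMNPQRSTV"]))
```

Joining with the linker, rather than appending it after every epitope, keeps the construct free of a trailing linker; flanking elements such as the UTRs recited elsewhere operate at the polyribonucleotide level and are outside this sketch.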
[0596] Embodiment 74. The method of any one of Embodiments 69-73, wherein the linker comprises (i) glycine amino acids or (ii) serine and glycine amino acids. [0597] Embodiment 75. The method of any one of Embodiments 46-74, wherein the first and the second polyribonucleotides comprise a 5’ cap, a 5’ UTR, a 3’ UTR, and a polyA tail. [0598] Embodiment 76. A method of treating a subject having cancer, the method comprising a step of administering to the subject one or both of a first pharmaceutical composition and a second pharmaceutical composition so that the subject receives both, wherein: (a) the first composition comprises a first polyribonucleotide encoding a first polyepitopic vaccine construct comprising a plurality of shared tumor antigen epitopes (e.g., individualized shared tumor antigen epitopes); and (b) the second pharmaceutical composition comprises a second polyribonucleotide construct encoding a second polyepitopic vaccine comprising a plurality of neoantigen epitopes. [0599] Embodiment 77. A method of treating cancer, the method comprising a step of administering to each of a plurality of cancer patients a combination therapy that delivers each of: (a) a first polypeptide comprising a plurality of neoepitopes determined to have arisen in a tumor in the cancer patients; and (b) a second polypeptide comprising a plurality of shared tumor antigen epitopes (e.g., individualized shared tumor antigen epitopes) detectable in a sample from the subject. [0600] Embodiment 78. The method of Embodiment 77, wherein one or both of the first and second polypeptides is delivered by expression from an administered polynucleotide encoding the polypeptide. [0601] Embodiment 79. The method of Embodiment 78, wherein the polynucleotide is a polyribonucleotide. [0602] Embodiment 80. The method of Embodiment 1, wherein the first pharmaceutical composition contains no neoantigen epitopes. [0603] Embodiment 81. 
A method of selecting neoantigen epitopes for inclusion in a polyepitopic construct [e.g., a vaccine construct (e.g., for immunotherapy, for a cancer therapy)] for a subject, the method comprising: (a) obtaining a candidate epitope list identifying a plurality of candidate epitopes, MHC presentation data, and expression data comprising, for each
particular candidate epitope of the plurality, (i) and (ii) as follows: (i) a corresponding MHC presentation score representing a known and/or predicted likelihood and/or strength of binding between the particular candidate epitope and an MHC of the subject; and (ii) a corresponding expression score representing measured expression in the subject; (b) for each candidate epitope of at least a portion of the plurality of candidate epitopes, determining (e.g., by the processor) a corresponding rank (e.g., relative to other candidate epitopes of the plurality) based at least in part on the MHC presentation and expression data; (c) selecting (e.g., by the processor) a subset of the plurality of candidate epitopes for inclusion in the polyepitopic construct based at least in part on their corresponding rankings as a set of neoantigen epitopes; and optionally (d) storing and/or providing (e.g., for display and/or further processing) (e.g., by the processor) the set of neoantigen epitopes. [0604] Embodiment 82. The method of Embodiment 81, further comprising obtaining germline DNA sequence from the subject and tumor DNA sequence from the subject. [0605] Embodiment 83. The method of Embodiment 82, further comprising comparing germline DNA sequence from the subject and tumor DNA sequence from the subject to obtain the candidate epitope list. [0606] Embodiment 84. The method of Embodiment 82, further comprising determining the subject’s HLA type from the germline DNA sequence. [0607] Embodiment 85. The method of any one of Embodiments 81-84, wherein, for each of at least a portion of the candidate epitopes, the corresponding MHC presentation score classifies the candidate epitope as a known T-cell epitope and/or a known ligand. [0608] Embodiment 86. 
The method of any one of Embodiments 81-85, wherein, for each of at least a portion of the candidate epitopes, the corresponding MHC presentation score represents a predicted MHC binding of the candidate epitope (e.g., wherein the MHC presentation score is a numerical value that quantifies a predicted binding strength and/or likelihood; e.g., wherein the MHC presentation score is a percentile rank classifying the candidate epitope as a predicted strong, intermediate, or weak binder). [0609] Embodiment 87. A method of selecting shared antigen epitopes for inclusion in a polyepitopic construct [e.g., a vaccine construct (e.g., for immunotherapy, for a cancer therapy)]
for a subject, the method comprising: (a) obtaining a candidate epitope list identifying a plurality of candidate epitopes, MHC presentation data, and expression data comprising, for each particular candidate epitope of the plurality, (i) and (ii) as follows: (i) a corresponding MHC presentation score representing a known and/or predicted likelihood and/or strength of binding between the particular candidate epitope and an MHC of the subject; and (ii) a corresponding expression score representing measured expression in the subject, wherein the measured expression exceeds a predetermined threshold level; (b) for each candidate epitope of at least a portion of the plurality of candidate epitopes, determining (e.g., by the processor) a corresponding rank (e.g., relative to other candidate epitopes of the plurality) based at least in part on the MHC presentation and expression data; (c) selecting (e.g., by the processor) a subset of the plurality of candidate epitopes for inclusion in the polyepitopic construct based at least in part on their corresponding rankings as a set of shared antigen epitopes; and optionally (d) storing and/or providing (e.g., for display and/or further processing) (e.g., by the processor) the set of shared antigen epitopes. [0610] Embodiment 88. The method of Embodiment 87, further comprising obtaining germline DNA sequence from the subject and tumor DNA sequence from the subject. [0611] Embodiment 89. The method of Embodiment 88, further comprising comparing germline DNA sequence from the subject and tumor DNA sequence from the subject to identify one or more germline mutations and one or more somatic mutations. [0612] Embodiment 90. The method of Embodiment 89, wherein the plurality of candidate epitopes do not comprise a somatic mutation. [0613] Embodiment 91. The method of Embodiment 88, further comprising determining the subject’s HLA type from the germline DNA sequence. [0614] Embodiment 92. 
The method of any one of Embodiments 87-91, wherein, for each of at least a portion of the candidate epitopes, the corresponding MHC presentation score classifies the candidate epitope as a known T-cell epitope and/or a known ligand. [0615] Embodiment 93. The method of any one of Embodiments 87-92, wherein, for each of at least a portion of the candidate epitopes, the corresponding MHC presentation score represents a predicted MHC binding of the candidate epitope (e.g., wherein the MHC presentation score is a
numerical value that quantifies a predicted binding strength and/or likelihood; e.g., wherein the MHC presentation score is a percentile rank classifying the candidate epitope as a predicted strong, intermediate, or weak binder). [0616] Embodiment 94. A method of selecting shared antigen epitopes for inclusion in a polyepitopic construct [e.g., a vaccine construct (e.g., for immunotherapy, for a cancer therapy)] for a subject, the method comprising: (a) obtaining (e.g., receiving, accessing, and/or generating) (e.g., by a processor of a computing device) a candidate epitope list identifying a plurality of candidate epitopes (e.g., non-neoepitopes, neoepitopes) and MHC presentation and immunogenicity data comprising, for each particular candidate epitope of the plurality, (i) and (ii) as follows: (i) a corresponding MHC presentation score representing a known and/or predicted likelihood and/or strength of binding between the particular candidate epitope and an MHC of the subject; and (ii) a corresponding immunogenicity score representing a known and/or predicted immunogenicity of the particular candidate epitope; (b) for each candidate epitope of at least a portion of the plurality of candidate epitopes, determining (e.g., by the processor) a corresponding rank (e.g., relative to other candidate epitopes of the plurality) based at least in part on the MHC presentation and immunogenicity data; (c) selecting (e.g., by the processor) a subset of the plurality of candidate epitopes for inclusion in the polyepitopic construct based at least in part on their corresponding rankings as a set of shared antigen epitopes; and optionally (d) storing and/or providing (e.g., for display and/or further processing) (e.g., by the processor) the set of shared antigen epitopes. [0617] Embodiment 95. 
The method of Embodiment 94, wherein, for each of at least a portion of the candidate epitopes, the corresponding MHC presentation score classifies the candidate epitope as a known T-cell epitope and/or a known ligand. [0618] Embodiment 96. The method of Embodiment 94 or 95, wherein, for each of at least a portion of the candidate epitopes, the corresponding MHC presentation score represents a predicted MHC binding of the candidate epitope (e.g., wherein the MHC presentation score is a numerical value that quantifies a predicted binding strength and/or likelihood; e.g., wherein the MHC presentation score is a percentile rank classifying the candidate epitope as a predicted strong, intermediate, or weak binder).
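Steps (b) and (c) of Embodiment 94, with the percentile-rank classification of Embodiment 96, can be sketched as a simple rank-and-select routine. Everything below is a hypothetical illustration rather than the disclosed algorithm: the `Candidate` type, the percentile cut-offs (0.5 and 2.0), the convention that a lower percentile means stronger predicted binding, the immunogenicity scale, and the tie-breaking order are all assumptions chosen to make the example concrete.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Candidate:
    peptide: str           # placeholder amino acid sequence
    mhc_percentile: float  # assumed: lower percentile rank = stronger predicted binder
    immunogenicity: float  # assumed: higher score = more evidence of immunogenicity


# Hypothetical cut-offs; the disclosure does not fix these values.
STRONG_CUTOFF = 0.5
INTERMEDIATE_CUTOFF = 2.0
_CLASS_ORDER = {"strong": 0, "intermediate": 1, "weak": 2}


def binder_class(mhc_percentile: float) -> str:
    """Classify a candidate as a predicted strong, intermediate, or weak binder."""
    if mhc_percentile <= STRONG_CUTOFF:
        return "strong"
    if mhc_percentile <= INTERMEDIATE_CUTOFF:
        return "intermediate"
    return "weak"


def rank_candidates(candidates: Sequence[Candidate]) -> List[Candidate]:
    """Order candidates by binder class, then by descending immunogenicity."""
    return sorted(
        candidates,
        key=lambda c: (
            _CLASS_ORDER[binder_class(c.mhc_percentile)],
            -c.immunogenicity,
            c.mhc_percentile,  # final tie-breaker: stronger predicted binding first
        ),
    )


def select_epitopes(candidates: Sequence[Candidate], max_epitopes: int) -> List[Candidate]:
    """Select the top-ranked subset for inclusion in the construct."""
    return rank_candidates(candidates)[:max_epitopes]


if __name__ == "__main__":
    pool = [
        Candidate("PEPTIDEA", 1.5, 0.90),
        Candidate("PEPTIDEB", 0.3, 0.40),
        Candidate("PEPTIDEC", 0.4, 0.80),
        Candidate("PEPTIDED", 5.0, 0.95),
    ]
    for chosen in select_epitopes(pool, max_epitopes=2):
        print(chosen.peptide, binder_class(chosen.mhc_percentile))
```

Ranking on binder class before immunogenicity is one possible ordering; a weighted combination of the two scores would satisfy the same "based at least in part on" language.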
[0619] Embodiment 97. The method of any one of Embodiments 94-96, wherein for each of at least a portion (e.g., up to all) of the candidate epitopes, the corresponding immunogenicity score categorizes the candidate epitope according to a level of evidence of immunogenicity (e.g., and, optionally, safety) (e.g., based on pre-existing data). [0620] Embodiment 98. The method of any one of Embodiments 94-97, comprising: identifying (e.g., by the processor) one or more of the candidate epitopes in a proteome of the subject; and filtering (e.g., by the processor) the plurality of candidate epitopes to exclude the one or more candidate epitopes identified in the proteome of the subject (e.g., if at least a part of a candidate out of the plurality of candidate polynucleotide sequences matches to an 8-mer long sequence in the proteome data). [0621] Embodiment 99. The method of any one of Embodiments 94-98, comprising [e.g., at step (b) or (c)] subdividing (e.g., by the processor) the plurality of candidate epitopes into a plurality of batches according to their corresponding MHC presentation scores [e.g., each batch associated with a particular classification of the candidate epitopes (e.g., a list of “known” epitopes, the list of “known” ligands) and/or a particular range of MHC binding data (e.g., predicted)]. [0622] Embodiment 100. The method of Embodiment 99, wherein each batch is associated with one or more (e.g., one or two) tiers, each tier associated with, and comprising candidate epitopes having, a particular immunogenicity categorization. [0623] Embodiment 101. 
The method of any one of Embodiments 94-100, comprising determining, for one or more candidate epitopes, a corresponding target cluster by: selecting (e.g., by the processor) a first candidate epitope as a target core; identifying (e.g., by the processor) one or more ligands belonging to a same transcript window as, and overlapping (e.g., by at least one amino acid) with, the first candidate epitope; and extending the target core to include the one or more ligands, thereby forming a target cluster [e.g., and checking if a final length of the core does not exceed 40 amino acids and/or if immunogenicity associated data of the core satisfies immunogenicity associated thresholds of the associated batch].
[0624] Embodiment 102. The method of any one of Embodiments 94-101, wherein two cores of a plurality of candidate cores are combined into a combined core if the two cores overlap by at least one amino acid and correspond to a same transcript window (e.g., if the two cores belong to a same batch) (e.g., wherein a rank of the combined core is determined by a best rank of the two cores) (e.g., if a length of the combined core is longer than 40 amino acids, the combined core is split).
[0625] Embodiment 103. The method of any one of Embodiments 94-102, comprising selecting candidate epitopes for inclusion in the construct in an iterative fashion, based at least in part on each candidate epitope’s corresponding rank (e.g., and on a pre-defined maximal number of target clusters and/or cores available for inclusion in the construct and/or a maximum construct length).
[0626] Embodiment 104. The method of any one of Embodiments 94-103, comprising updating a rank of one or more candidate epitopes while selecting the subset for inclusion in the construct (e.g., to ensure inclusion of epitopes from a diversity of antigens).
[0627] Embodiment 105. The method of any one of Embodiments 94-104, wherein step (c) comprises: selecting (e.g., by the processor) two or more candidate epitopes [e.g., within a particular batch] from a same first antigen; determining (e.g., by the processor) a number of candidate epitopes selected from the first antigen to be greater than or equal to a threshold value; and reducing (e.g., by the processor) a rank of at least a portion of remaining candidate epitopes from the first antigen [e.g., all remaining candidate epitopes from the first antigen and within the particular batch], thereby downranking remaining candidate epitopes from the first antigen.
[0628] Embodiment 106. A system comprising a processor of a computing device and a memory having instructions stored thereon, wherein the instructions, when executed by the processor, cause the processor to perform the method of any one of Embodiments 94-105.
[0629] Embodiment 107.
A method of manufacturing the pharmaceutical composition of any one of Embodiments 27-45, the method comprising selecting a plurality of neoantigen epitopes and/or a plurality of shared antigen epitopes by the method of any one of Embodiments 81-105, and producing or having produced the pharmaceutical composition of any one of Embodiments 27-45. [0630] Embodiment 108. The method of any one of Embodiments 1-26, further comprising selecting the plurality of neoantigen epitopes and/or the plurality of shared antigen epitopes by the method of any one of Embodiments 81-105. Page 151 of 214 12608199v1
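The iterative, diversity-aware selection recited in Embodiments 103-105 can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the names `Candidate`, `select_epitopes`, `per_antigen_threshold`, and `penalty` are hypothetical, a lower rank value is assumed to be better, and the downranking is assumed to be a fixed additive penalty applied once, at the moment the number of epitopes selected from an antigen reaches the threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    epitope: str   # hypothetical identifier of the candidate epitope
    antigen: str   # antigen from which the epitope is derived
    rank: float    # assumption: a lower value means a better rank


def select_epitopes(candidates, max_selected, per_antigen_threshold, penalty):
    """Pick candidates one at a time, downranking over-represented antigens."""
    # Working copy of the ranks so the input objects are left unchanged.
    working = [[c.rank, c] for c in candidates]
    selected = []
    picked_per_antigen = {}
    while working and len(selected) < max_selected:
        # Take the best-ranked remaining candidate (ties broken by epitope name).
        best = min(working, key=lambda pair: (pair[0], pair[1].epitope))
        working.remove(best)
        chosen = best[1]
        selected.append(chosen)
        count = picked_per_antigen.get(chosen.antigen, 0) + 1
        picked_per_antigen[chosen.antigen] = count
        # Once the antigen reaches the threshold, downrank its remaining epitopes.
        if count == per_antigen_threshold:
            for pair in working:
                if pair[1].antigen == chosen.antigen:
                    pair[0] += penalty
    return selected
```

With five candidates ranked 1 to 5, of which the first three share one antigen, a threshold of 2 and a sufficiently large penalty cause the third epitope of that antigen to be passed over in favor of epitopes from other antigens.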
EXEMPLIFICATION
Example 1: Anti-tumoral efficacy of vaccine candidates in prophylactic setting using MC38 tumor mouse model
A. Summary
[0631] In the present example, tolerability, immunogenicity, and anti-tumoral efficacy of administering RNA vaccines were assessed. The administered RNA vaccines encoded neoepitopes or non-neoepitopes, or RNA vaccines encoding neoepitopes and RNA vaccines encoding non-neoepitopes were co-administered. Further, tolerability, immunogenicity, and anti-tumoral efficacy of different RNA-LNP vaccine formats and formulations were assessed following intramuscular (i.m.) administration. Tested RNA-LNP vaccine formats and formulations included (i) RNA containing modified (modRNA) or unmodified uridines (uRNA), and (ii) lipid nanoparticles (LNPs) and lipoplexes (LPXs), respectively. Induction of T-cell responses against neo- as well as non-neoepitopes using i.m. administered LNP formulations, and immunogenicity and T-cell functionality following vaccination with different RNA formats were assessed.
[0632] Finally, the present example demonstrated that anti-tumoral efficacy was strongest upon administration of RNA vaccines encoding neoepitopes and non-neoepitopes, relative to administration of RNA vaccines encoding neoepitopes or non-neoepitopes individually.
B. Materials and Methods
1. Neoepitope and Non-neoepitope selection
[0633] Table 7 depicts the neoepitopes and non-neoepitopes encoded on the two RNA vaccines used in the present example. Both neoepitopes and non-neoepitopes were identified and selected based on previous experimental data demonstrating their ability to induce T cell responses in MC38 tumor bearing mouse models.
Table 7: Neoepitopes and non-neoepitopes encoded by the two RNA Decatopes.
ID                  Category        Gene name  Response after RNA vaccination
MC38-M111           Neoepitope      Aatf       CD8
MC38-M016           Neoepitope      Adpgk      CD8
MC38-M170           Neoepitope      Cpne1      CD8
MC38-M002           Neoepitope      Irgq       CD8
MC38-M020           Neoepitope      N4bp2l2    CD8
MC38-M010           Neoepitope      Reps1      CD8
MC38-M086           Neoepitope      Rpl18      CD8
MC38-M143           Neoepitope      Spire1     CD8
MC38_CD4_Deca_4-03  Neoepitope      Tmtc3      CD4
MC38-M47            Neoepitope      Car7       CD4
p15e                Non-Neoepitope  ENV_MLVAV  Not determined
Luzp4_99-111        Non-Neoepitope  Luzp4      Not determined
Luzp4_65-74         Non-Neoepitope  Luzp4      Not determined
Prl2c2_28-50        Non-Neoepitope  Prl2c2     Not determined
Prl2c2_58-69        Non-Neoepitope  Prl2c2     Not determined
Prl2c2_100-116      Non-Neoepitope  Prl2c2     Not determined
Prl2c2_132-141      Non-Neoepitope  Prl2c2     Not determined
Prl2c2_153-173      Non-Neoepitope  Prl2c2     Not determined
Prl2c2_192-200      Non-Neoepitope  Prl2c2     Not determined
Prl2c2_210-219      Non-Neoepitope  Prl2c2     Not determined
2. Construct Backbone
[0634] The RNA vaccines used in the present example included the following genetic elements, from 5’ to 3’: 5’ cap (CleanCap413), 5’ UTR derived from the human alpha-globin (hAg) gene, a modified Kozak sequence, mmsec secretory peptide coding sequence, ten epitope-encoding sequences each separated by linker sequences, an MHC class I trafficking peptide coding sequence (MITD), a 3’ UTR resulting from a combination of two sequence elements (FI element) derived from the "amino terminal enhancer of split" (AES) mRNA (called F) and the mitochondrial encoded 12S ribosomal RNA (called I), and a poly(A) tail (Table 8). Two types of poly(A) tail were tested, including a poly(A) tail of 50 consecutive A residues, and a ‘split’ poly(A) tail of 100 A residues separated by a ten nucleotide linker. All non-neoepitope RNA vaccines included a sequence encoding a control peptide P2P16 immediately upstream of the MITD sequence. In all cases, constructs were codon and GC optimized.
Table 8. Description of the included RNAs.
Group No.  Construct  Plasmid Backbone  5’ Cap  3’ Poly(A)  Antigen  RNA Format  RNA Synthesis Process
1          N/A        N/A               N/A     N/A         N/A      N/A         N/A
1          N/A        N/A               N/A     N/A         N/A      N/A         N/A
2          A45        pIEmm403          CC413   A30L70      Neo      uRNA        Process A
2          A134       pIEmm406          CC413   A30L70      Non-neo  uRNA        Process A
3          A45        pIEmm403          CC413   A30L70      Neo      uRNA        Process A
3          A134       pIEmm406          CC413   A30L70      Non-neo  uRNA        Process A
4          A45        pIEmm403          CC413   A30L70      Neo      m1Ψ         Process A
4          A020       pIEmm601          CC413   polyA50     None     m1Ψ         Process B
5          A134       pIEmm406          CC413   A30L70      Non-neo  m1Ψ         Process A
5          A020       pIEmm601          CC413   polyA50     None     m1Ψ         Process B
6          A45        pIEmm403          CC413   A30L70      Neo      m1Ψ         Process A
6          A134       pIEmm406          CC413   A30L70      Non-neo  m1Ψ         Process A
3. In vitro transcription
[0635] Vaccine RNAs were synthesized via IVT using either Process A or Process B and either uridine or m1Ψ to generate uRNA or modRNA (see Table 8). Briefly, Process B is an IVT reaction using a G/U Fed Batch at 42°C for 105 min. Process A is an IVT reaction using a G/U Fed Batch at 42°C for 180 min.
[0636] In vitro transcribed RNAs were purified via magnetic bead purification and quantified using ultraviolet absorption spectroscopy via Nanodrop. In all cases, the RNA vaccines were mixed at a 1:1 w/w ratio at a final concentration of 0.11-0.12 mg/ml, and used for subsequent LNP and LPX production.
4. LNP and LPX manufacturing
[0637] The LPX RNA vaccines of Group 2 were diluted to a desired concentration with 10 mM HEPES, mixed together in a 1:1 ratio and formulated to RNA-LPX through consecutive addition of water, 10 mM HEPES/0.1 mM EDTA, 1.5 M NaCl, and L2 liposomes in 5 mM acetic acid.
Attorney Docket No. 2013237-1122 The LNP RNA vaccines of Groups 3-6 were produced using a two-step manufacturing process. Briefly, pre-formed LNPs (pre-LNPs) were prepared by mixing of an organic phase containing dissolved lipids (lipid mixture (80.0 mM total concentration) composed of the cationically ionizable lipid ALC-0315, cholesterol, DSPC, and the grafted lipid ALC-0159 at a molar ratio of 47.5:40.7:10:1.8, dissolved in ethanol), with an aqueous phase (5 mM acetic acid), at a volume ratio of 1:3 (organic: aqueous), and the organic solvent in the obtained raw colloid nanoparticles was subsequently removed by tangential flow filtration (TFF) against 5 mM acetic acid. The pre-LNPs were then diluted with a storage matrix to a sucrose concentration of 10% w/v. RNA- LNPs were prepared by mixing the pre-LNP phase with an RNA phase (0.25mg/mL provided in 10 mM HEPES, 0.1 mM EDTA, pH 7), in a flow rate ratio of 1:1. The obtained RNA-LNPs were further processed by dilution with a storage matrix (60 mM HEPES, 3 mM Tris, 30% sucrose (w/v), pH 6.3) to a target pH of 5.3 and a target RNA concentration of 0.1 mg/mL in the final formulation. RNA-LNPs were stored in 16 mM Hepes, 0.6 mM Tris, 0.04 mM EDTA, 2 mM Acetic Acid, and 10% sucrose. 5. Treatment schedule, dose, and route of administration [0638] C57BL/6 mice were delivered at the age of at least eight weeks. Delivered mice were used for experiments after at least one week of acclimatization. A total of 96 female mice were used in the experiments, all of which were 13 weeks of age at study start. Mice were separated into 6 groups and vaccinated as shown in FIG.10. Briefly, group 2 received three doses at weekly intervals (20 µg/dose, 100 uL total volume, administered i.v.), each of which included LPX-formulated uRNA encoding neoepitopes and non-neoepitopes. 
Group 3 received three doses at triweekly intervals (2 µg/dose, 20 µL total volume, administered i.m. into the musculus gastrocnemius), each of which included LNP-formulated uRNA encoding neoepitopes and non-neoepitopes. Groups 4-6 received three doses at triweekly intervals (2 µg/dose, 20 µL total volume, administered i.m. into the musculus gastrocnemius), each of which included modRNA encoding either neoepitopes and irrelevant peptides (group 4), non-neoepitopes and irrelevant peptides (group 5), or neoepitopes and non-neoepitopes (group 6). Group 1 mice were left untreated and served as a control.
6. Tumor cell inoculation
[0639] Animals were shaved at the injection site. Anesthesia was induced through inhalation of 2.5% isoflurane in oxygen. A suspension of 5 × 10⁵ tumor cells (diluted in PBS) was applied subcutaneously (s.c.) into the right flank in a volume of 100 µL. After tumor cell inoculation and a short recovery phase from anesthesia, the animals were observed for any immediate signs of discomfort.
7. Tumor volumes
[0640] Subcutaneous tumor growth was monitored by calculating the tumor volume (TV) over time. To this end, the largest diameter (length) and the perpendicular diameter (width) were measured every two to four days. TVs were calculated according to the following formula, presuming that the tumors are idealized ellipsoids: TV (mm³) = [(width)² × (length)]/2, where width < length. Group median TVs were calculated on each measurement day based on TVs from live animals. Additionally, TVs of animals euthanized due to tumor load were included as long as at least five mice remained in the group, according to the Last Observation Carried Forward principle. Anti-tumor efficacy was evaluated as the test over control (T/C) value based on the group median TV in the test and control groups on the last day on which at least 50% of animals were still alive in all groups: T/C = [TV_test]/[TV_control] × 100%.
8. Tetramer generation & Flow Cytometry
[0641] For customized tetramer production, peptide receptive H2-Kb or H2-Db easYmers (Immunaware) were diluted in PBS to reach a concentration of 500 nM. After adding the respective peptides in a final concentration of 3 µM, the monomers were incubated for 48 h at 18°C. Next, streptavidin-fluorochrome conjugates were added in three consecutive steps to reach a final concentration of 125 nM. In a first step, one third of the total streptavidin-fluorochrome volume was added to the peptide loaded easYmers. The solution was mixed and incubated in the dark for 10 min at 4°C. Another third of the total streptavidin-fluorochrome volume was added, followed by 10 min incubation in the dark at 4°C. After adding the last third of the total volume, the solution was incubated for another 30 min at 4°C in the dark. The tetramers were stored at 4°C until used for flow cytometry staining.
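The tumor volume and T/C calculations described under "Tumor volumes" above can be written out as a short Python sketch. The function names `tumor_volume` and `t_over_c` are hypothetical, and swapping the two diameters so that the smaller one is always treated as the width is an assumption made here to enforce the stated condition width < length. The handling of euthanized animals (Last Observation Carried Forward) is not modeled.

```python
from statistics import median


def tumor_volume(length_mm: float, width_mm: float) -> float:
    """Ellipsoid approximation: TV (mm^3) = width^2 * length / 2."""
    # Assumption: the smaller diameter is always used as the width.
    width, length = sorted((length_mm, width_mm))
    return (width ** 2) * length / 2


def t_over_c(test_tvs, control_tvs) -> float:
    """T/C in percent, based on the group median tumor volumes."""
    return median(test_tvs) / median(control_tvs) * 100


# Example: a tumor measuring 10 mm x 6 mm.
volume = tumor_volume(10, 6)
# Example: three test animals versus three control animals on the same day.
ratio = t_over_c([90, 180, 270], [300, 360, 420])
```

A T/C value below 100% indicates that the median tumor volume in the test group is smaller than in the control group on the evaluated day.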
Attorney Docket No. 2013237-1122 [0642] Immunophenotyping was performed using blood sampled from the vena facialis. Briefly, 50 µL whole blood were dispensed to a 96-well U bottom plate and 100 µL PBS was added. After centrifugation (3 min, 460 x g, 2–8°C), the supernatants were carefully aspirated, and the cell pellet resuspended in 200 µl Ammonium-Chloride-Potassium (ACK) lysis buffer (erythrocyte lysis). The cell suspension was incubated for 5 min at room temperature, centrifuged (3 min, 460 x g, 2–8°C) and the supernatant was carefully aspirated. After repeating the erythrocyte lysis steps, the cells were washed once with 200 µL Fluorescence Activated Cell Sorting (FACS) buffer (PBS supplemented with 5 mM EDTA and 5% fetal bovine serum [FBS]) and were subsequently stained with antibodies (CD45, CD90.2, CD44, CD4, CD18, CD8a, CD62L, PD1, ICOS, CD127, CD25, KLRG1) and tetramers (Spire1, Adpgk, Rpl18, N4bp2l2 [Pool 1]) in 40 µL FACS buffer for 30 minutes at 2-8°C in the dark. After washing once with 200 µL PBS (3 min, 460 x g, 2–8°C), cells were incubated with 40 µL tetramers (Irgq, Reps1, p15e, Luzp4 [Pool 2]) in 40 µl FACS buffer for 30 min at 2–8°C. After an additional washing step with 200 µL PBS (3 min, 460 x g, 2–8°C), cells were resuspended thoroughly in 100 µL Fix/Perm buffer (Foxp3/Transcription Factor Staining Buffer Set) and incubated for 30 minutes at 2–8°C. After washing twice with 200 µL Perm/Wash buffer (Foxp3/Transcription Factor Staining Buffer Set; 3 min, 460 x g, 2–8°C), cells were stained with antibodies in 50 µL Perm/Wash buffer for 30 min at 2–8°C. After washing twice with 200 µL Perm/Wash buffer (3 min, 460 x g, 2–8°C), cells were resuspended in 200 µL FACS buffer and stored at 2–8°C until measurement. Data were acquired on a BD FACSymphony flow cytometer and analyzed with FlowJo software version 10.8.1. C. Results 1. 
Blood immunophenotyping by flow cytometry
[0643] Neoepitope-specific T cells were analyzed by flow cytometry seven days after the first, second, and third vaccination, as well as 14 days after the third vaccination (FIG.11). Continuous expansion of neoepitope-specific CD8+ T cells was observed in all treatment groups throughout the study. The T cell response was significantly stronger upon i.m. RNA-LNP treatment than upon i.v. RNA-LPX treatment (Gr. 2). uRNA vaccines encoding both neoepitopes and non-neoepitopes resulted in the strongest neoepitope-specific T-cell response.
2. Anti-tumoral efficacy
[0644] In all cases, treatment with combined RNA vaccines encoding neoepitopes and non-neoepitopes resulted in 100% prophylactic efficacy with no observable tumor growth (FIG.12, Gr. 2, 3, and 6). In contrast, tumor growth was observed in all treatment groups that received RNA vaccines encoding only neoepitopes (Gr. 4) or only non-neoepitopes (Gr. 5). These data suggest that co-administration of RNA vaccines encoding both neoepitopes and non-neoepitopes yields an enhanced anti-tumoral efficacy when compared to RNA vaccines encoding either neoepitopes or non-neoepitopes alone.
3. Summary of results
[0645] The study observed a continuous expansion of neoepitope-specific CD8+ T cells in all treatment groups, with a stronger T cell response upon i.m. RNA-LNP treatment than upon i.v. RNA-LPX treatment. Co-administered RNA vaccines encoding both neoepitopes and non-neoepitopes provided 100% prophylactic efficacy, with no observable tumor growth in any case, whereas tumor growth was observed in the groups receiving RNA vaccines encoding only neoepitopes or only non-neoepitopes.
Example 2: Anti-tumoral efficacy of coadministration of polyepitopic RNA vaccines encoding shared tumor antigen epitopes and individualized neoantigen epitopes in humans
A. Summary
[0646] The present example describes coadministration of RNA vaccines encoding selected neoantigen epitopes and individualized non-neoantigen epitopes. Such neoantigen epitopes and individualized non-neoantigen epitopes are identified, ranked, and selected from samples isolated from a human subject suffering from cancer.
B. Materials and Methods
1.
Identification of neoantigen epitopes and individualized non-neoantigen epitopes
[0647] To identify neoantigen epitopes and non-neoantigen epitopes (i.e., shared tumor antigen epitopes), whole exome sequencing is performed from RNA purified from isolated cancerous and non-cancerous healthy tissue. The resulting exome data are compared to identify genes with cancer-specific mutations (i.e., candidate neoantigen epitopes). Non-neoantigen epitopes (i.e., shared tumor antigen epitopes) are initially identified starting with a list of genes that encode validated non-neoantigenic tumor associated antigens. Among this list, non-neoantigens whose sequence exactly matches non-tumor associated antigens are masked in order to avoid autoimmune responses. Further, any neo-antigen found to encode a mutation relative to healthy, non-tumorous tissue (as identified by next generation sequencing) is removed from the list.
2. Neoepitope Ranking and Selection
[0648] Neoantigen epitopes are selected for inclusion in vaccines via a modified iCaM pipeline, as summarized in FIG.1. For all identified neoantigens, multiple characteristics are determined (experimentally and/or by computational prediction), including MHC I and MHC II presentation, SNVs (with subclonality), InDels (Strelka) & Fusions (EasyFuse). Neoantigens are ranked and selected using a Target Selection Algorithm (TSA) as shown in FIG.2 and FIG.3. The TSA is performed iteratively for 16 rounds, incorporating expression data into the MHC presentation models. VAF is used when clonality is not analyzable. The top 10 highest ranking neoepitopes are selected for inclusion in the vaccine.
3. Individualized non-neoantigen epitope ranking and selection
[0649] Individualized non-neoantigen epitopes (i.e., individualized shared tumor antigen epitopes) are initially identified by determining whether the non-neoantigen is expressed within the tumor and/or predicting MHC class I and II presentation of epitopes within the non-neoantigen, VAF in tumor, and clonality. Individualized non-neoantigen epitopes are ranked. The top 10 highest ranking individualized non-neoepitopes are selected for inclusion in the vaccine.
4. RNA vaccines and administration
[0650] RNA vaccines are constructed that encode selected epitopes, as described in Example 1.
IVT mRNAs are formulated for intramuscular delivery in lipid nanoparticles (LNPs) via the two-step manufacturing process (described in Example 1), resulting in modRNA-LNP and uRNA-LNP vaccines (i.e., LNP formulations with mRNA constructs including modified and unmodified nucleotides, respectively). RNA vaccines are administered to human subjects via i.m. or i.v.
Attorney Docket No. 2013237-1122 C. Results [0651] RNA vaccines are well tolerated and induce neoepitope and non-neoepitope-specific T- cell responses. Example 3. Detecting Antigen Based Neotherapeutic Target Epitopes A. Summary [0652] The present example describes an exemplary process for detecting antigen based neotherapeutic target epitopes: the “dANTe” pipeline. The dANTe pipeline is modular and may be used, e.g., for predicting, generating, and/or selecting peptide target epitopes (e.g., neoantigen and/or non-neoantigen epitopes) by analyzing next generation sequencing (NGS) data (tumor and germline DNA as well as tumor RNA). [0653] In the present example, DNA sequencing data was used to construct patient-specific germline targets for non-neoantigens and to identify tumor-specific non-synonymous somatic mutations for neoantigen target creation. The tumor-derived RNA data was further used to determine patient-specific gene expression profiles that are used for both neoantigen and non- neoantigen target generation. Based on the non-synonymous somatic mutations identified, the patient-specific mutated peptide sequences (MPSs) were determined, annotated, and scored based on features likely to impact their immunogenic potential. This process resulted in a set of potential neoepitope target candidates. To detect potential non-neoepitope targets, a curated list of potential non-neoantigens was derived from the literature, in combination with analysis of RNA-seq expression data and expression validation. The provided list of potential non- neoantigens was subsequently filtered by analyzing patient-specific information including HLA type, gene expression, and somatic as well as germline mutations. B. Materials and Methods 1. 
Patient data
[0654] Whole-exome sequencing (WES) and RNA sequencing data from 10 BRCA, 10 COAD, 10 HNSC, 10 KIRC, 10 KIRP, 10 LIHC, 10 LUAD, 10 LUSC, 11 OV, 10 PAAD, 10 PRAD, 12 READ, 10 SKCM and 10 STAD patients were obtained from The Cancer Genome Atlas Program (TCGA) for analysis (Table 9). Additionally, data from 16 patients with triple negative breast cancer were obtained for analysis from the TNBC-MERIT trial (NCT02316457).
[0655] Specifically, aligned NGS data (BAM) files for 143 available patients were obtained from the TCGA database, consisting of matched germline DNA, tumor DNA and tumor RNA samples for each patient (single replicate), and converted into raw FASTQ format. In addition, raw NGS data (FASTQ) files for 16 available patients were obtained from the TNBC-MERIT cohort (NCT02316457), consisting of matched germline DNA, tumor DNA and tumor RNA samples for each patient in duplicate.
Table 9: TCGA IDs of downloaded and processed TCGA patients.
Indication  TCGA patient identifiers (IDs)
BRCA        TCGA-A1-A0SF, TCGA-A1-A0SI, TCGA-A1-A0SQ, TCGA-A2-A04Q, TCGA-A2-A04R, TCGA-A2-A04W, TCGA-A2-A0CK, TCGA-A2-A0CM, TCGA-A2-A0CP, TCGA-C8-A1HF
COAD        TCGA-4N-A93T, TCGA-4T-AA8H, TCGA-5M-AAT5, TCGA-5M-AAT6, TCGA-5M-AATA, TCGA-5M-AATE, TCGA-A6-2674, TCGA-A6-2686, TCGA-A6-4105, TCGA-CM-5344
HNSC        TCGA-4P-AA8J, TCGA-BA-4074, TCGA-BA-4075, TCGA-BA-4076, TCGA-BA-4077, TCGA-BA-4078, TCGA-BA-5149, TCGA-BA-5151, TCGA-BA-5152, TCGA-BA-5153
KIRC        TCGA-3Z-A93Z, TCGA-6D-AA2E, TCGA-A3-A6NI, TCGA-A3-A6NJ, TCGA-A3-A6NL, TCGA-A3-A6NN, TCGA-A3-A8CQ, TCGA-A3-A8OU, TCGA-A3-A8OV, TCGA-A3-A8OW
KIRP        TCGA-2Z-A9JL, TCGA-5P-A9KC, TCGA-5P-A9KH, TCGA-A4-8630, TCGA-A4-A57E, TCGA-B1-A654, TCGA-B3-8121, TCGA-B9-4115, TCGA-DW-7837, TCGA-G7-6793
LIHC        TCGA-2Y-A9GS, TCGA-2Y-A9GT, TCGA-2Y-A9GV, TCGA-2Y-A9GZ, TCGA-2Y-A9H3, TCGA-2Y-A9H5, TCGA-2Y-A9H7, TCGA-2Y-A9H9, TCGA-5R-AAAM, TCGA-ED-A4XI
LUAD        TCGA-05-4244, TCGA-05-4250, TCGA-05-4382, TCGA-05-4384, TCGA-05-4389, TCGA-05-4390, TCGA-05-4395, TCGA-05-4396, TCGA-05-4397, TCGA-05-4398
LUSC        TCGA-21-5782, TCGA-21-5784, TCGA-21-5787, TCGA-22-A5C4, TCGA-33-A5GW, TCGA-34-5231, TCGA-34-5232, TCGA-34-5234, TCGA-34-5240, TCGA-34-5927
OV          TCGA-04-1332, TCGA-04-1362, TCGA-04-1655, TCGA-09-0366, TCGA-09-0369, TCGA-09-1670, TCGA-13-0724, TCGA-13-0765, TCGA-13-1487, TCGA-13-1507, TCGA-13-1509
PAAD        TCGA-2J-AAB1, TCGA-2J-AAB4, TCGA-2J-AAB6, TCGA-2J-AAB9, TCGA-2J-AABH, TCGA-2J-AABK, TCGA-2J-AABT, TCGA-3A-A9I7, TCGA-3A-A9IC, TCGA-3A-A9IH
PRAD        TCGA-CH-5741, TCGA-CH-5752, TCGA-EJ-7325, TCGA-EJ-7793, TCGA-G9-6377, TCGA-HC-8257, TCGA-KC-A7FE, TCGA-XJ-A83F, TCGA-XK-AAJT, TCGA-YL-A8SA
READ        TCGA-AF-2687, TCGA-AF-2690, TCGA-AF-2693, TCGA-AF-4110, TCGA-AF-A56K, TCGA-AF-A56L, TCGA-AF-A56N, TCGA-AG-3591, TCGA-AG-3592, TCGA-AG-3902, TCGA-DY-A1DC, TCGA-DY-A1DF
SKCM        TCGA-BF-AAP1, TCGA-BF-AAP2, TCGA-BF-AAP, TCGA-BF-AAP6, TCGA-D3-A5GT, TCGA-D9-A3Z4, TCGA-EB-A3XB, TCGA-EB-A42Y, TCGA-EB-A44P, TCGA-EB-A6QZ
STAD        TCGA-3M-AB4, TCGA-B7-A5TI, TCGA-B7-A5TN, TCGA-BR-6563, TCGA-BR-6705, TCGA-BR-7197, TCGA-BR-7715, TCGA-BR-7723, TCGA-BR-7958, TCGA-BR-6710
Abbreviations: BRCA = Breast invasive carcinoma; COAD = Colon adenocarcinoma; HNSC = Head and neck squamous cell carcinoma; KIRC = Kidney renal clear cell carcinoma; KIRP = Kidney renal papillary cell carcinoma; LIHC = Liver hepatocellular carcinoma; LUAD = Lung adenocarcinoma; LUSC = Lung squamous cell carcinoma; OV = Ovarian serous cystadenocarcinoma; PAAD = Pancreatic adenocarcinoma; PRAD = Prostate adenocarcinoma; READ = Rectum adenocarcinoma; SKCM = Skin Cutaneous Melanoma; STAD = Stomach adenocarcinoma.
2. Computational pipeline and infrastructure
[0656] The dANTe pipeline was used to analyze patient data (germline DNA, tumor DNA, and tumor RNA). In total, data from 159 patients were analyzed. For each patient, a list of neoepitopes and non-neoepitopes was obtained using the dANTe pipeline. The dANTe pipeline was run on a Linux-based parallel cluster.
3. Processing of sequencing data
[0657] DNA raw reads from paired-end FASTQ files were aligned with BWA mem to the human reference genome (hg19) (Li & Durbin. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics. 2010;26(5):589-595). Using samtools, the SAM files were sorted and converted to BAM files with subsequent indexing, and duplicated reads were marked.
[0658] RNA raw reads from paired-end FASTQ files were aligned to the human reference genome (hg19) using the STAR aligner (Dobin et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29(1):15-21).
The resulting SAM files were further compressed into BAM files via samtools with subsequent sorting and indexing.
[0659] Relative transcript abundance was estimated from RNAseq reads using the Sailfish library (Patro R et al. Sailfish enables alignment-free isoform quantification from RNA-seq reads using lightweight algorithms. Nat Biotechnol. 2014;32(5):462-464).
Attorney Docket No. 2013237-1122 [0660] Replicate availability varied between patients, with most patients having only one replicate per sample (tumor/germline). To ensure consistent processing of all TCGA input data, DNA and RNA FASTQ files where more than one replicate was available were therefore merged prior to alignment. For all TNBC-MERIT patients, two replicates were available which were treated separately in this processing step. 4. HLA calling [0661] HLAHD (Kawaguchi et al. HLA-HD: An accurate HLA typing algorithm for next- generation sequencing data. Hum Mutat. 2017;38(7):788-797) was used to do inline HLA type calling using DNA reads from germline sequences. If there was only one WT replicate available, the first cleaned call pair per locus was chosen. In cases where two replicates were available, the HLA type was generated by comparing the WT alleles and finding the best matching allele pairs with four-digit precision. 5. Mutation calling [0662] SNVs were identified using aligned DNA reads of both tumor and WT samples. InDels were detected on the same base data, using the Strelka algorithm (Saunders et al. Strelka: accurate somatic small-variant calling from sequenced tumor-normal sample pairs. Bioinformatics. 2012;28(14):1811-1817). Germline variants were computed both in the vicinity of all detected somatic mutations as well as for all exonic regions within the exemplary list of non-neoantigen genes. In both cases, germline variants were called using GATK HaplotypeCaller (McKenna et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010;20(9):1297-1303). Gene fusions were detected using the EasyFuse algorithm (Haas et al. STAR-Fusion: Fast and Accurate Fusion Transcript Detection from RNA-Seq. BioRXiv 2014. Doi: 10.1101/120295; Weber et al. Accurate detection of tumor-specific gene fusions reveals strongly immunogenic personal neo-antigens. Nat Biotechnol. 
2022;40(8):1276-1284; Nicorici et al. FusionCatcher: a tool for finding somatic fusion genes in paired-end RNA-sequencing data. BioRXiv 2017, Doi: 10.1101/011650). Somatic mutations and germline variants across all available replicates were combined and checked for uniqueness of the mutations both by ID and by checking the affected coordinates. In case of no collision (i.e., a mutation not occurring at the same locus), the mutation/germline variant was used as is. In case of a collision (i.e., a mutation occurring at the same locus), the mutations were checked for equivalence and, if equivalent, were merged; otherwise, both mutations were marked as unusable. Additionally, somatic mutations for potential target neoepitopes were marked as unusable if an InDel mutation was not called in all replicates or if a non-reference wildtype allele was detected for a particular mutation.
6. Peptide-MHC scoring
[0663] Presentation scores for peptide and MHC allele combinations were computed using a prediction model that utilized expression and cleavage features.
7. Determination of unusable transcript regions
[0664] The following regions were defined as unusable:
Transcripts with any called mutation affecting the first codon annotated as translated (even if that is not ATG).
Transcript parts in or downstream of any called mutation affecting a splice site.
Transcript parts in or downstream of a stop codon that is not annotated as such (i.e., premature TGA, TAG, or TAA).
Transcript parts in or downstream of a codon that is annotated as stop codon but is not a stop codon (i.e., not one of TGA, TAG, or TAA).
[0665] Unusable transcript regions were excluded from candidate target generation. For non-neoepitope target candidates, an additional constraint was imposed to treat any region downstream of an unusable mutation as an unusable transcript region (see “Mutation Calling” section B(5) herein, above). Gene fusion candidates were not considered for the determination of unusable regions.
8. Determination of untargetable transcript regions
[0666] The following regions were defined as untargetable:
Downstream region of an InDel causing a shift of the reading frame.
Downstream region of a usable mutation individually introducing a stop codon into the sequence.
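The two untargetable cases listed above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the dANTe implementation: the function names are hypothetical, the coding sequence `cds` is assumed to be uppercase DNA starting in frame at the first codon, positions are 0-based, and an in-frame InDel that happens to create a stop codon is not modeled.

```python
STOP_CODONS = {"TGA", "TAG", "TAA"}


def is_frameshift(ref: str, alt: str) -> bool:
    """True if an InDel changes the sequence length by a non-multiple of three."""
    return (len(alt) - len(ref)) % 3 != 0


def introduces_stop(cds: str, pos: int, alt_base: str) -> bool:
    """True if an SNV at 0-based CDS position `pos` creates a new stop codon."""
    start = pos - pos % 3            # first base of the affected codon
    codon = cds[start:start + 3]
    offset = pos - start
    mutated = codon[:offset] + alt_base + codon[offset + 1:]
    return codon not in STOP_CODONS and mutated in STOP_CODONS


def downstream_untargetable(cds: str, pos: int, ref: str, alt: str) -> bool:
    """True if the region downstream of the mutation would be untargetable."""
    if len(ref) == 1 and len(alt) == 1:   # single nucleotide variant
        return introduces_stop(cds, pos, alt)
    return is_frameshift(ref, alt)        # insertion or deletion
```

For example, in the sequence ATG TGG AAA, changing the last base of TGG to A yields TGA, so the region downstream of that SNV would be flagged, whereas a three-base insertion preserves the reading frame and is not flagged by this sketch.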
Attorney Docket No. 2013237-1122 [0667] Untargetable transcript regions were not used to generate candidate neoepitope target sequences if a neoepitope target-associated mutation was located within an untargetable sequence. However, in cases where a neoepitope target-associated mutation was located upstream of an untargetable region, the untargetable region was included in the generated neoepitope target candidate sequence. For non-neoepitope detection, for which target generation did not depend on the existence of detected somatic mutations, untargetable regions were treated as unusable. Gene fusion candidates were not considered for the determination of untargetable regions. 9. Target generation neoepitopes [0668] Based on all targetable and usable somatic mutation events, mutated peptide sequences were generated. Only sequences which contained at least one somatic mutation (SNV, InDel, gene fusion) were considered. Targets with multiple mutations whose phase could not be resolved were excluded. For each patient, the peptide-MHC scores were computed using a prediction model including expression and cleavage features. Potential neoepitope targets were ranked based on the resulting MHC I and II presentation scores and expression values. A total of 46 final targets were designated for subsequent assessment by sANTe (described in Example 4). Out of these 46 (i.e., out of 46 total “slots”), up to 5 target slots were filled with InDels having a top-ranked MHC I score, 20 slots were filled by SNV target candidates ranked by MHC I presentation score, and 20 slots were filled by SNV target candidates ranked by MHC II presentation score. If an insufficient number of target candidate InDels was observed, target candidate slots were filled with more SNV candidates. If less than 46 suitable SNVs and InDels were available, remaining slots were filled with fusion genes based on their MHC I presentation prediction score. 
However, no more than 5 fusion gene targets were allowed in total. If any of the available slots were still not filled, target candidates with an expression value of FPKM > 0 were used to fill the remaining slots.

10. Target generation non-neoepitopes

[0669] To generate non-neoepitope targets, the primary input was an exemplary list of genes defined by a gene symbol (hg19) and corresponding UCSC (University of California Santa Cruz) transcript IDs as well as an expression threshold per gene. The gene list was filtered for each
patient to contain only genes whose expression was above a predefined expression threshold (Table 10).

Table 10: Exemplary list of non-neoantigens and expression thresholds used as input to generate non-neoepitope targets.

Gene symbol | Expression cutoff (TPM)
ACPP | 935
ACTL8 | 20
ANKRD30A | 20
BIRC5 | 20
BRDT | 20
CDH17 | 436
CEACAM5 | 47525
CLDN6 | 20
CTAG1A | 20
CTAG2 | 20
CXorf61 | 20
DEPDC1 | 20
EDDM3B | 20
FLT1 | 20
GUCY2C | 54
HEPHL1 | 20
HOXB13 | 1985
IGF2BP1 | 20
IGF2BP3 | 20
KDR | 20
KIF20A | 20
KLK2 | 2975
KLK3 | 6837
MAGEA1 | 20
MAGEA2 | 20
MAGEA3 | 20
MAGEA4 | 20
MAGEA9B | 20
MAGEC1 | 20
MAGEC2 | 20
MLANA | 765
MUC13 | 2405
NKX3-1 | 4115
PAEP | 20
PAGE4 | 387
PLAC1 | 20
PMEL | 1955
PRAME | 44
TDRD1 | 20
TP53 | 20
TPBG | 20
TPTE | 20
TYR | 49
UPK2 | 219
ZFP42 | 20

The list of targets and expression thresholds was derived from manual literature research combined with analysis of publicly available RNAseq expression data and expression analysis using qRT-PCR (quantitative real-time polymerase chain reaction).

[0670] In case of multiple replicates, average expression values were computed for each isoform individually. For all transcripts that passed the expression filter, somatic and germline variants were called and annotated. Based on the detected mutation events and corresponding phasing of all relevant alleles, potential peptide target sequences were derived by generating all possible peptide combinations within a uniquely identifiable sequence space. During this step, all possible target sequences that contained a somatic mutation were discarded. To reduce the risk of including false-positive sequences, all predicted fusion events within potential target genes were flagged and deprioritized for the target selection (FIG.14).

[0671] Potential target sequences were generated based on a sliding window approach to allow phasing of alleles (n = 108 windows). Target windows were generated throughout the entire transcript unless there were conflicting events or untargetable regions. In heterozygous cases, two window sequences can be obtained, one with a SNP and one WT sequence. After phasing, overlapping windows were merged back together if no mutation events interfered with the identifiability of the generated sequence.

[0672] If there were no targets without fusion events, targets with fusion events were kept for target selection. Potential targets were scored with the respective peptide-allele combination using a prediction model that utilized expression and cleavage features.
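By way of non-limiting illustration only, the replicate averaging and expression filtering described in paragraphs [0669] and [0670] may be sketched as follows. The function name, the transcript identifiers, and the choice to compare each averaged transcript value against its gene's cutoff with a strict "greater than" are assumptions made for this sketch and are not taken from the dANTe implementation.

```python
from statistics import mean


def filter_expressed_transcripts(replicate_tpm, gene_of, cutoff_tpm):
    """Average replicate TPM values per transcript and keep those above
    the gene-specific cutoff.

    replicate_tpm: {transcript_id: [TPM value per replicate]}
    gene_of:       {transcript_id: gene symbol}
    cutoff_tpm:    {gene symbol: expression cutoff in TPM}, as in Table 10
    All three structures are hypothetical stand-ins for pipeline inputs.
    """
    passed = {}
    for transcript_id, values in replicate_tpm.items():
        average = mean(values)  # each isoform is averaged individually
        if average > cutoff_tpm[gene_of[transcript_id]]:  # assumed strict
            passed[transcript_id] = average
    return passed
```

In this sketch a transcript measured at 30 and 50 TPM in two replicates averages to 40 TPM and passes a cutoff of 20 TPM, whereas a transcript measured at 10 and 20 TPM averages to 15 TPM and is discarded.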
11. Target validation non-neoepitopes

[0673] All generated potential targets were matched against the reference transcript sequence (exons). While matching, up to two mismatches between the generated target sequence and the reference sequence were allowed to account for potential SNPs within the generated sequences.

12. Lookup table

[0674] To reduce compute at run time, a Lookup table was generated by precomputing peptide-MHC binding scores for peptide-allele combinations derived from WT sequences from all genes within the list of non-neoantigens together with the 100 most frequent alleles occurring in the human population. At run time, the Lookup table was queried, and peptide-MHC binding scores were computed only for peptide-allele combinations that did not have a match in the Lookup table.

13. Linker homology, maximum expression risk, and whitelist

[0675] Potential neoepitopes were checked against the proteome and flagged if they were found to be full matches. Additionally, peptides of potential neoepitopes were matched against a predefined list of tissue-specific expression risk factors for autoimmunity. The same list was used to determine allowed neoepitope-linker combinations, as the transition sequence from neoepitope to linker could contain maximum expression risk peptides.

[0676] Potential non-neoepitopes were checked against the proteome excluding a gene-specific, manually curated whitelist of genes with homologous regions.

C. Results

[0677] The dANTe pipeline was used to analyze individual sets of germline (blood) DNA, tumor DNA, and tumor RNA samples from cancer patients across several indications to generate a set of neo- and non-neoepitope targets (FIG.13).

[0678] An overview on the resulting target space is provided in Table 11. The number of potential targets generated for neoepitopes was lower than for non-neoepitopes.
Without wishing to be bound by any particular theory, this may be due to the fact that neoepitope targets were derived from sequences in the exonic regions containing somatic mutations only, whereas non-neoepitope targets could be derived from suitable WT sequences and were therefore pooled from a significantly larger target space.

Table 11: Average number of potential targets for neo- and non-neoepitopes.

Cancer indication | Potential targets non-neoepitopes | Potential targets neoepitopes
BRCA | 6850 | 57
COAD | 8844 | 377
HNSC | 8084 | 227
KIRC | 4340 | 59
KIRP | 3807 | 56
TNBC_MERIT | 5690 | 80
LIHC | 4175 | 104
LUAD | 7903 | 461
LUSC | 7587 | 273
OV | 4620 | 52
PAAD | 5714 | 41
PRAD | 5652 | 20
READ | 7778 | 64
SKCM | 8776 | 499
STAD | 6869 | 154

Results represent average potential target numbers across all analyzed patients (n = 159 total). Mean values were rounded to the closest integer. Abbreviations: BRCA = Breast invasive carcinoma; COAD = Colon adenocarcinoma; HNSC = Head and neck squamous cell carcinoma; KIRC = Kidney renal clear cell carcinoma; KIRP = Kidney renal papillary cell carcinoma; LIHC = Liver hepatocellular carcinoma; LUAD = Lung adenocarcinoma; LUSC = Lung squamous cell carcinoma; OV = Ovarian serous cystadenocarcinoma; PAAD = Pancreatic adenocarcinoma; PRAD = Prostate adenocarcinoma; READ = Rectum adenocarcinoma; SKCM = Skin Cutaneous Melanoma; STAD = Stomach adenocarcinoma; TNBC_MERIT = Triple negative breast carcinoma_Mutanome engineered RNA immuno-therapy.

1. Non-neoepitope target processing

[0679] A total of 159 patients (Table 9) were processed with the dANTe pipeline. As a first step, the effect of patient-specific tumor mRNA expression filters was analyzed to establish a baseline for the detection of potential non-neoantigens. Across indications analyzed in this example, on average between 26 (LIHC) and 67 (SKCM) transcripts of the dANTe non-neoantigen input list were obtained with above-threshold expression levels (FIG.15 and Table 11). Transcripts
passing the expression filters were derived from different genes depending on the specific cancer type, highlighting the cancer-specific nature of the present approach (FIG.16).

[0680] To derive potential non-neoepitope targets in a patient-specific fashion, germline variants were included whereas somatic mutations as targets were excluded to avoid reducing the potential neoantigen target space. To this end, somatic mutations were detected for the exonic regions within the dANTe non-neoantigen input list. Additionally, germline variants were detected for the same exonic regions. All mutation events (germline and somatic) were then used to extract usable regions for target generation from each transcript. Between 1 and 41 high-confidence mutation events were found to be associated with the dANTe non-neoantigen input list across the full set of patients, of which the majority were germline variants (FIG.17).

2. Non-neoepitope target generation

[0681] After patient-specific expression and associated mutation events were analyzed, potential target sequences were generated based on usable sequence regions within each transcript. To this end, each mutation event was investigated based on a set of predefined exclusion criteria as described herein, and each transcript was truncated in case of an event which led to an unusable downstream sequence. Based on the remaining sequence, potential targets were generated using a sliding window approach where each window was checked for somatic mutations, and in case of no somatic mutations, the window was further considered for target generation, as described herein. This approach resulted in a broad range of generated targets with some patients having over 10,000 targets, and other patients having around 1,000 targets (FIG.18).
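By way of non-limiting illustration only, the sliding window check of paragraph [0681] may be sketched as follows. The sketch assumes that the transcript has already been truncated to its usable region, that somatic mutations are given as zero-based amino acid positions, and that windows have a fixed peptide length; the function name and these representations are hypothetical and are not taken from the dANTe implementation.

```python
def generate_windows(protein, window_len, somatic_positions):
    """Slide a fixed-length window over a usable protein sequence and keep
    only windows that do not cover any somatic mutation position.

    protein:            amino acid sequence of the usable transcript region
    window_len:         window length in amino acids (e.g., 9 for 9-mers)
    somatic_positions:  zero-based positions affected by somatic mutations
    """
    somatic = set(somatic_positions)
    windows = []
    for start in range(len(protein) - window_len + 1):
        covered = range(start, start + window_len)
        if any(position in somatic for position in covered):
            continue  # window contains a somatic mutation: not a non-neoepitope
        windows.append((start, protein[start:start + window_len]))
    return windows
```

For a hypothetical 10 amino acid sequence with a somatic mutation at position 0, only the 9-mer starting at position 1 is retained, since the 9-mer starting at position 0 covers the mutated residue.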
[0682] Analysis across different indications revealed substantial variability in the number of detected targets between different cancer types, with some indications having on average 3,807 targets (KIRP) and others up to 8,844 targets (COAD) (FIG.18). The number of generated targets was significantly correlated with the number of transcripts which remained after expression filtering (FIG.19; R = 0.78, p < 2.2 x 10^-16). For this analysis, only 9-mers were reported as target numbers to omit artificially inflated target numbers arising through overlap between n-mers of different lengths. However, other target lengths were also considered in the final output.

[0683] Next, the extent to which individual genes from the list of non-neoantigens contributed to the pool of generated targets was analyzed. Some genes were consistently expressed across all
indications, such as FLT1, KDR, as well as the oncogene TP53. However, additional genes that contributed to the pool of potential targets varied between different cancer types (FIG.20A, FIG.20B, and FIG.21). For example, targets derived from ANKRD30A were mainly generated within TNBC and BRCA patient samples, while targets derived from HEPHL1 were exclusively generated for HNSC and READ patient samples (FIG.20A, FIG.20B, and FIG.21). Overall, the results highlight the diverse expression patterns across cancer types leading to a highly individualized target generation pipeline for non-neoepitopes.

[0684] To validate target generation, additional verification was performed by independently matching all generated targets against the exon sequence of the respective transcript (FIG.22). While matching the target sequences to the reference, up to two mismatches were allowed to include germline SNPs that were inserted in the sequence during target creation. All targets were recovered as expected (data not shown).

[0685] MHC binding scores for all potential non-neoepitope targets were computed using a prediction model. To reduce computational processing, a Lookup table (described herein) with precomputed MHC binding scores was generated for all possible non-neoepitopes using the 100 most frequent HLA haplotypes together with the WT sequence of all genes in the input list. The lookup approach led to a 93.9% reduction in the number of peptide-allele combinations which needed to be computed at run time (FIG.23). The scores from MHC binding (lookup and newly computed) were then used as input for an MHC presentation prediction model.

3. Neoepitope target generation

[0686] In addition to non-neoepitope target detection, neoepitope targets were detected using dANTe.
After somatic mutation detection, quality controls, and filtering (FIG.24, FIG.25, FIG.26, and FIG.28), between 5 and 3754 neoepitope targets were detected for each patient across all analyzed patients and cancer indications (FIG.27 and Table 12). To focus on the most suitable mutation events, the mutations were further ranked and prioritized based on a predefined set of criteria as described herein, and then the top candidates were chosen for further target selection based on these criteria (FIG.28 and Table 13). The majority of neoepitope targets were found only once across all patients, highlighting the individualized nature of the employed approach (FIG.29). Overall, these targets were derived from a broad target space ranging from 100 to 500 unique target genes per cancer indication (FIG.30). However, known oncogenic
drivers such as TP53 and KRAS were also part of the discovered target pool as could be expected in an environment with high mutational burden (FIG.31). The number of total detected somatic mutations was highly patient-specific (FIG.32 and FIG.33). It was observed that within two patients, more than 1500 InDels were detected, while for the majority of analyzed patients fewer than 100 InDels were detected (FIG.33 and Table 12).

[0687] Overall, the developed dANTe pipeline provided sets of potential neo- and non-neoepitope targets.

Table 12: Number of detected somatic mutation events across all cancer indications.

Cancer indication | Total somatic mutations: mean | min | max | No. SNV: mean | min | max | No. InDel: mean | min | max
BRCA | 64 | 16 | 156 | 57 | 13 | 152 | 7 | 2 | 16
COAD | 817 | 45 | 3754 | 377 | 22 | 1912 | 441 | 11 | 2620
HNSC | 244 | 41 | 539 | 227 | 37 | 496 | 17 | 4 | 43
KIRC | 80 | 21 | 198 | 59 | 15 | 148 | 21 | 5 | 50
KIRP | 73 | 39 | 127 | 56 | 30 | 98 | 17 | 6 | 29
LIHC | 120 | 82 | 171 | 104 | 67 | 162 | 16 | 9 | 28
LUAD | 489 | 56 | 1624 | 461 | 39 | 1589 | 27 | 8 | 68
LUSC | 293 | 121 | 725 | 273 | 118 | 676 | 20 | 3 | 49
OV | 86 | 35 | 331 | 52 | 30 | 80 | 34 | 2 | 271
PAAD | 44 | 22 | 72 | 41 | 21 | 67 | 3 | 0 | 7
PRAD | 24 | 5 | 47 | 20 | 3 | 37 | 5 | 1 | 10
READ | 89 | 35 | 195 | 64 | 16 | 163 | 25 | 15 | 47
SKCM | 505 | 24 | 1490 | 499 | 23 | 1481 | 6 | 1 | 10
STAD | 207 | 60 | 926 | 154 | 58 | 527 | 52 | 2 | 399
TNBC-MERIT | 86 | 12 | 196 | 80 | 12 | 182 | 6 | 0 | 26

N = 10-16 patients per indication. Mean values were rounded to the closest integer. Abbreviations: BRCA = Breast invasive carcinoma; COAD = Colon adenocarcinoma; HNSC = Head and neck squamous cell carcinoma; KIRC = Kidney renal clear cell carcinoma; KIRP = Kidney renal papillary cell carcinoma; LIHC = Liver hepatocellular carcinoma; LUAD = Lung adenocarcinoma; LUSC = Lung squamous cell carcinoma; OV = Ovarian serous cystadenocarcinoma; PAAD = Pancreatic adenocarcinoma; PRAD = Prostate adenocarcinoma; READ = Rectum adenocarcinoma; SKCM = Skin Cutaneous Melanoma; STAD = Stomach adenocarcinoma; TNBC_MERIT = Triple negative breast carcinoma_Mutanome engineered RNA immuno-therapy.
Table 13: Number of selected potential neoepitope targets across all cancer indications after application of prioritization thresholds.

Cancer indication | Selected neoepitope targets: mean | min | max | Selected InDels: mean | min | max | Selected fusions: mean | min | max
BRCA | 23 | 8 | 46 | 1 | 0 | 5 | 1 | 0 | 3
COAD | 33 | 12 | 46 | 2 | 0 | 5 | 0 | 0 | 2
HNSC | 42 | 21 | 46 | 3 | 0 | 5 | 0 | 0 | 3
KIRC | 28 | 6 | 46 | 4 | 0 | 5 | 0 | 0 | 2
KIRP | 28 | 12 | 46 | 3 | 1 | 5 | 0 | 0 | 3
LIHC | 37 | 29 | 46 | 3 | 1 | 5 | 0 | 0 | 1
LUAD | 44 | 27 | 46 | 4 | 1 | 5 | 0 | 0 | 1
LUSC | 46 | 45 | 46 | 3 | 0 | 5 | 0 | 0 | 1
OV | 27 | 14 | 42 | 2 | 0 | 4 | 2 | 0 | 4
PAAD | 20 | 8 | 34 | 1 | 0 | 3 | 0 | 0 | 2
PRAD | 11 | 4 | 21 | 1 | 0 | 4 | 2 | 0 | 3
READ | 22 | 4 | 46 | 2 | 0 | 3 | 0 | 0 | 3
SKCM | 39 | 7 | 46 | 1 | 0 | 3 | 0 | 0 | 1
STAD | 42 | 27 | 46 | 2 | 0 | 5 | 0 | 0 | 1
TNBC-MERIT | 32 | 10 | 46 | 2 | 0 | 5 | 1 | 0 | 4

A maximum of 46 somatic mutations was allowed in total, chosen based on mutation type as well as quantitative selection criteria. N = 10-16 patients per indication. Mean values were rounded to the closest integer. Abbreviations: BRCA = Breast invasive carcinoma; COAD = Colon adenocarcinoma; HNSC = Head and neck squamous cell carcinoma; KIRC = Kidney renal clear cell carcinoma; KIRP = Kidney renal papillary cell carcinoma; LIHC = Liver hepatocellular carcinoma; LUAD = Lung adenocarcinoma; LUSC = Lung squamous cell carcinoma; OV = Ovarian serous cystadenocarcinoma; PAAD = Pancreatic adenocarcinoma; PRAD = Prostate adenocarcinoma; READ = Rectum adenocarcinoma; SKCM = Skin Cutaneous Melanoma; STAD = Stomach adenocarcinoma; TNBC_MERIT = Triple negative breast carcinoma_Mutanome engineered RNA immuno-therapy.

D. Conclusions

[0688] This example demonstrates use of a modular bioinformatics workflow, the dANTe pipeline, which was utilized to generate a list of neoepitope and non-neoepitope targets based on a list of non-neoantigens and input data consisting of matched tumor and WT DNA as well as
tumor RNA. Several factors such as patient-specific HLA and mutation calling as well as information from RNA expression data were used to filter and prioritize potential targets. The resulting target space is highly heterogeneous and tailored towards each analyzed patient.

[0689] The results in this example demonstrate that the modular dANTe pipeline can provide patient-specific neo- and non-neoepitope target candidates across a diverse range of cancer types.

Example 4: Example Process for Selecting Neotherapeutic Target Epitopes (“sANTe”)

A. Summary

[0690] This Example describes an exemplary process for selecting particular neoepitope targets for inclusion in a construct, such as an RNA therapeutic, in accordance with certain embodiments of the present disclosure. The particular implementation of the present Example is referred to, for brevity, as “sANTe”, standing for “Selecting Antigen-Based Nontherapeutic Target Epitopes.” This Example sets out an example framework for utilizing non-neoantigens alongside neoantigens in an individualized therapy.

[0691] As described in further detail herein, e.g., in Example 3, and illustrated in FIG.34, the non-neoantigen selection approach of the present example was designed to follow a detection pipeline that computed an exhaustive list of candidates based on predefined non-neoantigens and including peptide-major histocompatibility complex (MHC) predictions. The detection pipeline detects an initial set of candidates by identifying antigens determined to exceed a predefined antigen-wise expression threshold. Following detection of candidates, the sANTe procedure described herein selects a set of target epitopes and arranges them (e.g., into clusters) for inclusion in a non-neoepitope cassette of a cancer vaccine.
[0692] In brief, the sANTe method selects target epitopes by arranging the initial candidates into batches and tiers, classifying candidates according to whether they match “direct hit” epitopes, known epitopes, known HLA ligands, or predicted HLA ligands, and evaluating criteria including an expression fold change metric and peptide-MHC binding predictions. After identifying an initial set of target epitopes, a clustering was applied to extend sequences of the initial targets hit by additional overlapping ligands within the same range of assumed HLA presentation (e.g., thereby creating “target clusters”), as well as to combine overlapping
downstream target clusters (e.g., thereby creating “combined target clusters”). Final target clusters were selected considering a certain degree of antigen diversity.

[0693] The selection procedure (sANTe) was evaluated using data derived from The Cancer Genome Atlas (TCGA) and the BioNTech TNBC-MERIT cohort (NCT02316457), comprising 159 patients representing 15 different cancer indications.

[0694] It was found that the highest number of antigens were available for skin cutaneous melanoma (SKCM). Final target clusters selected by the sANTe procedure were derived from a wide variety of antigens. Target clusters from differentiation antigens with evidence for immunogenicity were most often selected and prioritized over those without evidence or with known immunogenicity, but a broader tissue expression. It was observed that for the two highest ranked categories used in the selection method (tiers 1 and 2, corresponding to antigens with highest evidence for immunogenicity/evidence in tumor cell killing and cancer-testis/tumor specific antigens with known immunogenicity/differentiation antigens with strong immunogenicity, respectively) fewer antigens passed the expression threshold than for other categories (e.g., lower tiers) in most patients. The median number of different selected antigens ranged from four to six using a clusters batch approach, which applied a patient-specific threshold on the number of target clusters from the same antigen and combined overlapping target clusters within the same batch.

[0695] Accordingly, among other things, these results show that the selection procedure described herein (i.e., the sANTe method) can successfully be applied to all the 159 tested patients and that for each of those patients, a sufficient number of target clusters can be identified to create a non-neoepitope cancer vaccine cassette.

B. Materials and Methods

1. Study Design

[0696] Data was obtained from The Cancer Genome Atlas (TCGA) for patient samples associated with a variety of different cancer types, as follows (Table 14 lists the particular TCGA identifiers for samples used for each cancer indication): 10 breast cancer (BRCA) samples, 10 colon adenocarcinoma (COAD) samples, 10 Head and neck squamous cell carcinoma (HNSC) patients, 10 Kidney renal clear cell carcinoma (KIRC) patients, 10 Kidney renal papillary cell
carcinoma (KIRP) patients, 10 Liver hepatocellular carcinoma (LIHC) patients, 10 Lung adenocarcinoma (LUAD) patients, 10 Lung squamous cell carcinoma (LUSC) patients, 11 ovarian cancer (OV) patients, 10 Pancreatic adenocarcinoma (PAAD) patients, 10 Prostate adenocarcinoma (PRAD) patients, 12 Rectum adenocarcinoma (READ) patients, 10 Skin cutaneous melanoma (SKCM) patients, and 10 Stomach adenocarcinoma (STAD) patients.

Table 14: TCGA IDs of downloaded and processed TCGA patients.

Indication | TCGA patient identifiers (IDs) | Number of patients
BRCA | TCGA-A1-A0SF, TCGA-A1-A0SI, TCGA-A1-A0SQ, TCGA-A2-A04Q, TCGA-A2-A04R, TCGA-A2-A04W, TCGA-A2-A0CK, TCGA-A2-A0CM, TCGA-A2-A0CP, TCGA-C8-A1HF | 10
COAD | TCGA-4N-A93T, TCGA-4T-AA8H, TCGA-5M-AAT5, TCGA-5M-AAT6, TCGA-5M-AATA, TCGA-5M-AATE, TCGA-A6-2674, TCGA-A6-2686, TCGA-A6-4105, TCGA-CM-5344 | 10
HNSC | TCGA-4P-AA8J, TCGA-BA-4074, TCGA-BA-4075, TCGA-BA-4076, TCGA-BA-4077, TCGA-BA-4078, TCGA-BA-5149, TCGA-BA-5151, TCGA-BA-5152, TCGA-BA-5153 | 10
KIRC | TCGA-3Z-A93Z, TCGA-6D-AA2E, TCGA-A3-A6NI, TCGA-A3-A6NJ, TCGA-A3-A6NL, TCGA-A3-A6NN, TCGA-A3-A8CQ, TCGA-A3-A8OU, TCGA-A3-A8OV, TCGA-A3-A8OW | 10
KIRP | TCGA-2Z-A9JL, TCGA-5P-A9KC, TCGA-5P-A9KH, TCGA-A4-8630, TCGA-A4-A57E, TCGA-B1-A654, TCGA-B3-8121, TCGA-B9-4115, TCGA-DW-7837, TCGA-G7-6793 | 10
LIHC | TCGA-2Y-A9GS, TCGA-2Y-A9GT, TCGA-2Y-A9GV, TCGA-2Y-A9GZ, TCGA-2Y-A9H3, TCGA-2Y-A9H5, TCGA-2Y-A9H7, TCGA-2Y-A9H9, TCGA-5R-AAAM, TCGA-ED-A4XI | 10
LUAD | TCGA-05-4244, TCGA-05-4250, TCGA-05-4382, TCGA-05-4384, TCGA-05-4389, TCGA-05-4390, TCGA-05-4395, TCGA-05-4396, TCGA-05-4397, TCGA-05-4398 | 10
LUSC | TCGA-21-5782, TCGA-21-5784, TCGA-21-5787, TCGA-22-A5C4, TCGA-33-A5GW, TCGA-34-5231, TCGA-34-5232, TCGA-34-5234, TCGA-34-5240, TCGA-34-5927 | 10
OV | TCGA-04-1332, TCGA-04-1362, TCGA-04-1655, TCGA-09-0366, TCGA-09-0369, TCGA-09-1670, TCGA-13-0724, TCGA-13-0765, TCGA-13-1487, TCGA-13-1507, TCGA-13-1509 | 11
PAAD | TCGA-2J-AAB1, TCGA-2J-AAB4, TCGA-2J-AAB6, TCGA-2J-AAB9, TCGA-2J-AABH, TCGA-2J-AABK, TCGA-2J-AABT, TCGA-3A-A9I7, TCGA-3A-A9IC, TCGA-3A-A9IH | 10
PRAD | TCGA-CH-5741, TCGA-CH-5752, TCGA-EJ-7325, TCGA-EJ-7793, TCGA-G9-6377, TCGA-HC-8257, TCGA-KC-A7FE, TCGA-XJ-A83F, TCGA-XK-AAJT, TCGA-YL-A8SA | 10
READ | TCGA-AF-2687, TCGA-AF-2690, TCGA-AF-2693, TCGA-AF-4110, TCGA-AF-A56K, TCGA-AF-A56L, TCGA-AF-A56N, TCGA-AG-3591, TCGA-AG-3592, TCGA-AG-3902, TCGA-DY-A1DC, TCGA-DY-A1DF | 12
SKCM | TCGA-BF-AAP1, TCGA-BF-AAP2, TCGA-BF-AAP, TCGA-BF-AAP6, TCGA-D3-A5GT, TCGA-D9-A3Z4, TCGA-EB-A3XB, TCGA-EB-A42Y, TCGA-EB-A44P, TCGA-EB-A6QZ | 10
STAD | TCGA-3M-AB4, TCGA-B7-A5TI, TCGA-B7-A5TN, TCGA-BR-6563, TCGA-BR-6705, TCGA-BR-7197, TCGA-BR-7715, TCGA-BR-7723, TCGA-BR-7958, TCGA-BR-6710 | 10

[0697] Data from 16 patients with triple negative breast cancer (TNBC) from the TNBC-MERIT study (ClinicalTrials.gov identifier NCT02316457, see also, e.g., M. Schmidt et al. “88MO T-cell responses induced by an individualized neoantigen specific immune therapy in post (neo)adjuvant patients with triple negative breast cancer” 2020) were also analyzed.

[0698] Table 15 summarizes the formats and types of patient data used.

Table 15. Summary of patient data.

Identification | Description
TNBC-MERIT data | Raw NGS data (FASTQ) files for 16 available patients from the TNBC-MERIT cohort, consisting of matched normal DNA, tumor DNA and tumor RNA samples for each patient in duplicate.
TCGA data | Aligned NGS data (BAM) files for 143 available patients from the TCGA database, consisting of matched normal DNA, tumor DNA and tumor RNA samples for each patient (single replicate) were downloaded and converted into raw FASTQ format using samtools fastq.

[0699] Turning to FIG.34, non-neoepitope candidates were obtained via a detection pipeline that identified and evaluated peptide-major histocompatibility complex (MHC) binding and presentation predictions on sets of germline (blood) DNA, tumor DNA, and tumor RNA samples.
The detection pipeline output lists of neoepitope candidates and non-neoepitope candidates, which were used as input to two selection procedures, one for neoantigens, and another used to
select non-neoantigens, respectively. The procedure for selection of non-neoantigens, which is the focus of this example, is described in further detail below. The sANTe procedure divided the set of candidates into different batches based on whether a given candidate was identified as matching a known epitope or known ligand or based on its peptide-MHC presentation prediction score. Candidates were also subdivided into tiers as shown in Table 16, based on (e.g., previous) vaccine trial results, and each batch was associated with up to two tiers as shown in FIG.35.

Table 16: Description of tiers used to classify non-neoepitope antigen candidates based on vaccine trial results.

Tier | Description
1 | Highest evidence for immunogenicity and safety in clinical trials as well as evidence for tumor cell killing
2 | Cancer-testis/tumor specific antigens with evidence for immunogenicity and differentiation antigens with observed strong immunogenicity
3 | Differentiation antigens with evidence for immunogenicity
4 | Cancer-testis/tumor specific/differentiation antigens without evidence for immunogenicity
5 | Tumor associated antigens with evidence for immunogenicity but broader normal tissue expression pattern

[0700] Within each tier, candidates were additionally split into two expression categories based on a median of an expression fold change metric. In particular, an expression fold change value was computed for each particular gene within a patient sample by dividing the measured expression of the particular gene within the patient sample by the antigen-wise expression threshold for (e.g., used for detecting) the gene. Since only genes whose expression exceeded the expression threshold from the dANTe pipeline (described in Example 3) were detected and passed to the sANTe pipeline, the expression fold change was always greater than 1. Additional categories were also defined.
Candidates were also ranked according to their peptide-MHC prediction in each expression category. Batches of known epitopes ranked “direct hits” (i.e., immunogenic epitopes with highest evidence of immunogenicity) first, followed by other known immunogenic epitopes (collectively referred to as “known epitopes,” for brevity). In case of batches that include information on known epitopes or ligands, the respective pre-defined categories are additionally taken into consideration for ranking of targets.
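By way of non-limiting illustration only, the expression fold change metric and the median-based split of paragraph [0700] may be sketched as follows. The function names, the handling of values equal to the median (placed in the higher category here), and the example expression values are assumptions of this sketch; the cutoffs used in the example are those listed for the respective genes in Table 10.

```python
from statistics import median


def expression_fold_change(expression_tpm, cutoff_tpm):
    """Measured expression divided by the antigen-wise expression threshold."""
    if cutoff_tpm <= 0:
        raise ValueError("expression cutoff must be positive")
    return expression_tpm / cutoff_tpm


def split_by_median(fold_changes):
    """Split genes of one tier into two expression categories at the median.

    fold_changes: {gene symbol: fold change}. Ties with the median are
    assigned to the higher category (an assumption of this sketch).
    """
    threshold = median(fold_changes.values())
    high = {gene for gene, value in fold_changes.items() if value >= threshold}
    low = set(fold_changes) - high
    return high, low


# Hypothetical patient expression values (TPM) with Table 10 cutoffs.
fold_changes = {
    "MAGEA3": expression_fold_change(40.0, 20.0),  # 2.0
    "PRAME": expression_fold_change(66.0, 44.0),   # 1.5
    "TYR": expression_fold_change(245.0, 49.0),    # 5.0
}
high, low = split_by_median(fold_changes)
```

With these hypothetical values the median fold change is 2.0, so MAGEA3 and TYR fall into the higher expression category and PRAME into the lower one.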
[0701] Turning to FIG.36, candidates selected as target cores were extended by including additional ligands that were predicted to be presented and passed a peptide-MHC binding prediction threshold of their respective batch, provided that the resultant target cluster did not exceed a predefined length (40 amino acids, including flanks, in this example). The thresholds of batches that take into account known epitopes or ligands are set to those of the subsequent prediction batch. Finally, up to three flanking amino acids were added upstream and downstream of the target.

[0702] Resulting targets or target clusters from the same transcript were inspected for at least one overlapping amino acid (AA), since targets from the same transcript can derive from immediate neighboring positions, potentially leading to duplicated target substrings. If two targets or target clusters were found to overlap, they were combined, provided that the resultant combined target cluster was no more than a maximum of 40 amino acids in length. The possible minimum length of a target was 11 amino acids, as 8 amino acids was the minimum length of an epitope. If a target core was located at a start or end of a transcript, only one flank with a length of three amino acids was added, producing an 11 amino acid long target core.

[0703] As illustrated in FIG.36, targets comprised a core epitope along with flanks added on each side of the core. This led to target lengths ranging from 11 to 18 amino acids (AAs) for MHC I ligands and 15 to 30 amino acids for MHC II ligands. To create target clusters, the core was extended by including overlapping ligands that were predicted to be presented (e.g., extensions). Combined target clusters were also created from a minimum of two targets or target clusters that overlapped by at least 1 AA. The maximum (combined) target cluster length was fixed at 40 AA.
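By way of non-limiting illustration only, the addition of flanks and the combination of overlapping targets described in paragraphs [0701] to [0703] may be sketched as follows. Targets are represented here as half-open (start, end) amino acid intervals on a single transcript, which is a representation chosen for this sketch. The behavior when a combination would exceed 40 amino acids (the targets are simply left uncombined) is likewise an assumption, since the text above states only the length condition.

```python
MAX_CLUSTER_LEN = 40  # maximum (combined) target cluster length in AA
FLANK_LEN = 3         # up to three flanking AA on each side


def add_flanks(core_start, core_end, protein_len, flank=FLANK_LEN):
    """Extend a core interval by up to `flank` residues on each side,
    clipped to the boundaries of the transcript's protein sequence."""
    return max(0, core_start - flank), min(protein_len, core_end + flank)


def combine_overlapping(targets, max_len=MAX_CLUSTER_LEN):
    """Combine targets from one transcript that share at least one AA,
    provided the combined cluster stays within `max_len` residues."""
    combined = []
    for start, end in sorted(targets):
        if combined:
            prev_start, prev_end = combined[-1]
            new_end = max(prev_end, end)
            overlaps = start < prev_end  # half-open: at least 1 shared AA
            if overlaps and new_end - prev_start <= max_len:
                combined[-1] = (prev_start, new_end)
                continue
        combined.append((start, end))
    return combined
```

In this sketch an 8 amino acid core at the very start of a transcript receives only a downstream flank and yields an 11 amino acid target, consistent with the minimum target length stated above.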
[0704] The sANTe approach also accounted for final cluster size requirements associated with a particular cassette design. In particular, first, the allowed insert size excluding cloning sites was 1,282 base pairs. Second, the particular cassette design required a minimum number of target clusters of eight and a maximum of 18 due to the availability of linkers. A minimum of three different target clusters were included. If there were fewer than eight and at least three available unequal target clusters with different target cores, they were duplicated according to (e.g., in the order of) their rankings until a target cluster number of eight was reached.
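By way of non-limiting illustration only, the cluster count requirements of paragraph [0704] may be sketched as follows. The function name, the use of plain strings to stand in for target clusters, and the choice to cycle repeatedly through the ranked list when duplicating are assumptions of this sketch. The insert size limit of 1,282 base pairs is not modeled here.

```python
from itertools import cycle

MIN_CLUSTERS = 8    # minimum number of target clusters in the cassette
MAX_CLUSTERS = 18   # maximum number, limited by available linkers
MIN_DISTINCT = 3    # minimum number of different target clusters


def fill_cassette(ranked_clusters):
    """Return the target clusters to place in a cassette.

    ranked_clusters: clusters ordered from highest to lowest rank.
    Duplicates in the input are dropped while preserving rank order.
    """
    distinct = list(dict.fromkeys(ranked_clusters))
    if len(distinct) < MIN_DISTINCT:
        raise ValueError("fewer than three different target clusters")
    selected = distinct[:MAX_CLUSTERS]
    # Duplicate clusters in rank order until the minimum count is reached.
    for cluster in cycle(distinct):
        if len(selected) >= MIN_CLUSTERS:
            break
        selected.append(cluster)
    return selected
```

With three available clusters, this sketch returns them in rank order followed by repeated copies in the same order until eight entries are present.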
2. Inputs

[0705] The sANTe method took two tables as input. The first, shown in Table 17, was an input table comprising peptide information for each target candidate, including predicted MHC binding and presentation percentile ranks for the relevant peptide-allele combinations. The second, shown in Table 18, included protein sequences for the usable and targetable transcript region.

Table 17: Contents of a first input data table comprising peptide information.

Column name | Description | Example value
dnaseq | DNA sequence of the peptide | ATGTCGCAAGGGATCCTTTCTCCG
peptide_offset | Starting position of the peptide within the sequence window | 0
window_id | Identification value of the sequence window in the format “transcript_id”_“fragment_number” | uc003lcj.3_0
transcript_id | Identification value of the transcript from which the sequence window derives | uc003lcj.3
gene_symbol | Gene nomenclature of the gene from which the transcript derives | KIF20A
best_position | The starting position of the respective peptide with the longest minimum flanking sequence for the respective transcript. Can differ from “peptide_offset” if the same peptide appears in different windows | 0
peptide | Amino acid sequence of the peptide | MSQGILSP
peptide_len | Length of the peptide | 8
expression | Expression value in TPM of the gene from which the transcript derives | 8.654610
allele | HLA allele that is used to predict the HLA binding of the allele-peptide pair | A0201
binding_pctrnk | Predicted percentile rank for HLA binding | 60.0565
peptide_id | Unique identification value for the peptide in the format “window_id”_“peptide_offset”_“peptide_len” | uc003lcj.3_0_0_8
cutoff_score | Cut-off value for the binding percentile rank. This rank is based on the best scoring peptide-allele pair for the respective peptide using its peptide_id. The cut-off value cannot be lower than the best scoring peptide-allele pair | 63.0565
mhc_class | MHC class of the respective HLA allele (either 1 or 2) | 1
ExprCutoffTpm | Gene specific expression cut-off in transcripts per million (TPM). Transcripts of genes that are not passing this expression cut-off get discarded | 2.0
ImmunogenicityPriority | Manually curated categorization of genes into different groups of expected immunogenicity | tier5
hit_in_proteome | Boolean value describing whether the peptide appears already in the proteome (excluding a gene specific whitelist) | False
presentation_pctrank | Predicted presentation percentile rank | 1.7455
flank_hit_in_proteome | Boolean value describing whether a peptide within 3 positions upstream or downstream is hit in proteome | False

Table 18: Contents of a second input table comprising transcript sequence information.

Column name | Description | Example value
window_id | Identification value of the sequence window in the format “transcript_id”_“fragment_number” | uc003pjo.3_0
window_sequence_dna | DNA sequence of the transcript fragment | ATGCCTGGGGGGTGC…
window_sequence_protein | Amino acid sequence of the transcript fragment | MPGGCSRGP…

[0706] Presentation scores represented binding percentiles. Table 19 and Table 20 show the presentation percentiles that correspond to the binding percentiles used as thresholds for assumed strong (0.5), intermediate (1.0), and weak (2.0) predicted binders.
Table 19: Binding percentiles and the corresponding presentation percentiles for MHC I.
MHC I binding | MHC I presentation (the presentation value that corresponds to the binding value)
0.5 | 0.03 (exactly 0.028217)
1.0 | 0.05 (exactly 0.046814)
2.0 | 0.1 (exactly 0.099853)
Table 20: Binding percentiles and the corresponding presentation percentiles for MHC II.
MHC II binding | MHC II presentation (the presentation value that corresponds to the binding value)
0.5 | 0.05 (exactly 4.999750e-02)
1.0 | 0.1 (exactly 1.079351e-01)
2.0 | 0.3 (exactly 2.654550e-01)
3. Immune response and presentation data
[0707] The sANTe method utilized data from known T-cell epitopes and ligands. The list of known epitopes was derived from internal studies as well as publicly available information from the immune epitope database (IEDB) (Vita et al. The Immune Epitope Database (IEDB): 2018 update. Nucleic Acids Research. 2019;47(D1):D339-D343). The known epitope data was split into (i) “direct hit” epitopes and (ii) other known epitopes. “Direct hit” epitopes were classified as such based on experiments with positive ex-vivo ELISpot and multimer data or T-cell receptor data and included only epitopes with defined minimal epitope-allele pairs and highest level of evidence based on ELISpot and T-cell receptor data. The second category of other known epitopes primarily comprised IEDB epitopes with evidence of epitope specific T-cell killing and experimental response data, also with high level of evidence. FIG.37 illustrates the selection approach for these ‘other known’ epitopes. Based on the exact level of evidence, epitope-allele pairs were additionally ranked within each category. If certain candidates were repeatedly found to not be immunogenic in tests, they were added to an epitope exclusion list. Data on known ligands was derived from internal and public mass spectrometry data.
The internal data used was a collection of monoallelic MHC class I (MHC I) and MHC class II (MHC II) ligands. Mono- and multiallelic data from public studies also contributed to the dataset of known ligands. Monoallelic data was filtered to retain only ligand-allele pairs with a percentile rank smaller than or equal to 2. For the multiallelic data, ligand-allele pairs were included only if the best predicted binder-allele pair had a percentile rank smaller than or equal to 0.5, the second-best predicted binder differed from the best predicted binder by at least 1, and the ligand-allele combination appeared at least twice in the data. Additionally, ligand-allele pairs that overlapped with the monoallelic dataset were not listed again. Ligand data was split into ranked categories, with monoallelic data preferred over multiallelic data and depending on the observed frequency of ligand-allele pairs.
4. Processing target candidates from peptide-MHC prediction data
[0708] Target candidates were further filtered based on information provided by the dANTe pipeline (described in Example 3).
[0709] A proteome check was performed to determine whether any 8-mer of a target candidate, with or without including three up- and downstream flanking positions, appeared in the proteome (excluding a gene specific whitelist that contains manually curated homologous sequences including sequences from annotated pseudogenes). This proteome check was not performed (e.g., disabled) for target candidates within tier five, as those (tier five) antigens were assumed to be universally expressed. All remaining 8-mer peptides with a proteome hit were stored to perform gene-wise proteome checks on downstream targets and (combined) target clusters. All target candidates (including their flanks) that passed the proteome check were considered for further target cluster generation.
[0710] Target candidates from the epitope exclusion list were also identified and, to exclude them during target selection, their corresponding prediction values were set to -1.
5. Determining target cores
[0711] For each combination of batch and tier, target cores were determined.
Depending on the respective batch, target candidates and their corresponding alleles were either matched to known epitopes or ligands or checked against the peptide-MHC presentation prediction threshold. Based on the tier, only target cores from corresponding genes were taken.
[0712] After selection, the target cores for each batch-tier combination were ranked. For example, in a given batch-tier combination, target cores were split into two expression groups based on values of an expression fold change metric, determined as the ratio between the measured gene expression and the pre-determined expression threshold. The median expression fold change across all potential target genes was calculated, and target cores with an expression fold change value greater than or equal to the median were ranked in a first expression group. Within each expression group, target cores were first ranked by their peptide-MHC prediction value and then by their expression fold change value and transcript window. In batches comprising antigens matching known epitopes, target cores from “direct hit” epitopes were prioritized in each expression group followed by other known epitopes before ranking the target cores within each epitope group according to their category, peptide-MHC prediction value, expression fold change value, and transcript window. Target cores from known ligands were ranked in a similar fashion. A maximum number of target cores per batch and antigen was set to 1000 to optimize the runtime of the algorithm while preserving a large pool of targets to be checked downstream for clustering.
6. Extension of target cores
[0713] In order to extend target cores by including additional potentially presented ligands on each terminus, all target candidates that passed the peptide-MHC prediction threshold of their respective batch were checked for overlaps with target cores by at least one amino acid. For batches with known epitopes or ligands, a strong binders threshold (0.5) was used.
[0714] Target cores were extended if the potential ligands were derived from a same transcript window and an overall size of the resultant target cluster (following extension) did not exceed 40 amino acids including flanks (this could only occur for MHC II ligands, whose target candidates are longer (15-30 AA)). Newly emerging 8-mers resulting from target core and extensions were checked to ensure that they did not produce a proteome hit.
Extensions were ranked according to their peptide-MHC prediction values and subsequently checked until the maximum elongating extensions were identified.
7. Addition of flanks
[0715] Each target core with its potential extensions was extended by up to three flanking amino acids on each terminus to create a target cluster sequence. As all target candidates including their flanks were already checked for hits in the proteome, addition of flanks to either the target core or respective extension did not lead to new proteome hits.
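By way of non-limiting illustration only, a single extension step followed by flank addition can be sketched as follows. The function names (`kmers`, `with_flanks`, `try_extend`), the half-open index convention, and the reading of "newly emerging 8-mers" as those 8-mers of the merged span that occur in neither the core nor the extension are assumptions of this sketch, not a statement of the actual implementation.

```python
from typing import Set, Tuple

K = 8          # k-mer length of the proteome check (from the text)
MAX_FLANK = 3  # up to three flanking residues per terminus (from the text)
MAX_LEN = 40   # maximum cluster length including flanks (from the text)

Span = Tuple[int, int]  # half-open (start, end) indices into a window (assumption)


def kmers(seq: str, k: int = K) -> Set[str]:
    """All k-mers of seq (empty set if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def with_flanks(window: str, start: int, end: int) -> str:
    """Sequence of the span plus up to MAX_FLANK residues on each terminus."""
    return window[max(0, start - MAX_FLANK):min(len(window), end + MAX_FLANK)]


def try_extend(window: str, core: Span, ext: Span,
               proteome_kmers: Set[str]) -> Span:
    """Return the merged span if the extension is allowed, else the core."""
    core_start, core_end = core
    ext_start, ext_end = ext
    # The extension must overlap the core by at least one amino acid.
    if ext_end <= core_start or ext_start >= core_end:
        return core
    merged = (min(core_start, ext_start), max(core_end, ext_end))
    # Length restriction applies to the cluster including its flanks.
    if len(with_flanks(window, *merged)) > MAX_LEN:
        return core
    # 8-mers that exist only in the merged sequence (assumed definition).
    emerging = (kmers(window[merged[0]:merged[1]])
                - kmers(window[core_start:core_end])
                - kmers(window[ext_start:ext_end]))
    if emerging & proteome_kmers:
        return core
    return merged
```

In this sketch, candidate extensions would be passed to `try_extend` in order of their peptide-MHC prediction values, each call starting from the span returned by the previous one.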
8. Deduplication of target clusters
[0716] Identical target clusters from different transcripts of a same gene were collapsed into one target cluster while retaining information identifying different transcripts for downstream clustering. Identical target clusters and/or target clusters with identical core sequences were discarded per antigen across all available batches by retaining the highest ranked target cluster. Where two or more clusters had identical ranks, the target cluster with the best predicted presented ligand extensions was retained.
[0717] Additionally, all target clusters that were substrings of other targets from the same antigen were removed in a top-down approach. The same applies to combined target clusters in downstream steps.
9. Potency linker checks
[0718] Target clusters were checked against potency linker sequences and were removed if hits were identified (e.g., target clusters found to match potency linker sequences were removed). If a potency linker hit or proteome hit arose due to combining target clusters, the combined target cluster was resolved to single target clusters.
10. Combining target clusters
[0719] Target clusters were combined as soon as there was an overlap of one amino acid in two target clusters from the same transcript window. All target clusters from the same batch were considered for additional clustering, referred to as the clusters batch approach. Since several window identifiers (IDs) per target cluster (due to different transcripts) are possible, the window ID for the respective combined cluster was chosen based on the largest intersection of window IDs of the next best ranked target clusters with the available window ID(s) of the initial target cluster. After target clusters to be combined were determined, their lengths were checked to ensure they did not exceed 40 amino acids.
If a target cluster exceeded the 40 AA length restriction, it was split into chunks from N- to C-terminus containing single targets or target clusters with a maximum length of 40 amino acids. The best ranked target or target cluster within a combined target cluster was used to determine the overall ranking of the respective cluster.
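By way of non-limiting illustration only, combining overlapping target clusters of one transcript window can be sketched as a single N- to C-terminal pass. The `Cluster` type, the half-open coordinates, and the simplification of merging and length-limited chunking into one greedy loop are assumptions of this sketch; window ID selection and the proteome and potency linker re-checks are omitted, and every input cluster is assumed to be at most 40 amino acids long.

```python
from typing import List, NamedTuple

MAX_LEN = 40  # maximum combined cluster length in amino acids (from the text)


class Cluster(NamedTuple):
    start: int  # inclusive position in the transcript window (assumption)
    end: int    # exclusive position in the transcript window (assumption)
    rank: int   # lower value means better rank (assumption)


def combine(clusters: List[Cluster]) -> List[Cluster]:
    """Greedily merge overlapping clusters from N- to C-terminus.

    A cluster is merged into the current chunk if it overlaps it by at
    least one amino acid and the merged chunk stays within MAX_LEN;
    otherwise it opens a new chunk. A chunk inherits its best member rank.
    """
    chunks: List[Cluster] = []
    for c in sorted(clusters, key=lambda c: (c.start, c.end)):
        if chunks:
            last = chunks[-1]
            new_end = max(last.end, c.end)
            if c.start < last.end and new_end - last.start <= MAX_LEN:
                chunks[-1] = Cluster(last.start, new_end, min(last.rank, c.rank))
                continue
        chunks.append(c)
    return chunks
```

Under these assumptions, neighbouring chunks may still share residues after a split, because a cluster that would push a chunk past 40 amino acids is kept whole in the next chunk.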
11. Determining allowed linker combinations
[0720] The junctional sequences of the final target cluster-linker fusions with a minimal overlap of two amino acids were checked for homologies to known maximum expression risk genes. To do so, for a given final target cluster, the respective gene from which the given final target cluster originated was excluded from the high-risk gene list. For each final target cluster, all linker combinations were checked to determine whether at least one 8-mer of the linker-final target cluster-linker junctional sequences was contained in the high-risk 8-mers (excluding the final target cluster and linker itself). Those combinations were discarded. All allowed linker-final target cluster-linker combinations were added to the output table for each peptide. Final target clusters without any allowed linker combination were discarded.
[0721] The same procedure for the determination of allowed linker combinations was applied to single targets being part of a target cluster to allow downstream target (cluster) replacement by an expert user considering the allowed linker combinations.
12. Target cluster selection
[0722] The selection of final target clusters for the cassette was implemented in an iterative manner. If space was left on the insert, final target clusters were checked to determine whether they fulfilled various sets of criteria to be selected.
[0723] For example, turning to FIG.38A, the selection method aimed to select final target clusters derived from a variety of antigens.
[0724] Accordingly, as illustrated in FIG.38A (green paths with arrows, i.e., paths originating from clusters b2, b5, b12 and b17) and FIG.38B, if two final target clusters from the same antigen were selected, the remaining target clusters from the same antigen were downranked to the end of the current batch.
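By way of non-limiting illustration only, the downranking within one batch can be sketched as a stable reordering. The function name `visit_order`, the tuple layout, and the simplifying assumption that every visited cluster is actually selected (i.e., insert space and all other criteria are satisfied) are hypothetical and serve only this sketch; the cluster identifiers in the usage example are made up and do not refer to FIG.38A.

```python
from collections import Counter
from typing import List, Tuple

ClusterRef = Tuple[str, str]  # (cluster identifier, antigen), hypothetical layout


def visit_order(batch: List[ClusterRef], limit: int = 2) -> List[ClusterRef]:
    """Order in which the clusters of one batch are considered.

    `batch` is sorted best-first. Once `limit` clusters of an antigen
    have been taken, its remaining clusters move to the end of the
    batch, keeping their relative order.
    """
    taken: Counter = Counter()
    front: List[ClusterRef] = []
    deferred: List[ClusterRef] = []
    for cluster_id, antigen in batch:
        if taken[antigen] < limit:
            taken[antigen] += 1
            front.append((cluster_id, antigen))
        else:
            deferred.append((cluster_id, antigen))
    return front + deferred


batch = [("c1", "PRAME"), ("c2", "PRAME"), ("c3", "PRAME"),
         ("c4", "MAGEA3"), ("c5", "PRAME"), ("c6", "MAGEA3")]
order = [cluster_id for cluster_id, _ in visit_order(batch)]
# order is ["c1", "c2", "c4", "c6", "c3", "c5"]
```

The same helper could be called with a different `limit` where another threshold applies; moving clusters to the end of a later batch rather than the current one is not covered by this sketch.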
[0725] As illustrated in FIG.38A and FIG.38C, if final target clusters of two “direct hit” epitopes (dotted red paths in FIG.38A, i.e., dotted paths originating from clusters b1 and b4) or three other known epitopes (teal paths in FIG.38A, i.e., solid paths originating from clusters b1, b4, b11, and b16) were selected from the same antigen, the remaining target clusters were downranked to the end of the last available tier three or tier five batch, depending on the tier of the respective antigen.
[0726] As illustrated in FIG.38A, where batches comprised epitopes without known immunogenicity, target clusters were downranked after the selection of a threshold number of target clusters from the same antigen. The threshold number of target clusters had a default value of four, but was adapted in a patient specific manner based on the number of available antigens from tier one to tier three or tier four and tier five. For example, as shown in Table 21, where a number of available antigens was greater than or equal to six, the threshold was lowered from four to three final target clusters. For eight or more available antigens, the threshold was set to two unless the antigen was derived from tier one.
Table 21: Thresholds for downranking of target clusters based on number of antigens and respective tier.
Number of available antigens (Tiers 1-3 or tiers 4-5 dependent on respective antigen) | Tiers | Threshold
<6 | 1, 2, and 3; 4 and 5 | 4
≥6 | 1, 2, and 3; 4 and 5 | 3
≥8 | 1 | 3
≥8 | 2 and 3 | 2
[0727] If a combined target cluster exceeded the allowed insert size, it was examined whether the cluster could be split, and the best ranked single target (cluster) was kept. If a single target cluster exceeded the insert size, the extensions were gradually removed according to their prediction rank, and the remaining target length was checked until the target core and its flanks remained.
[0728] After each iteration, all final target clusters were checked and discarded if they were a substring of any other final target cluster that was added in a subsequent step.
[0729] Additionally, final target clusters were checked and discarded if they were identical to already selected final target clusters or if they contained at least one identical target core since target cluster deduplication was previously performed only per antigen and not across antigens.
This is due to the antigen diversity approach that can alter the ranking of final target clusters from different antigens. Upon the selection of final target clusters, the placement of target clusters on the cassette was determined.
13. Placement of target clusters
[0730] In order to place the final target clusters on the cassette, each final target cluster was associated with a placement cost. The placement cost was based on the rank of the final target cluster as well as additional position-dependent penalties based on its length, the linker compatibility at the respective position and the replication status.
[0731] Each final target cluster received a default cost composed of its rank multiplied by 50 to ensure that the cost of preferences did not alter the overall ranking. If the length of the final target cluster was below 17, one was added to its cost for positions three and six. If the final target cluster length exceeded 27 amino acids, one was added for every additional amino acid for the first and last position. Final target clusters at positions with incompatible linkers received an additional cost of 100,000. The linker combinations at specific positions were predefined and depended on the number of final target clusters.
[0732] In case of fewer than eight available final target clusters and at least three available final target clusters that were derived from different cores, they were duplicated in an alternating scheme based on their ranking until a target number of eight was reached (e.g., 1-2-3-1-2-3-1-2 in case of three available final target clusters). Additionally, 1000 was added to the cost in case of the first duplication of a final target cluster and 2000 in case of the second duplication. A change in target order due to linker compatibility or position preferences can break the duplication pattern and takes precedence.
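By way of non-limiting illustration only, the placement cost of one final target cluster at one cassette position can be sketched as follows. The function name `placement_cost`, the 1-based position and rank numbering, and the encoding of the duplication status as 0 (original), 1 (first duplicate), or 2 (second duplicate) are assumptions of this sketch.

```python
RANK_WEIGHT = 50             # rank multiplier (from the text)
SHORT_LEN = 17               # clusters shorter than this are penalized at positions 3 and 6
LONG_LEN = 27                # residues beyond this are penalized at the first and last position
INCOMPATIBLE_LINKER = 100_000
DUPLICATION_COST = {0: 0, 1: 1000, 2: 2000}


def placement_cost(rank: int, length: int, position: int, n_positions: int,
                   linker_ok: bool, duplication: int = 0) -> int:
    """Cost of placing one cluster at one 1-based cassette position."""
    cost = rank * RANK_WEIGHT
    if length < SHORT_LEN and position in (3, 6):
        cost += 1
    if length > LONG_LEN and position in (1, n_positions):
        cost += length - LONG_LEN  # one per additional amino acid
    if not linker_ok:
        cost += INCOMPATIBLE_LINKER
    return cost + DUPLICATION_COST[duplication]
```

Because clusters are limited to 40 amino acids, the position preferences add at most 13 in this sketch, which stays below the step of 50 between consecutive ranks and therefore does not reorder clusters of different rank.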
[0733] Upon assignment of costs, final target clusters were placed at specific positions on the cassette while minimizing the overall cost using the Munkres algorithm to solve the assignment problem. If the sum of costs was below 100,000 and thus all placed target clusters fulfilled the linker compatibility criteria, the placement was considered valid. Otherwise, the final target cluster with the highest cost was discarded and the next ranked final target cluster was checked in terms of target placement.
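By way of non-limiting illustration only, the assignment step and its validity check can be sketched as follows. For brevity and to stay within the standard library, this sketch enumerates all permutations instead of using the Munkres algorithm; it returns a minimum-cost assignment for small matrices but scales factorially, so a Munkres (Hungarian) solver would be used for real cassette sizes. The function names and the square cost-matrix layout are assumptions of this sketch.

```python
from itertools import permutations
from typing import List, Tuple

INCOMPATIBLE_LINKER = 100_000  # penalty for an incompatible linker (from the text)


def best_placement(cost: List[List[int]]) -> Tuple[Tuple[int, ...], int]:
    """Brute-force stand-in for the Munkres algorithm.

    cost[i][j] is the cost of placing cluster i at position j (0-based,
    square matrix). Returns (positions, total), where positions[i] is
    the position assigned to cluster i.
    """
    n = len(cost)

    def total(perm: Tuple[int, ...]) -> int:
        return sum(cost[i][perm[i]] for i in range(n))

    best = min(permutations(range(n)), key=total)
    return best, total(best)


def is_valid(total_cost: int) -> bool:
    """A placement is valid if no incompatible-linker penalty was incurred."""
    return total_cost < INCOMPATIBLE_LINKER


cost = [[50, 100_050, 51],
        [100, 100, 100_100],
        [150, 150, 150]]
positions, total_cost = best_placement(cost)
# positions is (0, 1, 2) and total_cost is 300, so the placement is valid
```

If `is_valid` returned False, the procedure described above would discard the final target cluster with the highest cost, bring in the next ranked final target cluster, rebuild the matrix, and repeat the assignment.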
C. Results
[0734] Data from a total of 159 patients derived from TCGA and TNBC-MERIT patient cohorts was processed with the sANTe algorithm.
1. Proteome hits after applying the antigen specific whitelist
[0735] To investigate how many candidate peptides are discarded due to their homology to other genes that are not whitelisted, the proportion of candidate peptides with hits in the proteome is shown in FIG.39. (As shown in FIG.39, MAGEA3 was classified as a Tier 1 antigen; ACTL8, MAGEA9B, CTAG2, MAGEA4, MAGEA2, HOXB13, MAGEC2, MAGEC1, MAGEA1, CLDN6, and TPTE were classified as Tier 2 antigens; PRAME, PMEL, TYR, GUCY2C, ACPP, NKX3-1, KLK3, KLK2, IGF2BP3, ANKRD30A, and CEACAM5 were classified as Tier 3 antigens; TDRD1, MUC13, PAEP, CDH17, HEPHL1, PAGE4, BRDT, EDDM3B, ZFP42, and IGF2BP1 were classified as Tier 4 antigens; and BIRC5, KIF20A, TPBG, FLT1, KDR, TP53, and DEPDC1 were classified as Tier 5 antigens.) Additionally, the number of patients in which the respective antigen is sufficiently expressed to pass the threshold is indicated at the top of each bar in FIG. 39. All tier one to tier four peptides with hits against the proteome (excluding a gene specific whitelist) were discarded for target selection. For each gene, the average percentage of peptides with proteome hits across patients is shown. There are three antigens (TPTE, CEACAM5, and CLDN6) whose target candidates were on average diminished by over 50%. Almost 75% of target candidates from tier two antigen TPTE were removed due to hits in the proteome. At the same time, TPTE passes the expression threshold in only eight of 159 patients and is thus not frequently selected.
[0736] According to this analysis, the whitelist prevents the removal of entire sets of target candidates by whitelisting appropriate homologous genes and leads to limited amounts of discarded target candidates.
2. Comparison of different target cluster combination approaches
[0737] Two different target cluster combination approaches were tested to investigate their impact on final target cluster length and target number. One approach was to combine target clusters within the same batch and subsequently evaluate them regarding their inclusion into the insert (referred to as clusters batch). A variation of the approach (clusters batch thresholds) was additionally applied, combining only those target clusters within the same batch whose cores are below the respective binding threshold of the current batch.
[0738] The distribution of final target cluster lengths slightly varied for the two approaches as shown in FIG.40. The approach clusters batch thresholds led to shorter final target clusters in most indications due to the limited number of target clusters from which additional clusters could be generated. Using approach clusters batch, final target clusters were assembled from a larger pool of potential target clusters. The lengths of the resulting final target clusters were in an appropriate range leading to diverse lengths between 12 and 40 amino acids.
[0739] Regarding the number of predicted strong ligands (presentation values) in the final resulting target clusters, the clusters batch approach leads in most indications to an equal or higher median number of strong ligands compared to the clusters batch thresholds approach as shown in FIG.41. All possible substrings of selected target clusters across all valid ligand lengths and alleles of the respective patients were considered to determine the number of predicted strong ligands (this means that the same substrings can occur multiple times).
[0740] Based on the above results, the approach that combines all potential target clusters within the same batch (clusters batch) was chosen and applied for all subsequent analyses. The clusters batch approach resulted in slightly longer final target clusters compared to the other approach with a similar distribution of lengths across all indications. Additionally, a slightly higher number of predicted epitopes was identified across the final target clusters.
3. sANTe final target cluster generation
[0741] The sANTe algorithm generated 9 to 16 final target clusters per patient across all analyzed indications. The median number of target clusters varies for different indications.
On average between 10 (PRAD) and 14 (BRCA, HNSC, KIRP, LIHC, LUSC, OV, STAD, and TNBC) final target clusters were selected as shown in FIG.42.
[0742] Additionally, it was investigated from which antigens final target clusters were most often derived as shown in FIG.43. Antigens that belong to tier three were most often selected as final target clusters across all indications with PRAME, IGF2BP3, and CEACAM5 being the most frequently selected antigens. 15.3% of selected final target clusters were derived from PRAME while 11.5% were derived from IGF2BP3. CEACAM5 was comparatively less frequently selected (9.7%). All three frequently selected tier three antigens also passed the expression threshold in about one third of patients. Across all available tier one to tier four antigens, they were each shared by the highest percentage of patients as shown in FIG.44.
[0743] Regarding the tier one antigens, MAGEA3 was overall the fourth most frequently selected antigen (6.2%), whereas only 1.9% of final target clusters were derived from CTAG1A. The difference in frequency is also reflected in the availability of the two tier one antigens across patients with CTAG1A being selected in 8.8% of patients whereas MAGEA3 was selected in 23.9% of patients.
[0744] Five out of ten tier four antigens have not been selected once as final target clusters. Tier five antigens pass their respective predefined expression thresholds in 49.1% (DEPDC1) to 99.4% (TP53) of patients and are thus very frequently available. The low number of final tier five target clusters despite the broad antigen availability is expected since final target clusters from tiers one to three are preferably chosen by the algorithm. Tiers four and five are only considered if there are no suitable final target clusters from the first tiers.
[0745] Final target clusters selected by sANTe were derived from a wide variety of antigens as shown in FIG.45A and FIG.45B. A number on top of the bars represents the number of final target clusters. To further examine indication dependent final target cluster selection in terms of chosen antigens, the fraction of final target clusters that were derived from antigens of the target list for each indication was analyzed. HNSC, LUAD, LUSC, SKCM, and TNBC show a large proportion of tier one final target clusters. Overall, many indications share the same antigens. For instance, PRAME as tier three antigen is selected in every indication except READ. IGF2BP3 is also widely selected.
[0746] In KIRC, KIRP, and LIHC, which are kidney and liver cancer indications, a comparatively high number of final target clusters is derived from tier four and tier five antigens. Kidney and liver cancers are not covered by the target list. Therefore, indication-specific shared target antigens are not included, leading to the observed increase in tier four and five final target clusters. All remaining indications show a large variety of antigens from tier one to three, and no or only few final target clusters are derived from tier four and five antigens.
4. Patient-level analyses: antigen diversity and expression fold change
[0747] In addition to the previous analyses, more in-depth analyses were performed at the patient level. To investigate the diversity of antigens of selected final target clusters as well as the impact of expression on target selection, only those indications that were specifically selected are shown (LUAD, LUSC, OV, PRAD, SKCM, TNBC). Accordingly, patients from those indications were well covered by shared target antigens from internal and reported vaccine trials.
[0748] A penalty approach was implemented to increase the number of different antigens from which final target clusters that are selected for vaccination are derived. Without the penalty approach (no penalty), final target clusters from three to four different antigens were selected on median across the analyzed indications, even though the median number of available tier one to three antigens is between 4 and 9.5. By introducing predefined thresholds that additionally take into account the number of available tier one to three antigens (approach clusters batch), the median number of different selected antigens was increased for four of six indications. In particular, the median number of different selected antigens ranges from 4 to 6 by using the clusters batch approach as shown in FIG.46.
[0749] The highest number of available tier one to three antigens was found for SKCM. The impact of introducing the penalty approach was particularly high, as reflected by the median number of different selected antigens increasing from 3 to 6. For the other five indications, the penalty approach leads to a similar median number of different selected antigens compared to the median number of available tier one to three antigens and thus covers a broad range of available antigens.
[0750] FIG.47 depicts the number of final target clusters per antigen (red) for each LUAD patient.
Antigen names with white zeros indicate that the respective antigen was sufficiently expressed and passed the predefined expression threshold; however, no target clusters were selected. Cells are colored according to the expression fold change (log2) of the respective antigen, and the antigen names are colored according to their tier. The number of different antigens contributing to the final target clusters is indicated at the top. The same analyses, conducted for indications LUSC, OV, PRAD, SKCM, and TNBC, are shown in FIGs. 48-52, respectively. Final target clusters were preferably chosen from tiers one to three despite a high availability of tier five antigens across all analyzed indications. Moreover, patients of the same indication usually had to some extent overlapping sets of available tier one to three antigens.
[0751] Across all analyzed LUAD patients, at least one tier one antigen was expressed in five patients and selected to contribute to the final target clusters. If no tier one antigen was available, tier two and tier three antigens were selected in most cases. Patient TCGA-05-4244 had only one tier two antigen expressed (no tier one or tier three antigen) and thus most final target clusters (eight) were selected from this antigen while three final target clusters were selected from tier five antigens. Across all patients, there is a high availability of tier five antigens. However, since in all patients there is at least one antigen available from a higher tier, no tier four or five antigen was selected in this indication except for the described patient.
D. Conclusions
[0752] The exemplary sANTe procedure described herein selects final targets of a non-neoepitope cancer vaccine cassette. The procedure assigns respective target candidates into batches based on two categorizations. First, antigens are categorized into tiers; second, target candidates are categorized into known epitopes and known HLA ligands (for which additional, pre-defined categories exist), and ligands predicted to be presented. The final ranking is performed by considering the respective categories and using the peptide-MHC prediction in each expression category followed by a clustering of target sequences. Clustering was performed to extend the sequence of the target core by additional available ligands predicted to be presented as well as to combine extended target clusters to generate the final target cluster sequences.
[0753] The antigen specific whitelist was analyzed to determine the number of excluded target candidates due to hits in the proteome.
Different target cluster combination approaches were tested to compare the resulting final target cluster lengths and target numbers.
[0754] Performance of the selection procedure was analyzed in a variety of ways. The results show that the selection procedure can successfully be applied to all the tested patients derived from either TCGA or from the TNBC-MERIT cohort, thereby representing a variety of different cancer indications. For each patient, a sufficient number of target clusters could be identified. Further, most final target clusters were derived successfully from tier one to three while maintaining a number of different selected antigens ranging from four to six (e.g., antigen diversity). Thus, in most patients it is possible to select target sequences from different antigens with highest evidence for immunogenicity and safety or cancer-testis/tumor specific and differentiation antigens with evidence for immunogenicity. The lengths of the resulting final target clusters were within the required length criteria and did not exceed 40 amino acids. Accordingly, these results show that the selection procedure can successfully combine multiple potential epitopes in one target sequence, increasing the density and number of targetable epitopes per cassette (e.g., in contrast to approaches that simply choose each single epitope as one shorter target). The median number of final target clusters ranged between 10 and 14.
EQUIVALENTS
[0755] Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents to the specific embodiments of technologies described herein. The scope of the present disclosure is not intended to be limited to the above Description, but rather is as set forth in the following claims.