WO2024118882A1 - Iterative multiplex genome engineering in microbial cells using a selection marker swapping system
- Publication number
- WO2024118882A1 (application PCT/US2023/081763)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- target sequence
- selection marker
- sequence
- dna
- rgen
Concepts
- 239000003550 marker Substances 0.000 title claims abstract description 320
- 230000000813 microbial effect Effects 0.000 title claims abstract description 162
- 238000010362 genome editing Methods 0.000 title abstract description 15
- 238000000034 method Methods 0.000 claims abstract description 139
- 102000040430 polynucleotide Human genes 0.000 claims description 178
- 108091033319 polynucleotide Proteins 0.000 claims description 178
- 239000002157 polynucleotide Substances 0.000 claims description 178
- 230000008836 DNA modification Effects 0.000 claims description 163
- 108010042407 Endonucleases Proteins 0.000 claims description 87
- 230000004048 modification Effects 0.000 claims description 86
- 238000012986 modification Methods 0.000 claims description 86
- 238000002744 homologous recombination Methods 0.000 claims description 70
- 230000006801 homologous recombination Effects 0.000 claims description 70
- 238000011144 upstream manufacturing Methods 0.000 claims description 56
- 230000010354 integration Effects 0.000 claims description 26
- 238000012217 deletion Methods 0.000 claims description 22
- 230000037430 deletion Effects 0.000 claims description 22
- 102000004533 Endonucleases Human genes 0.000 claims description 11
- 238000003780 insertion Methods 0.000 claims description 9
- 230000037431 insertion Effects 0.000 claims description 9
- 239000000203 mixture Substances 0.000 abstract description 35
- 230000009466 transformation Effects 0.000 abstract description 22
- 230000002441 reversible effect Effects 0.000 abstract description 5
- 210000004027 cell Anatomy 0.000 description 288
- 108090000623 proteins and genes Proteins 0.000 description 209
- 108020004414 DNA Proteins 0.000 description 121
- 102000004169 proteins and genes Human genes 0.000 description 92
- 125000003729 nucleotide group Chemical group 0.000 description 87
- 239000002773 nucleotide Substances 0.000 description 86
- 102100031780 Endonuclease Human genes 0.000 description 77
- 108091028043 Nucleic acid sequence Proteins 0.000 description 73
- 108091032973 (ribonucleotides)n+m Proteins 0.000 description 53
- 150000007523 nucleic acids Chemical class 0.000 description 53
- 230000000694 effects Effects 0.000 description 48
- 230000014509 gene expression Effects 0.000 description 45
- 108090000765 processed proteins & peptides Proteins 0.000 description 45
- 102000004196 processed proteins & peptides Human genes 0.000 description 44
- 229920001184 polypeptide Polymers 0.000 description 43
- 108091033409 CRISPR Proteins 0.000 description 41
- 102000039446 nucleic acids Human genes 0.000 description 40
- 108020004707 nucleic acids Proteins 0.000 description 40
- 239000013598 vector Substances 0.000 description 34
- 230000008685 targeting Effects 0.000 description 31
- 108091026890 Coding region Proteins 0.000 description 30
- 108020005004 Guide RNA Proteins 0.000 description 28
- 210000000349 chromosome Anatomy 0.000 description 28
- 108091079001 CRISPR RNA Proteins 0.000 description 27
- 240000004808 Saccharomyces cerevisiae Species 0.000 description 24
- 235000014680 Saccharomyces cerevisiae Nutrition 0.000 description 24
- 230000001105 regulatory effect Effects 0.000 description 24
- 108020004999 messenger RNA Proteins 0.000 description 23
- 108020004511 Recombinant DNA Proteins 0.000 description 22
- 150000001413 amino acids Chemical class 0.000 description 22
- 239000012634 fragment Substances 0.000 description 20
- 241000193996 Streptococcus pyogenes Species 0.000 description 17
- 239000000047 product Substances 0.000 description 17
- 230000035897 transcription Effects 0.000 description 17
- 238000013518 transcription Methods 0.000 description 17
- 108091034117 Oligonucleotide Proteins 0.000 description 16
- 241000894007 species Species 0.000 description 16
- 238000013519 translation Methods 0.000 description 16
- 101100032157 Neosartorya fumigata (strain ATCC MYA-4609 / Af293 / CBS 101355 / FGSC A1100) pyr2 gene Proteins 0.000 description 15
- 238000003556 assay Methods 0.000 description 15
- 230000000295 complement effect Effects 0.000 description 13
- 238000006467 substitution reaction Methods 0.000 description 13
- 108020004705 Codon Proteins 0.000 description 12
- 230000027455 binding Effects 0.000 description 12
- 230000034431 double-strand break repair via homologous recombination Effects 0.000 description 12
- 230000002538 fungal effect Effects 0.000 description 12
- 230000035772 mutation Effects 0.000 description 12
- 125000003275 alpha amino acid group Chemical group 0.000 description 11
- 239000013612 plasmid Substances 0.000 description 11
- 230000008569 process Effects 0.000 description 11
- 101710163270 Nuclease Proteins 0.000 description 10
- 238000005520 cutting process Methods 0.000 description 10
- 230000005782 double-strand break Effects 0.000 description 10
- 230000006780 non-homologous end joining Effects 0.000 description 10
- 102000053602 DNA Human genes 0.000 description 9
- 102000004190 Enzymes Human genes 0.000 description 9
- 108090000790 Enzymes Proteins 0.000 description 9
- 229940088598 enzyme Drugs 0.000 description 9
- 230000001404 mediated effect Effects 0.000 description 9
- 210000001236 prokaryotic cell Anatomy 0.000 description 9
- 210000001938 protoplast Anatomy 0.000 description 9
- 108700026244 Open Reading Frames Proteins 0.000 description 8
- 108010076504 Protein Sorting Signals Proteins 0.000 description 8
- 241000024277 Trichoderma reesei QM6a Species 0.000 description 8
- 230000000692 anti-sense effect Effects 0.000 description 8
- 238000010276 construction Methods 0.000 description 8
- -1 Csm2 Proteins 0.000 description 7
- 230000007018 DNA scission Effects 0.000 description 7
- 241000233866 Fungi Species 0.000 description 7
- 241000194020 Streptococcus thermophilus Species 0.000 description 7
- 239000002243 precursor Substances 0.000 description 7
- 230000008439 repair process Effects 0.000 description 7
- 241000193985 Streptococcus agalactiae Species 0.000 description 6
- 125000000539 amino acid group Chemical group 0.000 description 6
- 230000001580 bacterial effect Effects 0.000 description 6
- 230000004927 fusion Effects 0.000 description 6
- 230000002068 genetic effect Effects 0.000 description 6
- 239000000463 material Substances 0.000 description 6
- 210000004940 nucleus Anatomy 0.000 description 6
- 238000004064 recycling Methods 0.000 description 6
- 238000012552 review Methods 0.000 description 6
- 230000005783 single-strand break Effects 0.000 description 6
- 125000006850 spacer group Chemical group 0.000 description 6
- 230000001052 transient effect Effects 0.000 description 6
- 108020005544 Antisense RNA Proteins 0.000 description 5
- 241000193830 Bacillus <bacterium> Species 0.000 description 5
- 241000894006 Bacteria Species 0.000 description 5
- 238000010453 CRISPR/Cas method Methods 0.000 description 5
- 241000196324 Embryophyta Species 0.000 description 5
- 108091092195 Intron Proteins 0.000 description 5
- 102000004316 Oxidoreductases Human genes 0.000 description 5
- 108090000854 Oxidoreductases Proteins 0.000 description 5
- 241000194025 Streptococcus oralis Species 0.000 description 5
- 230000004075 alteration Effects 0.000 description 5
- 230000002759 chromosomal effect Effects 0.000 description 5
- 238000003776 cleavage reaction Methods 0.000 description 5
- 239000003184 complementary RNA Substances 0.000 description 5
- 239000003623 enhancer Substances 0.000 description 5
- 239000013604 expression vector Substances 0.000 description 5
- 238000000338 in vitro Methods 0.000 description 5
- 238000010369 molecular cloning Methods 0.000 description 5
- 230000008488 polyadenylation Effects 0.000 description 5
- 238000012545 processing Methods 0.000 description 5
- 230000007017 scission Effects 0.000 description 5
- 230000028327 secretion Effects 0.000 description 5
- 230000002103 transcriptional effect Effects 0.000 description 5
- FWMNVWWHGCHHJJ-SKKKGAJSSA-N 4-amino-1-[(2r)-6-amino-2-[[(2r)-2-[[(2r)-2-[[(2r)-2-amino-3-phenylpropanoyl]amino]-3-phenylpropanoyl]amino]-4-methylpentanoyl]amino]hexanoyl]piperidine-4-carboxylic acid Chemical compound C([C@H](C(=O)N[C@H](CC(C)C)C(=O)N[C@H](CCCCN)C(=O)N1CCC(N)(CC1)C(O)=O)NC(=O)[C@H](N)CC=1C=CC=CC=1)C1=CC=CC=C1 FWMNVWWHGCHHJJ-SKKKGAJSSA-N 0.000 description 4
- KDCGOANMDULRCW-UHFFFAOYSA-N 7H-purine Chemical compound N1=CNC2=NC=NC2=C1 KDCGOANMDULRCW-UHFFFAOYSA-N 0.000 description 4
- 229930024421 Adenine Natural products 0.000 description 4
- GFFGJBXGBJISGV-UHFFFAOYSA-N Adenine Chemical compound NC1=NC=NC2=C1N=CN2 GFFGJBXGBJISGV-UHFFFAOYSA-N 0.000 description 4
- 229920001817 Agar Polymers 0.000 description 4
- 102000014914 Carrier Proteins Human genes 0.000 description 4
- FBPFZTCFMRRESA-FSIIMWSLSA-N D-Glucitol Natural products OC[C@H](O)[C@H](O)[C@@H](O)[C@H](O)CO FBPFZTCFMRRESA-FSIIMWSLSA-N 0.000 description 4
- 241000588724 Escherichia coli Species 0.000 description 4
- 241000223218 Fusarium Species 0.000 description 4
- GRRNUXAQVGOGFE-UHFFFAOYSA-N Hygromycin-B Natural products OC1C(NC)CC(N)C(O)C1OC1C2OC3(C(C(O)C(O)C(C(N)CO)O3)O)OC2C(O)C(CO)O1 GRRNUXAQVGOGFE-UHFFFAOYSA-N 0.000 description 4
- 238000010222 PCR analysis Methods 0.000 description 4
- 108010059820 Polygalacturonase Proteins 0.000 description 4
- 241000235070 Saccharomyces Species 0.000 description 4
- 241000499912 Trichoderma reesei Species 0.000 description 4
- 238000007792 addition Methods 0.000 description 4
- 229960000643 adenine Drugs 0.000 description 4
- OIRDTQYFTABQOQ-KQYNXXCUSA-N adenosine Chemical compound C1=NC=2C(N)=NC=NC=2N1[C@@H]1O[C@H](CO)[C@@H](O)[C@H]1O OIRDTQYFTABQOQ-KQYNXXCUSA-N 0.000 description 4
- 239000008272 agar Substances 0.000 description 4
- 239000011575 calcium Substances 0.000 description 4
- 230000001413 cellular effect Effects 0.000 description 4
- 230000008859 change Effects 0.000 description 4
- HVYWMOMLDIMFJA-DPAQBDIFSA-N cholesterol Chemical compound C1C=C2C[C@@H](O)CC[C@]2(C)[C@@H]2[C@@H]1[C@@H]1CC[C@H]([C@H](C)CCCC(C)C)[C@@]1(C)CC2 HVYWMOMLDIMFJA-DPAQBDIFSA-N 0.000 description 4
- 210000003527 eukaryotic cell Anatomy 0.000 description 4
- 230000006870 function Effects 0.000 description 4
- 108010018734 hexose oxidase Proteins 0.000 description 4
- GRRNUXAQVGOGFE-NZSRVPFOSA-N hygromycin B Chemical compound O[C@@H]1[C@@H](NC)C[C@@H](N)[C@H](O)[C@H]1O[C@H]1[C@H]2O[C@@]3([C@@H]([C@@H](O)[C@@H](O)[C@@H](C(N)CO)O3)O)O[C@H]2[C@@H](O)[C@@H](CO)O1 GRRNUXAQVGOGFE-NZSRVPFOSA-N 0.000 description 4
- 229940097277 hygromycin b Drugs 0.000 description 4
- 238000004519 manufacturing process Methods 0.000 description 4
- 239000011159 matrix material Substances 0.000 description 4
- 229920000642 polymer Polymers 0.000 description 4
- 230000022532 regulation of transcription, DNA-dependent Effects 0.000 description 4
- 239000000600 sorbitol Substances 0.000 description 4
- 239000000758 substrate Substances 0.000 description 4
- 210000005253 yeast cell Anatomy 0.000 description 4
- UHPMCKVQTMMPCG-UHFFFAOYSA-N 5,8-dihydroxy-2-methoxy-6-methyl-7-(2-oxopropyl)naphthalene-1,4-dione Chemical compound CC1=C(CC(C)=O)C(O)=C2C(=O)C(OC)=CC(=O)C2=C1O UHPMCKVQTMMPCG-UHFFFAOYSA-N 0.000 description 3
- 108700028369 Alleles Proteins 0.000 description 3
- 241000589599 Francisella tularensis subsp. novicida Species 0.000 description 3
- 241000221779 Fusarium sambucinum Species 0.000 description 3
- 108010060309 Glucuronidase Proteins 0.000 description 3
- 102000053187 Glucuronidase Human genes 0.000 description 3
- 102000004157 Hydrolases Human genes 0.000 description 3
- 108090000604 Hydrolases Proteins 0.000 description 3
- KDXKERNSBIXSRK-UHFFFAOYSA-N Lysine Natural products NCCCCC(N)C(O)=O KDXKERNSBIXSRK-UHFFFAOYSA-N 0.000 description 3
- 101100219625 Mus musculus Casd1 gene Proteins 0.000 description 3
- 241000588650 Neisseria meningitidis Species 0.000 description 3
- 241000233654 Oomycetes Species 0.000 description 3
- 108091005804 Peptidases Proteins 0.000 description 3
- 102000035195 Peptidases Human genes 0.000 description 3
- 241000235346 Schizosaccharomyces Species 0.000 description 3
- 108091027544 Subgenomic mRNA Proteins 0.000 description 3
- 241000589892 Treponema denticola Species 0.000 description 3
- 108091023045 Untranslated Region Proteins 0.000 description 3
- 241000700605 Viruses Species 0.000 description 3
- 241000235013 Yarrowia Species 0.000 description 3
- JLCPHMBAVCMARE-UHFFFAOYSA-N [3-[[3-[[3-[[3-[[3-[[3-[[3-[[3-[[3-[[3-[[3-[[5-(2-amino-6-oxo-1H-purin-9-yl)-3-[[3-[[3-[[3-[[3-[[3-[[5-(2-amino-6-oxo-1H-purin-9-yl)-3-[[5-(2-amino-6-oxo-1H-purin-9-yl)-3-hydroxyoxolan-2-yl]methoxy-hydroxyphosphoryl]oxyoxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(5-methyl-2,4-dioxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxyoxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(5-methyl-2,4-dioxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(5-methyl-2,4-dioxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(5-methyl-2,4-dioxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methyl [5-(6-aminopurin-9-yl)-2-(hydroxymethyl)oxolan-3-yl] hydrogen phosphate Polymers 
Cc1cn(C2CC(OP(O)(=O)OCC3OC(CC3OP(O)(=O)OCC3OC(CC3O)n3cnc4c3nc(N)[nH]c4=O)n3cnc4c3nc(N)[nH]c4=O)C(COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3CO)n3cnc4c(N)ncnc34)n3ccc(N)nc3=O)n3cnc4c(N)ncnc34)n3ccc(N)nc3=O)n3ccc(N)nc3=O)n3ccc(N)nc3=O)n3cnc4c(N)ncnc34)n3cnc4c(N)ncnc34)n3cc(C)c(=O)[nH]c3=O)n3cc(C)c(=O)[nH]c3=O)n3ccc(N)nc3=O)n3cc(C)c(=O)[nH]c3=O)n3cnc4c3nc(N)[nH]c4=O)n3cnc4c(N)ncnc34)n3cnc4c(N)ncnc34)n3cnc4c(N)ncnc34)n3cnc4c(N)ncnc34)O2)c(=O)[nH]c1=O JLCPHMBAVCMARE-UHFFFAOYSA-N 0.000 description 3
- 238000009825 accumulation Methods 0.000 description 3
- 238000004458 analytical method Methods 0.000 description 3
- 238000000137 annealing Methods 0.000 description 3
- 230000008901 benefit Effects 0.000 description 3
- 230000015572 biosynthetic process Effects 0.000 description 3
- 101150055766 cat gene Proteins 0.000 description 3
- 239000003795 chemical substances by application Substances 0.000 description 3
- 239000002299 complementary DNA Substances 0.000 description 3
- 239000003814 drug Substances 0.000 description 3
- 229940079593 drug Drugs 0.000 description 3
- 239000012636 effector Substances 0.000 description 3
- 238000004520 electroporation Methods 0.000 description 3
- 238000010353 genetic engineering Methods 0.000 description 3
- 230000012010 growth Effects 0.000 description 3
- 244000005700 microbiome Species 0.000 description 3
- 210000003463 organelle Anatomy 0.000 description 3
- 150000003212 purines Chemical class 0.000 description 3
- 238000003786 synthesis reaction Methods 0.000 description 3
- 238000001890 transfection Methods 0.000 description 3
- 108020005345 3' Untranslated Regions Proteins 0.000 description 2
- PDACUKOKVHBVHJ-XVFCMESISA-N 5-amino-1-(5-phospho-beta-D-ribosyl)imidazole Chemical compound NC1=CN=CN1[C@H]1[C@H](O)[C@H](O)[C@@H](COP(O)(O)=O)O1 PDACUKOKVHBVHJ-XVFCMESISA-N 0.000 description 2
- 108010011619 6-Phytase Proteins 0.000 description 2
- 108010013043 Acetylesterase Proteins 0.000 description 2
- 102000004400 Aminopeptidases Human genes 0.000 description 2
- 108090000915 Aminopeptidases Proteins 0.000 description 2
- 108010065511 Amylases Proteins 0.000 description 2
- 102000013142 Amylases Human genes 0.000 description 2
- 241000203069 Archaea Species 0.000 description 2
- 241000194108 Bacillus licheniformis Species 0.000 description 2
- 101150069031 CSN2 gene Proteins 0.000 description 2
- 108010006303 Carboxypeptidases Proteins 0.000 description 2
- 102000005367 Carboxypeptidases Human genes 0.000 description 2
- 108010078791 Carrier Proteins Proteins 0.000 description 2
- 108010053835 Catalase Proteins 0.000 description 2
- 102000016938 Catalase Human genes 0.000 description 2
- 108010084185 Cellulases Proteins 0.000 description 2
- 102000005575 Cellulases Human genes 0.000 description 2
- 108010022172 Chitinases Proteins 0.000 description 2
- 102000012286 Chitinases Human genes 0.000 description 2
- 108010035563 Chloramphenicol O-acetyltransferase Proteins 0.000 description 2
- 241000611330 Chryseobacterium Species 0.000 description 2
- 108700010070 Codon Usage Proteins 0.000 description 2
- 102000004594 DNA Polymerase I Human genes 0.000 description 2
- 108010017826 DNA Polymerase I Proteins 0.000 description 2
- 230000033616 DNA repair Effects 0.000 description 2
- 230000004568 DNA-binding Effects 0.000 description 2
- 108010053770 Deoxyribonucleases Proteins 0.000 description 2
- 102000016911 Deoxyribonucleases Human genes 0.000 description 2
- 101001096557 Dickeya dadantii (strain 3937) Rhamnogalacturonate lyase Proteins 0.000 description 2
- 101710121765 Endo-1,4-beta-xylanase Proteins 0.000 description 2
- 101100326871 Escherichia coli (strain K12) ygbF gene Proteins 0.000 description 2
- 108090000371 Esterases Proteins 0.000 description 2
- 241000195623 Euglenida Species 0.000 description 2
- 241000589601 Francisella Species 0.000 description 2
- 241000567163 Fusarium cerealis Species 0.000 description 2
- 241000146406 Fusarium heterosporum Species 0.000 description 2
- 101150106478 GPS1 gene Proteins 0.000 description 2
- 108010093031 Galactosidases Proteins 0.000 description 2
- 102000002464 Galactosidases Human genes 0.000 description 2
- 108700039691 Genetic Promoter Regions Proteins 0.000 description 2
- 108700007698 Genetic Terminator Regions Proteins 0.000 description 2
- 229920001503 Glucan Polymers 0.000 description 2
- 102100022624 Glucoamylase Human genes 0.000 description 2
- 108050008938 Glucoamylases Proteins 0.000 description 2
- 108010015776 Glucose oxidase Proteins 0.000 description 2
- 108010043121 Green Fluorescent Proteins Proteins 0.000 description 2
- 102000004144 Green Fluorescent Proteins Human genes 0.000 description 2
- NYHBQMYGNKIUIF-UUOKFMHZSA-N Guanosine Chemical compound C1=NC=2C(=O)NC(N)=NC=2N1[C@@H]1O[C@H](CO)[C@@H](O)[C@H]1O NYHBQMYGNKIUIF-UUOKFMHZSA-N 0.000 description 2
- 101710154606 Hemagglutinin Proteins 0.000 description 2
- 108010068250 Herpes Simplex Virus Protein Vmw65 Proteins 0.000 description 2
- 108010033040 Histones Proteins 0.000 description 2
- 101000615488 Homo sapiens Methyl-CpG-binding domain protein 2 Proteins 0.000 description 2
- 108010001336 Horseradish Peroxidase Proteins 0.000 description 2
- 102000004195 Isomerases Human genes 0.000 description 2
- 108090000769 Isomerases Proteins 0.000 description 2
- 108010029541 Laccase Proteins 0.000 description 2
- 108091026898 Leader sequence (mRNA) Proteins 0.000 description 2
- 108090001060 Lipase Proteins 0.000 description 2
- 102000004882 Lipase Human genes 0.000 description 2
- 239000004367 Lipase Substances 0.000 description 2
- 241000186781 Listeria Species 0.000 description 2
- 108090000856 Lyases Proteins 0.000 description 2
- 102000004317 Lyases Human genes 0.000 description 2
- 239000004472 Lysine Substances 0.000 description 2
- 241000710118 Maize chlorotic mottle virus Species 0.000 description 2
- 241000723994 Maize dwarf mosaic virus Species 0.000 description 2
- 101710175625 Maltose/maltodextrin-binding periplasmic protein Proteins 0.000 description 2
- 108010054377 Mannosidases Proteins 0.000 description 2
- 102000001696 Mannosidases Human genes 0.000 description 2
- 102100021299 Methyl-CpG-binding domain protein 2 Human genes 0.000 description 2
- 102100036617 Monoacylglycerol lipase ABHD2 Human genes 0.000 description 2
- 241000588653 Neisseria Species 0.000 description 2
- 108010077850 Nuclear Localization Signals Proteins 0.000 description 2
- 101710093908 Outer capsid protein VP4 Proteins 0.000 description 2
- 101710135467 Outer capsid protein sigma-1 Proteins 0.000 description 2
- 241000606860 Pasteurella Species 0.000 description 2
- 108700020962 Peroxidase Proteins 0.000 description 2
- 102000003992 Peroxidases Human genes 0.000 description 2
- 102100027330 Phosphoribosylaminoimidazole carboxylase Human genes 0.000 description 2
- 241000235648 Pichia Species 0.000 description 2
- 239000002202 Polyethylene glycol Substances 0.000 description 2
- 239000004365 Protease Substances 0.000 description 2
- 101710176177 Protein A56 Proteins 0.000 description 2
- 108090001066 Racemases and epimerases Proteins 0.000 description 2
- 102000004879 Racemases and epimerases Human genes 0.000 description 2
- 108010083644 Ribonucleases Proteins 0.000 description 2
- 102000006382 Ribonucleases Human genes 0.000 description 2
- VYPSYNLAJGMNEJ-UHFFFAOYSA-N Silicium dioxide Chemical compound O=[Si]=O VYPSYNLAJGMNEJ-UHFFFAOYSA-N 0.000 description 2
- 241000700584 Simplexvirus Species 0.000 description 2
- 238000002105 Southern blotting Methods 0.000 description 2
- 108091081024 Start codon Proteins 0.000 description 2
- 241001466451 Stramenopiles Species 0.000 description 2
- 241000194017 Streptococcus Species 0.000 description 2
- 241000194008 Streptococcus anginosus Species 0.000 description 2
- 241001291896 Streptococcus constellatus Species 0.000 description 2
- 241000194042 Streptococcus dysgalactiae Species 0.000 description 2
- 241000194045 Streptococcus macacae Species 0.000 description 2
- 241000194019 Streptococcus mutans Species 0.000 description 2
- 241000193991 Streptococcus parasanguinis Species 0.000 description 2
- 241001400864 Streptococcus pseudoporcinus Species 0.000 description 2
- 102100036407 Thioredoxin Human genes 0.000 description 2
- IQFYYKKMVGJFEH-XLPZGREQSA-N Thymidine Chemical compound O=C1NC(=O)C(C)=CN1[C@@H]1O[C@H](CO)[C@@H](O)C1 IQFYYKKMVGJFEH-XLPZGREQSA-N 0.000 description 2
- 241000723792 Tobacco etch virus Species 0.000 description 2
- 241000723873 Tobacco mosaic virus Species 0.000 description 2
- 102000004357 Transferases Human genes 0.000 description 2
- 108090000992 Transferases Proteins 0.000 description 2
- 108060008539 Transglutaminase Proteins 0.000 description 2
- 241000589886 Treponema Species 0.000 description 2
- 241000223259 Trichoderma Species 0.000 description 2
- 102000003425 Tyrosinase Human genes 0.000 description 2
- 108060008724 Tyrosinase Proteins 0.000 description 2
- DRTQHJPVMGBUCF-XVFCMESISA-N Uridine Chemical compound O[C@@H]1[C@H](O)[C@@H](CO)O[C@H]1N1C(=O)NC(=O)C=C1 DRTQHJPVMGBUCF-XVFCMESISA-N 0.000 description 2
- 239000012190 activator Substances 0.000 description 2
- 125000001931 aliphatic group Chemical group 0.000 description 2
- 235000019418 amylase Nutrition 0.000 description 2
- 229940025131 amylases Drugs 0.000 description 2
- 230000000845 anti-microbial effect Effects 0.000 description 2
- 210000004436 artificial bacterial chromosome Anatomy 0.000 description 2
- 230000000712 assembly Effects 0.000 description 2
- 238000000429 assembly Methods 0.000 description 2
- 230000009286 beneficial effect Effects 0.000 description 2
- 108010051210 beta-Fructofuranosidase Proteins 0.000 description 2
- 230000002457 bidirectional effect Effects 0.000 description 2
- 108091008324 binding proteins Proteins 0.000 description 2
- 239000002551 biofuel Substances 0.000 description 2
- 230000004071 biological effect Effects 0.000 description 2
- 108091005948 blue fluorescent proteins Proteins 0.000 description 2
- 244000309464 bull Species 0.000 description 2
- 238000004422 calculation algorithm Methods 0.000 description 2
- 238000004364 calculation method Methods 0.000 description 2
- 101150117416 cas2 gene Proteins 0.000 description 2
- 101150038500 cas9 gene Proteins 0.000 description 2
- 238000006243 chemical reaction Methods 0.000 description 2
- 235000012000 cholesterol Nutrition 0.000 description 2
- 238000010367 cloning Methods 0.000 description 2
- 230000021615 conjugation Effects 0.000 description 2
- 230000001276 controlling effect Effects 0.000 description 2
- 101150055601 cops2 gene Proteins 0.000 description 2
- 108010005400 cutinase Proteins 0.000 description 2
- 108010082025 cyan fluorescent protein Proteins 0.000 description 2
- OPTASPLRGRRNAP-UHFFFAOYSA-N cytosine Chemical compound NC=1C=CNC(=O)N=1 OPTASPLRGRRNAP-UHFFFAOYSA-N 0.000 description 2
- 230000002939 deleterious effect Effects 0.000 description 2
- 229940119679 deoxyribonucleases Drugs 0.000 description 2
- 230000029087 digestion Effects 0.000 description 2
- 230000008030 elimination Effects 0.000 description 2
- 238000003379 elimination reaction Methods 0.000 description 2
- 230000002616 endonucleolytic effect Effects 0.000 description 2
- 239000007850 fluorescent dye Substances 0.000 description 2
- 108020001507 fusion proteins Proteins 0.000 description 2
- 102000037865 fusion proteins Human genes 0.000 description 2
- 238000003144 genetic modification method Methods 0.000 description 2
- 235000019420 glucose oxidase Nutrition 0.000 description 2
- 230000013595 glycosylation Effects 0.000 description 2
- 238000006206 glycosylation reaction Methods 0.000 description 2
- 239000005090 green fluorescent protein Substances 0.000 description 2
- 239000000185 hemagglutinin Substances 0.000 description 2
- 108010002430 hemicellulase Proteins 0.000 description 2
- 230000001976 improved effect Effects 0.000 description 2
- 235000011073 invertase Nutrition 0.000 description 2
- 235000019421 lipase Nutrition 0.000 description 2
- 230000007246 mechanism Effects 0.000 description 2
- 238000000520 microinjection Methods 0.000 description 2
- 238000002703 mutagenesis Methods 0.000 description 2
- 231100000350 mutagenesis Toxicity 0.000 description 2
- 239000002245 particle Substances 0.000 description 2
- 108010087558 pectate lyase Proteins 0.000 description 2
- 108010072638 pectinacetylesterase Proteins 0.000 description 2
- 102000004251 pectinacetylesterase Human genes 0.000 description 2
- 108020004410 pectinesterase Proteins 0.000 description 2
- 230000002351 pectolytic effect Effects 0.000 description 2
- 108010035774 phosphoribosylaminoimidazole carboxylase Proteins 0.000 description 2
- 229920001223 polyethylene glycol Polymers 0.000 description 2
- 238000006116 polymerization reaction Methods 0.000 description 2
- 230000004481 post-translational protein modification Effects 0.000 description 2
- 230000001124 posttranscriptional effect Effects 0.000 description 2
- 108020001580 protein domains Proteins 0.000 description 2
- 101150089778 pyr-4 gene Proteins 0.000 description 2
- 238000003259 recombinant expression Methods 0.000 description 2
- 230000006798 recombination Effects 0.000 description 2
- 238000005215 recombination Methods 0.000 description 2
- 230000003252 repetitive effect Effects 0.000 description 2
- 230000003362 replicative effect Effects 0.000 description 2
- 230000000717 retained effect Effects 0.000 description 2
- 230000003248 secreting effect Effects 0.000 description 2
- 238000012163 sequencing technique Methods 0.000 description 2
- 230000001568 sexual effect Effects 0.000 description 2
- 230000003007 single stranded DNA break Effects 0.000 description 2
- 238000002741 site-directed mutagenesis Methods 0.000 description 2
- 230000002269 spontaneous effect Effects 0.000 description 2
- 230000001502 supplementing effect Effects 0.000 description 2
- 230000002194 synthesizing effect Effects 0.000 description 2
- 108060008226 thioredoxin Proteins 0.000 description 2
- 238000010361 transduction Methods 0.000 description 2
- 230000026683 transduction Effects 0.000 description 2
- 238000012546 transfer Methods 0.000 description 2
- 238000011426 transformation method Methods 0.000 description 2
- 230000001131 transforming effect Effects 0.000 description 2
- 230000009261 transgenic effect Effects 0.000 description 2
- 102000003601 transglutaminase Human genes 0.000 description 2
- 230000007704 transition Effects 0.000 description 2
- 108700026220 vif Genes Proteins 0.000 description 2
- 230000003612 virological effect Effects 0.000 description 2
- 108091005957 yellow fluorescent proteins Proteins 0.000 description 2
- MTCFGRXMJLQNBG-REOHCLBHSA-N (2S)-2-amino-3-hydroxypropanoic acid Chemical compound OC[C@H](N)C(O)=O MTCFGRXMJLQNBG-REOHCLBHSA-N 0.000 description 1
- 102000040650 (ribonucleotides)n+m Human genes 0.000 description 1
- WKKCYLSCLQVWFD-UHFFFAOYSA-N 1,2-dihydropyrimidin-4-amine Chemical compound N=C1NCNC=C1 WKKCYLSCLQVWFD-UHFFFAOYSA-N 0.000 description 1
- YKBGVTZYEHREMT-KVQBGUIXSA-N 2'-deoxyguanosine Chemical compound C1=NC=2C(=O)NC(N)=NC=2N1[C@H]1C[C@H](O)[C@@H](CO)O1 YKBGVTZYEHREMT-KVQBGUIXSA-N 0.000 description 1
- 108020003589 5' Untranslated Regions Proteins 0.000 description 1
- MSSXOMSJDRHRMC-UHFFFAOYSA-N 9H-purine-2,6-diamine Chemical compound NC1=NC(N)=C2NC=NC2=N1 MSSXOMSJDRHRMC-UHFFFAOYSA-N 0.000 description 1
- 241000589291 Acinetobacter Species 0.000 description 1
- 241001019659 Acremonium <Plectosphaerellaceae> Species 0.000 description 1
- 241000567147 Aeropyrum Species 0.000 description 1
- 241000589155 Agrobacterium tumefaciens Species 0.000 description 1
- 241000724328 Alfalfa mosaic virus Species 0.000 description 1
- 240000007304 Amorphophallus muelleri Species 0.000 description 1
- 241001109946 Aquimarina Species 0.000 description 1
- 108010045149 Archaeal Proteins Proteins 0.000 description 1
- 241000205046 Archaeoglobus Species 0.000 description 1
- 239000004475 Arginine Substances 0.000 description 1
- 241000235349 Ascomycota Species 0.000 description 1
- 241000228212 Aspergillus Species 0.000 description 1
- 241001513093 Aspergillus awamori Species 0.000 description 1
- 241000892910 Aspergillus foetidus Species 0.000 description 1
- 241001225321 Aspergillus fumigatus Species 0.000 description 1
- 241001480052 Aspergillus japonicus Species 0.000 description 1
- 241000351920 Aspergillus nidulans Species 0.000 description 1
- 241000228245 Aspergillus niger Species 0.000 description 1
- 240000006439 Aspergillus oryzae Species 0.000 description 1
- 235000002247 Aspergillus oryzae Nutrition 0.000 description 1
- 241000193818 Atopobium Species 0.000 description 1
- 235000014469 Bacillus subtilis Nutrition 0.000 description 1
- 108010077805 Bacterial Proteins Proteins 0.000 description 1
- 241000606125 Bacteroides Species 0.000 description 1
- 241001567982 Bacteroides graminisolvens Species 0.000 description 1
- 241000221198 Basidiomycota Species 0.000 description 1
- 101150111062 C gene Proteins 0.000 description 1
- 239000002126 C01EB10 - Adenosine Substances 0.000 description 1
- 101150018129 CSF2 gene Proteins 0.000 description 1
- UXVMQQNJUSDDNG-UHFFFAOYSA-L Calcium chloride Chemical compound [Cl-].[Cl-].[Ca+2] UXVMQQNJUSDDNG-UHFFFAOYSA-L 0.000 description 1
- 241000589876 Campylobacter Species 0.000 description 1
- 241000222120 Candida <Saccharomycetales> Species 0.000 description 1
- 241000190890 Capnocytophaga Species 0.000 description 1
- 101710132601 Capsid protein Proteins 0.000 description 1
- 108090000209 Carbonic anhydrases Proteins 0.000 description 1
- 102000003846 Carbonic anhydrases Human genes 0.000 description 1
- 102000020313 Cell-Penetrating Peptides Human genes 0.000 description 1
- 108010051109 Cell-Penetrating Peptides Proteins 0.000 description 1
- 240000001817 Cereus hexagonus Species 0.000 description 1
- 241000221955 Chaetomium Species 0.000 description 1
- 241000191366 Chlorobium Species 0.000 description 1
- 241000588881 Chromobacterium Species 0.000 description 1
- 241001426140 Chryseobacterium tenax Species 0.000 description 1
- 241000123346 Chrysosporium Species 0.000 description 1
- 241001674013 Chrysosporium lucknowense Species 0.000 description 1
- 241000233652 Chytridiomycota Species 0.000 description 1
- 240000005721 Cirsium palustre Species 0.000 description 1
- 241000193403 Clostridium Species 0.000 description 1
- 101710094648 Coat protein Proteins 0.000 description 1
- 241001657523 Coriobacteriaceae Species 0.000 description 1
- 241001162417 Coriobacteriaceae bacterium Species 0.000 description 1
- 241001252397 Corynascus Species 0.000 description 1
- 241000186216 Corynebacterium Species 0.000 description 1
- MIKUYHXYGGJMLM-GIMIYPNGSA-N Crotonoside Natural products C1=NC2=C(N)NC(=O)N=C2N1[C@H]1O[C@@H](CO)[C@H](O)[C@@H]1O MIKUYHXYGGJMLM-GIMIYPNGSA-N 0.000 description 1
- 101150074775 Csf1 gene Proteins 0.000 description 1
- NYHBQMYGNKIUIF-UHFFFAOYSA-N D-guanosine Natural products C1=2NC(N)=NC(=O)C=2N=CN1C1OC(CO)C(O)C1O NYHBQMYGNKIUIF-UHFFFAOYSA-N 0.000 description 1
- 108010066133 D-octopine dehydrogenase Proteins 0.000 description 1
- 238000007399 DNA isolation Methods 0.000 description 1
- 238000001712 DNA sequencing Methods 0.000 description 1
- 102000052510 DNA-Binding Proteins Human genes 0.000 description 1
- 101710096438 DNA-binding protein Proteins 0.000 description 1
- 108010008532 Deoxyribonuclease I Proteins 0.000 description 1
- 102000007260 Deoxyribonuclease I Human genes 0.000 description 1
- 241000605716 Desulfovibrio Species 0.000 description 1
- 241000698776 Duma Species 0.000 description 1
- 241000710188 Encephalomyocarditis virus Species 0.000 description 1
- 241000588698 Erwinia Species 0.000 description 1
- 241000588722 Escherichia Species 0.000 description 1
- 101100438439 Escherichia coli (strain K12) ygbT gene Proteins 0.000 description 1
- 241000701959 Escherichia virus Lambda Species 0.000 description 1
- 241000206602 Eukaryota Species 0.000 description 1
- 108091029865 Exogenous DNA Proteins 0.000 description 1
- 108050001049 Extracellular proteins Proteins 0.000 description 1
- 108091092566 Extrachromosomal DNA Proteins 0.000 description 1
- 241001617393 Finegoldia Species 0.000 description 1
- 241000589565 Flavobacterium Species 0.000 description 1
- 241000405386 Flavobacterium frigidarium Species 0.000 description 1
- 241000721361 Flavobacterium soli Species 0.000 description 1
- 240000003362 Fragaria moschata Species 0.000 description 1
- 241000145614 Fusarium bactridioides Species 0.000 description 1
- 241000223194 Fusarium culmorum Species 0.000 description 1
- 241000223195 Fusarium graminearum Species 0.000 description 1
- 241000223221 Fusarium oxysporum Species 0.000 description 1
- 241001112697 Fusarium reticulatum Species 0.000 description 1
- 241001014439 Fusarium sarcochroum Species 0.000 description 1
- 241000223192 Fusarium sporotrichioides Species 0.000 description 1
- 241001465753 Fusarium torulosum Species 0.000 description 1
- 241000567178 Fusarium venenatum Species 0.000 description 1
- 241000605909 Fusobacterium Species 0.000 description 1
- 241001135750 Geobacter Species 0.000 description 1
- 108010056771 Glucosidases Proteins 0.000 description 1
- 102000004366 Glucosidases Human genes 0.000 description 1
- 102100021181 Golgi phosphoprotein 3 Human genes 0.000 description 1
- 241000606790 Haemophilus Species 0.000 description 1
- 241001059853 Haemophilus pittmaniae Species 0.000 description 1
- 241000819598 Haemophilus sputorum Species 0.000 description 1
- 241000205065 Haloarcula Species 0.000 description 1
- 101001023784 Heteractis crispa GFP-like non-fluorescent chromoprotein Proteins 0.000 description 1
- 101000899240 Homo sapiens Endoplasmic reticulum chaperone BiP Proteins 0.000 description 1
- 241000223198 Humicola Species 0.000 description 1
- 241001480714 Humicola insolens Species 0.000 description 1
- 229930010555 Inosine Natural products 0.000 description 1
- UGQMRVRMYYASKQ-KQYNXXCUSA-N Inosine Chemical compound O[C@@H]1[C@H](O)[C@@H](CO)O[C@H]1N1C2=NC=NC(O)=C2N=C1 UGQMRVRMYYASKQ-KQYNXXCUSA-N 0.000 description 1
- 241000235649 Kluyveromyces Species 0.000 description 1
- 241001138401 Kluyveromyces lactis Species 0.000 description 1
- KDXKERNSBIXSRK-YFKPBYRVSA-N L-lysine Chemical compound NCCCC[C@H](N)C(O)=O KDXKERNSBIXSRK-YFKPBYRVSA-N 0.000 description 1
- QIVBCDIJIAJPQS-VIFPVBQESA-N L-tryptophane Chemical compound C1=CC=C2C(C[C@H](N)C(O)=O)=CNC2=C1 QIVBCDIJIAJPQS-VIFPVBQESA-N 0.000 description 1
- 241000235087 Lachancea kluyveri Species 0.000 description 1
- 241000186660 Lactobacillus Species 0.000 description 1
- 241000589248 Legionella Species 0.000 description 1
- 208000007764 Legionnaires' Disease Diseases 0.000 description 1
- 241000222722 Leishmania <genus> Species 0.000 description 1
- 108060001084 Luciferase Proteins 0.000 description 1
- 239000005089 Luciferase Substances 0.000 description 1
- 241001344133 Magnaporthe Species 0.000 description 1
- 101710125418 Major capsid protein Proteins 0.000 description 1
- 108091027974 Mature messenger RNA Proteins 0.000 description 1
- 241000619533 Mesonia Species 0.000 description 1
- 241000203353 Methanococcus Species 0.000 description 1
- 241000204675 Methanopyrus Species 0.000 description 1
- 241000205276 Methanosarcina Species 0.000 description 1
- 241000589345 Methylococcus Species 0.000 description 1
- 241000645872 Methylococcus mobilis Species 0.000 description 1
- 108060004795 Methyltransferase Proteins 0.000 description 1
- 102000016397 Methyltransferase Human genes 0.000 description 1
- 108020005196 Mitochondrial DNA Proteins 0.000 description 1
- 241000226677 Myceliophthora Species 0.000 description 1
- 241000186359 Mycobacterium Species 0.000 description 1
- 241000204031 Mycoplasma Species 0.000 description 1
- 208000009869 Neu-Laxova syndrome Diseases 0.000 description 1
- 241000221960 Neurospora Species 0.000 description 1
- 241000221961 Neurospora crassa Species 0.000 description 1
- 101100385413 Neurospora crassa (strain ATCC 24698 / 74-OR23-1A / CBS 708.71 / DSM 1257 / FGSC 987) csm-3 gene Proteins 0.000 description 1
- 241000221962 Neurospora intermedia Species 0.000 description 1
- 241000605122 Nitrosomonas Species 0.000 description 1
- 108091092724 Noncoding DNA Proteins 0.000 description 1
- 108020004711 Nucleic Acid Probes Proteins 0.000 description 1
- 101710141454 Nucleoprotein Proteins 0.000 description 1
- 241000169855 Olivibacter Species 0.000 description 1
- 241000927544 Olsenella Species 0.000 description 1
- 241001236817 Paecilomyces <Clavicipitaceae> Species 0.000 description 1
- 240000000968 Parkia biglobosa Species 0.000 description 1
- 241000606601 Pasteurella bettyae Species 0.000 description 1
- 241000228143 Penicillium Species 0.000 description 1
- 241000228172 Penicillium canescens Species 0.000 description 1
- 241000864268 Penicillium solitum Species 0.000 description 1
- 241001112692 Peptostreptococcaceae Species 0.000 description 1
- 241001326562 Pezizomycotina Species 0.000 description 1
- 241000222393 Phanerochaete chrysosporium Species 0.000 description 1
- 108090000608 Phosphoric Monoester Hydrolases Proteins 0.000 description 1
- 102000004160 Phosphoric Monoester Hydrolases Human genes 0.000 description 1
- 108091000080 Phosphotransferase Proteins 0.000 description 1
- 241000607568 Photobacterium Species 0.000 description 1
- 241000425347 Phyla <beetle> Species 0.000 description 1
- 241000709664 Picornaviridae Species 0.000 description 1
- 241000204826 Picrophilus Species 0.000 description 1
- 241000605894 Porphyromonas Species 0.000 description 1
- 241000134844 Porphyromonas catoniae Species 0.000 description 1
- 241000710078 Potyvirus Species 0.000 description 1
- 241000605861 Prevotella Species 0.000 description 1
- 101710083689 Probable capsid protein Proteins 0.000 description 1
- 102000001253 Protein Kinase Human genes 0.000 description 1
- 241000205226 Pyrobaculum Species 0.000 description 1
- 241000205160 Pyrococcus Species 0.000 description 1
- 108091034057 RNA (poly(A)) Proteins 0.000 description 1
- 230000007022 RNA scission Effects 0.000 description 1
- 108010092799 RNA-directed DNA polymerase Proteins 0.000 description 1
- 101100047461 Rattus norvegicus Trpm8 gene Proteins 0.000 description 1
- 108091081062 Repeated sequence (DNA) Proteins 0.000 description 1
- 102000004389 Ribonucleoproteins Human genes 0.000 description 1
- 108010081734 Ribonucleoproteins Proteins 0.000 description 1
- 108091028664 Ribonucleotide Proteins 0.000 description 1
- 108020004422 Riboswitch Proteins 0.000 description 1
- 235000003534 Saccharomyces carlsbergensis Nutrition 0.000 description 1
- 235000001006 Saccharomyces cerevisiae var diastaticus Nutrition 0.000 description 1
- 244000206963 Saccharomyces cerevisiae var. diastaticus Species 0.000 description 1
- 241000204893 Saccharomyces douglasii Species 0.000 description 1
- 241001407717 Saccharomyces norbensis Species 0.000 description 1
- 241001123227 Saccharomyces pastorianus Species 0.000 description 1
- 241000235343 Saccharomycetales Species 0.000 description 1
- 241000607142 Salmonella Species 0.000 description 1
- MTCFGRXMJLQNBG-UHFFFAOYSA-N Serine Natural products OCC(N)C(O)=O MTCFGRXMJLQNBG-UHFFFAOYSA-N 0.000 description 1
- 108020004682 Single-Stranded DNA Proteins 0.000 description 1
- 108020004459 Small interfering RNA Proteins 0.000 description 1
- 241000202917 Spiroplasma Species 0.000 description 1
- 241000202907 Spiroplasma apis Species 0.000 description 1
- 241001606419 Spiroplasma syrphidicola Species 0.000 description 1
- 241000191940 Staphylococcus Species 0.000 description 1
- 241000194024 Streptococcus salivarius Species 0.000 description 1
- 240000007349 Streptococcus sp. BS-21 Species 0.000 description 1
- 241001266658 Streptococcus sp. BS35b Species 0.000 description 1
- 241000302474 Streptococcus sp. CM6 Species 0.000 description 1
- 241000059353 Streptococcus sp. SR4 Species 0.000 description 1
- 241000187747 Streptomyces Species 0.000 description 1
- 101710172711 Structural protein Proteins 0.000 description 1
- 241000205101 Sulfolobus Species 0.000 description 1
- 108700005078 Synthetic Genes Proteins 0.000 description 1
- 241000228341 Talaromyces Species 0.000 description 1
- 241000228343 Talaromyces flavus Species 0.000 description 1
- 241001136494 Talaromyces funiculosus Species 0.000 description 1
- 241001540751 Talaromyces ruber Species 0.000 description 1
- 108020005038 Terminator Codon Proteins 0.000 description 1
- 241000186339 Thermoanaerobacter Species 0.000 description 1
- 241000228178 Thermoascus Species 0.000 description 1
- 241000223258 Thermomyces lanuginosus Species 0.000 description 1
- 101100329497 Thermoproteus tenax (strain ATCC 35583 / DSM 2078 / JCM 9277 / NBRC 100435 / Kra 1) cas2 gene Proteins 0.000 description 1
- 241001313536 Thermothelomyces thermophila Species 0.000 description 1
- 241000204652 Thermotoga Species 0.000 description 1
- 241000589596 Thermus Species 0.000 description 1
- 241001494489 Thielavia Species 0.000 description 1
- 241001495429 Thielavia terrestris Species 0.000 description 1
- 241001149964 Tolypocladium Species 0.000 description 1
- 108091023040 Transcription factor Proteins 0.000 description 1
- 102000040945 Transcription factor Human genes 0.000 description 1
- 108700029229 Transcriptional Regulatory Elements Proteins 0.000 description 1
- 108020004566 Transfer RNA Proteins 0.000 description 1
- 241000520890 Treponema socranskii Species 0.000 description 1
- 241000223260 Trichoderma harzianum Species 0.000 description 1
- 241000378866 Trichoderma koningii Species 0.000 description 1
- 241000223262 Trichoderma longibrachiatum Species 0.000 description 1
- 241000223261 Trichoderma viride Species 0.000 description 1
- QIVBCDIJIAJPQS-UHFFFAOYSA-N Tryptophan Natural products C1=CC=C2C(CC(N)C(O)=O)=CNC2=C1 QIVBCDIJIAJPQS-UHFFFAOYSA-N 0.000 description 1
- 101150013568 US16 gene Proteins 0.000 description 1
- 102000006275 Ubiquitin-Protein Ligases Human genes 0.000 description 1
- 108010083111 Ubiquitin-Protein Ligases Proteins 0.000 description 1
- 241001148134 Veillonella Species 0.000 description 1
- 108020005202 Viral DNA Proteins 0.000 description 1
- 241000605941 Wolinella Species 0.000 description 1
- 241000589634 Xanthomonas Species 0.000 description 1
- 241000235015 Yarrowia lipolytica Species 0.000 description 1
- 241000607734 Yersinia <bacteria> Species 0.000 description 1
- 241000758405 Zoopagomycotina Species 0.000 description 1
- RZZBUMCFKOLHEH-KVQBGUIXSA-N [(2r,3s,5r)-5-(2,6-diaminopurin-9-yl)-3-hydroxyoxolan-2-yl]methyl dihydrogen phosphate Chemical compound C12=NC(N)=NC(N)=C2N=CN1[C@H]1C[C@H](O)[C@@H](COP(O)(O)=O)O1 RZZBUMCFKOLHEH-KVQBGUIXSA-N 0.000 description 1
- 230000001133 acceleration Effects 0.000 description 1
- 230000021736 acetylation Effects 0.000 description 1
- 238000006640 acetylation reaction Methods 0.000 description 1
- 108020002494 acetyltransferase Proteins 0.000 description 1
- 102000005421 acetyltransferase Human genes 0.000 description 1
- 230000009471 action Effects 0.000 description 1
- 230000004913 activation Effects 0.000 description 1
- 229960005305 adenosine Drugs 0.000 description 1
- 230000006154 adenylylation Effects 0.000 description 1
- 150000001408 amides Chemical class 0.000 description 1
- 230000003321 amplification Effects 0.000 description 1
- 239000004599 antimicrobial Substances 0.000 description 1
- 101150009206 aprE gene Proteins 0.000 description 1
- ODKSFYDXXFIFQN-UHFFFAOYSA-N arginine Natural products OC(=O)C(N)CCCNC(N)=N ODKSFYDXXFIFQN-UHFFFAOYSA-N 0.000 description 1
- 210000004507 artificial chromosome Anatomy 0.000 description 1
- 125000003118 aryl group Chemical group 0.000 description 1
- 229940091771 aspergillus fumigatus Drugs 0.000 description 1
- 108010005774 beta-Galactosidase Proteins 0.000 description 1
- 102000005936 beta-Galactosidase Human genes 0.000 description 1
- DRTQHJPVMGBUCF-PSQAKQOGSA-N beta-L-uridine Natural products O[C@H]1[C@@H](O)[C@H](CO)O[C@@H]1N1C(=O)NC(=O)C=C1 DRTQHJPVMGBUCF-PSQAKQOGSA-N 0.000 description 1
- 230000008827 biological function Effects 0.000 description 1
- 238000006664 bond formation reaction Methods 0.000 description 1
- 238000009395 breeding Methods 0.000 description 1
- 230000001488 breeding effect Effects 0.000 description 1
- 239000001110 calcium chloride Substances 0.000 description 1
- 229910001628 calcium chloride Inorganic materials 0.000 description 1
- 101150000705 cas1 gene Proteins 0.000 description 1
- 230000006800 cellular catabolic process Effects 0.000 description 1
- 230000030570 cellular localization Effects 0.000 description 1
- 230000005754 cellular signaling Effects 0.000 description 1
- 238000012512 characterization method Methods 0.000 description 1
- 239000013043 chemical agent Substances 0.000 description 1
- 239000007795 chemical reaction product Substances 0.000 description 1
- WIIZWVCIJKGZOK-RKDXNWHRSA-N chloramphenicol Chemical compound ClC(Cl)C(=O)N[C@H](CO)[C@H](O)C1=CC=C([N+]([O-])=O)C=C1 WIIZWVCIJKGZOK-RKDXNWHRSA-N 0.000 description 1
- 229960005091 chloramphenicol Drugs 0.000 description 1
- 239000013611 chromosomal DNA Substances 0.000 description 1
- 239000013599 cloning vector Substances 0.000 description 1
- 230000002153 concerted effect Effects 0.000 description 1
- 239000013256 coordination polymer Substances 0.000 description 1
- 229940104302 cytosine Drugs 0.000 description 1
- 230000005860 defense response to virus Effects 0.000 description 1
- 230000006114 demyristoylation Effects 0.000 description 1
- 239000005547 deoxyribonucleotide Substances 0.000 description 1
- 125000002637 deoxyribonucleotide group Chemical group 0.000 description 1
- 230000001419 dependent effect Effects 0.000 description 1
- 238000001514 detection method Methods 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 230000009189 diving Effects 0.000 description 1
- 230000009977 dual effect Effects 0.000 description 1
- 230000001214 effect on cellular process Effects 0.000 description 1
- 238000005516 engineering process Methods 0.000 description 1
- 230000002708 enhancing effect Effects 0.000 description 1
- 230000007613 environmental effect Effects 0.000 description 1
- 235000020774 essential nutrients Nutrition 0.000 description 1
- 238000000855 fermentation Methods 0.000 description 1
- 230000004151 fermentation Effects 0.000 description 1
- 108010021843 fluorescent protein 583 Proteins 0.000 description 1
- 230000037433 frameshift Effects 0.000 description 1
- 238000003198 gene knock in Methods 0.000 description 1
- 238000012239 gene modification Methods 0.000 description 1
- 102000034356 gene-regulatory proteins Human genes 0.000 description 1
- 108091006104 gene-regulatory proteins Proteins 0.000 description 1
- 230000005017 genetic modification Effects 0.000 description 1
- 235000013617 genetically modified food Nutrition 0.000 description 1
- 125000003147 glycosyl group Chemical group 0.000 description 1
- 229940029575 guanosine Drugs 0.000 description 1
- IIRDTKBZINWQAW-UHFFFAOYSA-N hexaethylene glycol Chemical group OCCOCCOCCOCCOCCOCCO IIRDTKBZINWQAW-UHFFFAOYSA-N 0.000 description 1
- HNDVDQJCIGZPNO-UHFFFAOYSA-N histidine Natural products OC(=O)C(N)CC1=CN=CN1 HNDVDQJCIGZPNO-UHFFFAOYSA-N 0.000 description 1
- 238000009396 hybridization Methods 0.000 description 1
- 230000000415 inactivating effect Effects 0.000 description 1
- 230000002779 inactivation Effects 0.000 description 1
- 238000010348 incorporation Methods 0.000 description 1
- 230000001939 inductive effect Effects 0.000 description 1
- 208000015181 infectious disease Diseases 0.000 description 1
- 206010022000 influenza Diseases 0.000 description 1
- 230000000977 initiatory effect Effects 0.000 description 1
- 150000002484 inorganic compounds Chemical class 0.000 description 1
- 229960003786 inosine Drugs 0.000 description 1
- 238000005304 joining Methods 0.000 description 1
- 238000002372 labelling Methods 0.000 description 1
- 238000011005 laboratory method Methods 0.000 description 1
- 229940039696 lactobacillus Drugs 0.000 description 1
- 230000029226 lipidation Effects 0.000 description 1
- 230000002503 metabolic effect Effects 0.000 description 1
- 230000003278 mimic effect Effects 0.000 description 1
- 238000002715 modification method Methods 0.000 description 1
- 230000007498 myristoylation Effects 0.000 description 1
- 239000002105 nanoparticle Substances 0.000 description 1
- 108091027963 non-coding RNA Proteins 0.000 description 1
- 102000042567 non-coding RNA Human genes 0.000 description 1
- 108010058731 nopaline synthase Proteins 0.000 description 1
- 238000003199 nucleic acid amplification method Methods 0.000 description 1
- 239000002853 nucleic acid probe Substances 0.000 description 1
- 230000005257 nucleotidylation Effects 0.000 description 1
- 230000030648 nucleus localization Effects 0.000 description 1
- 150000002894 organic compounds Chemical class 0.000 description 1
- 230000003071 parasitic effect Effects 0.000 description 1
- 230000035699 permeability Effects 0.000 description 1
- 230000026731 phosphorylation Effects 0.000 description 1
- 238000006366 phosphorylation reaction Methods 0.000 description 1
- 102000020233 phosphotransferase Human genes 0.000 description 1
- 230000004962 physiological condition Effects 0.000 description 1
- 239000013600 plasmid vector Substances 0.000 description 1
- 210000002706 plastid Anatomy 0.000 description 1
- 229920002401 polyacrylamide Polymers 0.000 description 1
- 229920005862 polyol Polymers 0.000 description 1
- 230000019525 primary metabolic process Effects 0.000 description 1
- 230000009465 prokaryotic expression Effects 0.000 description 1
- 230000001737 promoting effect Effects 0.000 description 1
- 230000000644 propagated effect Effects 0.000 description 1
- 235000019833 protease Nutrition 0.000 description 1
- 230000004952 protein activity Effects 0.000 description 1
- 108060006633 protein kinase Proteins 0.000 description 1
- 230000012743 protein tagging Effects 0.000 description 1
- 238000000746 purification Methods 0.000 description 1
- 150000003230 pyrimidines Chemical class 0.000 description 1
- 108020003175 receptors Proteins 0.000 description 1
- 102000005962 receptors Human genes 0.000 description 1
- 230000010076 replication Effects 0.000 description 1
- 230000004044 response Effects 0.000 description 1
- 239000002336 ribonucleotide Substances 0.000 description 1
- 125000002652 ribonucleotide group Chemical group 0.000 description 1
- 108020004418 ribosomal RNA Proteins 0.000 description 1
- 108091092562 ribozyme Proteins 0.000 description 1
- 229920002477 rna polymer Polymers 0.000 description 1
- 238000007423 screening assay Methods 0.000 description 1
- 238000002864 sequence alignment Methods 0.000 description 1
- 230000035939 shock Effects 0.000 description 1
- 230000037432 silent mutation Effects 0.000 description 1
- HBMJWWWQQXIZIP-UHFFFAOYSA-N silicon carbide Chemical compound [Si+]#[C-] HBMJWWWQQXIZIP-UHFFFAOYSA-N 0.000 description 1
- 229910010271 silicon carbide Inorganic materials 0.000 description 1
- 239000000377 silicon dioxide Substances 0.000 description 1
- 150000003384 small molecules Chemical class 0.000 description 1
- 238000010561 standard procedure Methods 0.000 description 1
- 108091035539 telomere Proteins 0.000 description 1
- 102000055501 telomere Human genes 0.000 description 1
- 210000003411 telomere Anatomy 0.000 description 1
- 238000012360 testing method Methods 0.000 description 1
- 229940094937 thioredoxin Drugs 0.000 description 1
- 230000000699 topical effect Effects 0.000 description 1
- 230000005026 transcription initiation Effects 0.000 description 1
- 230000014616 translation Effects 0.000 description 1
- 230000014621 translational initiation Effects 0.000 description 1
- 230000032258 transport Effects 0.000 description 1
- 108010087967 type I signal peptidase Proteins 0.000 description 1
- 241001515965 unidentified phage Species 0.000 description 1
- DRTQHJPVMGBUCF-UHFFFAOYSA-N uracil arabinoside Natural products OC1C(O)C(CO)OC1N1C(=O)NC(=O)C=C1 DRTQHJPVMGBUCF-UHFFFAOYSA-N 0.000 description 1
- 229940045145 uridine Drugs 0.000 description 1
- 238000012795 verification Methods 0.000 description 1
- 230000000007 visual effect Effects 0.000 description 1
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/10—Processes for the isolation, preparation or purification of DNA or RNA
- C12N15/102—Mutagenizing nucleic acids
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/11—DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
- C12N15/113—Non-coding nucleic acids modulating the expression of genes, e.g. antisense oligonucleotides; Antisense DNA or RNA; Triplex- forming oligonucleotides; Catalytic nucleic acids, e.g. ribozymes; Nucleic acids used in co-suppression or gene silencing
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/87—Introduction of foreign genetic material using processes not otherwise provided for, e.g. co-transformation
- C12N15/90—Stable introduction of foreign DNA into chromosome
- C12N15/902—Stable introduction of foreign DNA into chromosome using homologous recombination
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N9/00—Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
- C12N9/14—Hydrolases (3)
- C12N9/16—Hydrolases (3) acting on ester bonds (3.1)
- C12N9/22—Ribonucleases RNAses, DNAses
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N2310/00—Structure or type of the nucleic acid
- C12N2310/10—Type of nucleic acid
- C12N2310/20—Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPRs]
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12R—INDEXING SCHEME ASSOCIATED WITH SUBCLASSES C12C - C12Q, RELATING TO MICROORGANISMS
- C12R2001/00—Microorganisms ; Processes using microorganisms
- C12R2001/645—Fungi ; Processes using fungi
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12R—INDEXING SCHEME ASSOCIATED WITH SUBCLASSES C12C - C12Q, RELATING TO MICROORGANISMS
- C12R2001/00—Microorganisms ; Processes using microorganisms
- C12R2001/645—Fungi ; Processes using fungi
- C12R2001/885—Trichoderma
Definitions
- The disclosure relates to the field of molecular biology, and in particular to compositions and methods for using selection marker swapping systems in microbial cells. Specifically, this disclosure pertains to compositions and methods for swapping between two selection marker constructs at a predetermined target sequence within a microbial genome, by replacing a first removable selection marker construct with a second removable selection marker construct, followed by the reverse replacement in consecutive transformation steps. Methods and compositions are also disclosed in which selection marker swapping systems are used in multiplex genome engineering, by combining selection marker swapping with the simultaneous modification of at least one additional target sequence at a different genome location.
- The sequence listing is submitted electronically via Patent Center as an XML-formatted sequence listing with a file named 20231107_NB42141 PCT_sequencelisting.xml, created on November ?, 2023, and having a size of 39,290 bytes, and is filed concurrently with the specification.
- The sequence listing contained in this XML-formatted document is part of the specification and is herein incorporated by reference in its entirety.
- Genetic engineering of microbial cells is a cumbersome process (Li et al., 2017, Microb. Cell Fact. 16:168). Genetic engineering, and specifically transformation, requires a method to create access to the genome, as well as a method to introduce a desired genome modification. Since transformation is usually achieved in only a small fraction of cells within a cell population, it is necessary to introduce genetic markers that confer a growth advantage on successfully modified cells under selection conditions (Botstein et al., 1979, Gene, Volume 8, Issue 1, pp. 17-24). Each modification requires the availability of a selection marker, and consecutive modifications within a strain lineage consequently require the availability of multiple selection markers. Alternatively, marker-recycling strategies can be applied, which is particularly important when the number of readily available selection markers is limited (Hartl and Seiboth, 2005, Curr. Genet. 48:204-211).
- Standard methods for marker recycling include the use of so-called bidirectional selection markers, that is, marker systems that allow for both selection (positive selection after integration) and counter-selection (negative selection after inactivation or excision). Bidirectional selection marker systems are often integrated with flanking repeat sequences, allowing for spontaneous looping-out of the marker cassette via homologous recombination (Alani et al., 1987, Genetics 116:541-545).
- The described constructs are usually integrated at the genome sequence intended for modification, and the marker cassette is subsequently excised by challenging the progeny of successfully modified cells with counter-selection conditions (Alani et al., 1987, Genetics 116:541-545).
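- The repeat-mediated looping-out described above can be pictured with a toy Python model. This is a minimal sketch under simplifying assumptions: the locus is represented as a list of named segments, and the function name `loop_out` and the segment labels are hypothetical. Recombination between the two flanking repeat copies removes the marker and leaves a single repeat copy behind.

```python
def loop_out(locus, repeat):
    """Model recombination between the first two copies of `repeat`.

    The segment between the two copies is removed together with one
    repeat copy, so a single repeat remains at the locus.
    Raises ValueError if fewer than two copies of `repeat` are present.
    """
    first = locus.index(repeat)
    second = locus.index(repeat, first + 1)
    return locus[:first + 1] + locus[second + 1:]


# Hypothetical locus after integration of a repeat-flanked marker cassette.
integrated = ["upstream", "repeat", "marker", "repeat", "downstream"]
excised = loop_out(integrated, "repeat")
print(excised)  # ['upstream', 'repeat', 'downstream']
```

In the model the excision is deterministic; in a real cell it is a rare spontaneous event, which is why counter-selection is needed to recover the excised progeny.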
- Marker excision rates can be slow in the case of microorganisms with a low frequency of spontaneous homologous recombination, such as most filamentous fungi (van den Hondel and Punt, 1991, Applied Molecular Genetics of Fungi, Cambridge University Press, Cambridge, UK, pp. 1-28).
- Recent advances in CRISPR/Cas-based genome engineering technology enable targeting of a wide range of sequences within a microbial genome and, through the introduction of double-strand breaks, also increase the rate of homologous recombination by the cellular homology-directed repair machinery (Schuster and Kahmann, 2019, Fungal Genet Biol 130:43-53; Song et al., 2019, Appl Microbiol Biotechnol 103:6919-6932).
- RNA-guided endonucleases (RGENs) consist of a Cas endonuclease together with a guide RNA that harbors a specific DNA-recognition region (i.e., the variable targeting domain).
- Compositions and methods for using selection marker swapping systems in microbial cells are disclosed herein. Specifically, this disclosure pertains to compositions and methods for swapping between two selection marker constructs at a predetermined target sequence within a microbial genome, by replacing a first removable selection marker construct with a second removable selection marker construct, followed by the reverse replacement in consecutive transformation steps. Methods and compositions are also disclosed in which selection marker swapping systems are used in multiplex genome engineering, by combining selection marker swapping with the simultaneous modification of at least one additional target sequence at a different genome location.
- Described herein are genetic modification methods that do not rely on the laborious and time-consuming two-step marker-recycling process currently used in the art. Instead, a first selection marker cassette, flanked by unique RNA-guided endonuclease (RGEN) target sequences and integrated at a predetermined target sequence of a microbial cell, is replaced by a second selection marker cassette flanked by different unique RGEN target sequences (referred to as selection marker swapping), wherein said flanking unique RGEN target sequences enable the excision of the previously integrated selection marker cassette in consecutive transformation steps. Concomitantly with the described selection marker swapping at a predetermined target sequence, parallel modifications are performed at other target sequences, without the requirement to integrate additional selection markers.
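- As an illustration only, the swapping cycle described above can be sketched as a small Python model. The class and function names (`MarkerConstruct`, `Cell`, `swap_marker`) and the labels B, C, M and N are hypothetical placeholders rather than part of the disclosure. The model tracks only which construct sits at the predetermined locus and which other loci were modified in the same transformation step; it says nothing about efficiencies or molecular detail.

```python
from dataclasses import dataclass, field


@dataclass
class MarkerConstruct:
    marker: str  # selection marker name, e.g. "Marker-1"
    flank: str   # unique RGEN target sequence flanking the marker, e.g. "B"

    def __str__(self):
        return f"[{self.flank}]-[{self.marker}]-[{self.flank}]"


@dataclass
class Cell:
    swap_locus: MarkerConstruct                     # construct at the predetermined locus
    other_loci: dict = field(default_factory=dict)  # locus name -> state


def swap_marker(cell, rgen_target, template, parallel_edits=None):
    """Return a new Cell in which `template` replaces the integrated construct.

    The RGEN must target the flanks of the construct currently integrated,
    and the incoming construct must carry different flanks so that it can
    be excised in the next step. `parallel_edits` models modifications made
    at other loci in the same transformation, without additional markers.
    """
    if rgen_target != cell.swap_locus.flank:
        raise ValueError("RGEN does not target the integrated construct's flanks")
    if template.flank == cell.swap_locus.flank:
        raise ValueError("incoming construct must use a different flanking target")
    loci = dict(cell.other_loci)
    loci.update(parallel_edits or {})
    return Cell(template, loci)


m1 = MarkerConstruct("Marker-1", "B")
m2 = MarkerConstruct("Marker-2", "C")

cell = Cell(m1, {"M": "wild-type"})
cell = swap_marker(cell, "B", m2, {"M": "modified"})  # step 1: Marker-1 -> Marker-2
cell = swap_marker(cell, "C", m1, {"N": "modified"})  # step 2: reverse replacement
print(cell.swap_locus)  # [B]-[Marker-1]-[B]
```

After two steps the cell again carries the first construct, so the cycle can in principle be repeated while each step adds further modifications at other loci.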
- RGEN RNA-guided endonuclease
- the method comprises a method for replacing a first selection marker construct integrated at a predetermined target sequence of a microbial cell with a second selection marker construct, the method comprising: a) providing one or more microbial cells having a first selection marker construct ([B]-[Marker-1]-[B]) integrated at a predetermined target sequence ([A]), wherein said first selection marker construct comprises a first selection marker ([Marker-1]) flanked by a first unique target sequence ([B]); b) introducing into the microbial cells of (a) a first RGEN (RGEN-B) and a first DNA modification template, wherein said first DNA modification template comprises a second selection marker construct ([C]-[Marker-2]-[C]) comprising a second selection marker ([Marker-2]) flanked by a second unique target sequence ([C]), wherein said first RGEN in combination with said first DNA modification template enables the replacement of said first selection marker construct with said second selection marker construct via homo
- the method comprises a method for replacing a first selection marker construct integrated at a predetermined target sequence of a microbial cell with a second selection marker construct while simultaneously modifying at least one additional target sequence, the method comprising: a) providing one or more microbial cells having a first selection marker construct ([B]-[Marker-1]-[B]) integrated at a predetermined target sequence ([A]), wherein said first selection marker construct comprises a first selection marker ([Marker-1]) flanked by a first unique target sequence ([B]), wherein said cells have at least one additional target sequence ([M]); b) introducing into the microbial cells of (a) a first RGEN (RGEN-B) and a first DNA modification template, wherein said first DNA modification template comprises a second selection marker construct ([C]-[Marker-2]-[C]) comprising a second selection marker ([Marker-2]) flanked by a second unique target sequence ([C]), wherein said first RGEN
- the method comprises a method for reestablishing a first selection marker construct integrated at a predetermined target sequence of a microbial cell, the method comprising: a) providing one or more microbial cells having a second selection marker construct ([C]-[Marker-2]-[C]) integrated at a predetermined target sequence ([A]), wherein said second selection marker construct comprises a second selection marker ([Marker-2]) flanked by a first unique target sequence ([C]); b) introducing into the microbial cells of (a) a first RGEN (RGEN-C) and a first DNA modification template, wherein said first DNA modification template comprises a first selection marker construct ([B]-[Marker-1]-[B]) comprising a first selection marker ([Marker-1]) flanked by a second unique target sequence ([B]), wherein said first RGEN in combination with said DNA modification template enables the replacement of said second selection marker construct with said first selection marker construct via homologous recombination
- the method comprises a method for reestablishing a first selection marker construct integrated at a predetermined target sequence of a microbial cell, the method comprising: a) providing one or more microbial cells having a second selection marker construct (referred to as [C]-[Marker-2]-[C]) integrated at a predetermined target sequence ([A]), wherein said second selection marker construct comprises a second selection marker ([Marker-2]) flanked by a first unique target sequence ([C]), wherein said cells have at least one additional target sequence ([M]); b) introducing into the microbial cells of (a) a first RGEN (RGEN-C) and a first DNA modification template, wherein said first DNA modification template comprises a first selection marker construct ([B]-[Marker-1]-[B]) comprising a first selection marker ([Marker-1]) flanked by a second unique target sequence ([B]), wherein said first RGEN in combination with said DNA modification template enables the
- the modification at said at least one additional target sequence is selected from the group consisting of an insertion of a polynucleotide of interest, a deletion of a polynucleotide of interest, a replacement of a polynucleotide of interest, and any combination thereof.
- the microbial cells of (a) have at least one additional target sequence ([M]), and simultaneously introducing a modification comprises introducing at least a second RGEN (RGEN-M) and at least a second DNA modification template ([UHA-M]-[Insert]-[DHA-M]) comprising a polynucleotide of interest ([Insert]), wherein said second RGEN in combination with said second DNA modification template enables the integration of said polynucleotide of interest at said at least one additional target sequence ([M]).
- the microbial cells of (a) have at least a first additional target sequence ([Mα]) and a second additional target sequence ([Mβ]) flanking a polynucleotide of interest to be deleted, and wherein said simultaneously introducing a modification comprises introducing at least a second RGEN (RGEN-Mα), a third RGEN (RGEN-Mβ) and at least a third DNA modification template ([UHA-D]-[DHA-D]) comprising an Upstream Homology Arm ([UHA-D]) directly linked to a Downstream Homology Arm ([DHA-D]), wherein said UHA-D and DHA-D are homologous to a genomic region of said microbial cell flanking said polynucleotide sequence of interest to be deleted, wherein said RGEN-Mα and RGEN-Mβ in combination with said third DNA modification template enable the deletion of said polynucleotide of interest.
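The insertion, deletion and replacement outcomes described for the additional target sequence all follow the same pattern: whatever lies between the upstream and downstream homology arms in the genome is exchanged for whatever lies between them in the template. The sketch below illustrates this on plain strings. It is a deliberate simplification that assumes exact arm matches and ignores RGEN cleavage, strand orientation and repair efficiency; `repair_with_template` and the toy sequences are hypothetical.

```python
def repair_with_template(genome: str, uha: str, dha: str, payload: str = "") -> str:
    """Replace the genomic region between the homology arms with `payload`.

    payload == ""            models a deletion template ([UHA-D]-[DHA-D])
    arms adjacent in genome  models an insertion ([UHA-M]-[Insert]-[DHA-M])
    otherwise                models a replacement
    """
    start = genome.find(uha)
    if start == -1:
        raise ValueError("upstream homology arm not found")
    inner_start = start + len(uha)
    inner_end = genome.find(dha, inner_start)
    if inner_end == -1:
        raise ValueError("downstream homology arm not found")
    return genome[:inner_start] + payload + genome[inner_end:]


# Toy sequences, chosen only for illustration.
uha, dha = "CCATGGTACC", "GGATCCAAGC"
genome = uha + "TTTTTT" + dha

deleted = repair_with_template(genome, uha, dha)            # deletion
inserted = repair_with_template(deleted, uha, dha, "ACGT")  # insertion
print(deleted, inserted)
```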
- the microbial cells of (a) have at least a first additional target sequence (Mα) and a second additional target sequence (M
- Figure 1 Schematic representation of selection marker swapping while simultaneously introducing at least one additional modification in parallel with the first swapping step, using RGENs and DNA modification templates.
- 1st step: integrating the removable selection marker construct [B]-[Marker-1]-[B] at a predetermined target sequence ([A]);
- 2nd step: inserting a polynucleotide of interest ([Insert]) at an RGEN target sequence ([M]) while simultaneously swapping the previously integrated Marker-1 with Marker-2, by integrating the removable selection marker construct [C]-[Marker-2]-[C] in place of [B]-[Marker-1]-[B].
- Figure 9 Schematic representation of selection marker swapping while simultaneously introducing at least one additional modification in parallel with the first swapping step, using RGENs and DNA modification templates.
- Figure 2 Schematic representation of selection marker swapping while simultaneously introducing at least one additional modification in parallel with the second swapping step, using RGENs and DNA modification templates.
- 1st step: integrating the removable selection marker construct [C]-[Marker-2]-[C] at a predetermined target sequence ([A]);
- 2nd step: inserting a polynucleotide of interest ([Insert]) at an RGEN target sequence ([M]) while simultaneously swapping the previously integrated Marker-2 with Marker-1, by integrating the removable selection marker construct [B]-[Marker-1]-[B] in place of [C]-[Marker-2]-[C].
- Figure 9 Schematic representation of selection marker swapping while simultaneously introducing at least one additional modification in parallel with the second swapping step, using RGENs and DNA modification templates.
- Figure 3 Schematic representation of selection marker swapping while simultaneously introducing at least one additional modification in parallel with the first swapping step, using RGENs and DNA modification templates.
- 1st step: integrating the removable selection marker construct [B]-[Marker-1]-[B] at a predetermined target sequence ([A]);
- 2nd step: deleting a polynucleotide of interest ([Delete]) flanked by target sequences [Mα] and [M
- Figure 9 Schematic representation of selection marker swapping while simultaneously introducing at least one additional modification in parallel with the first swapping step, using RGENs and DNA modification templates.
- Figure 4 Schematic representation of selection marker swapping while simultaneously introducing at least one additional modification in parallel with the second swapping step, using RGENs and DNA modification templates.
- 1st step: integrating the removable selection marker construct [C]-[Marker-2]-[C] at a predetermined target sequence ([A]);
- 2nd step: deleting a polynucleotide of interest ([Delete]) flanked by target sequences [Mα] and [M
- Figure 9 Schematic representation of selection marker swapping while simultaneously introducing at least one additional modification in parallel with the second swapping step, using RGENs and DNA modification templates.
- Figure 5 Schematic representation of selection marker swapping while simultaneously introducing at least one additional modification in parallel with the first swapping step, using RGENs and DNA modification templates.
- 1st step: integrating the removable selection marker construct [B]-[Marker-1]-[B] at a predetermined target sequence ([A]);
- 2nd step: replacing a first polynucleotide of interest ([Delete]) flanked by target sequences [Mα] and [M
- Figure 6 Schematic representation of selection marker swapping while simultaneously introducing at least one additional modification in parallel with the second swapping step, using RGENs and DNA modification templates.
- 1st step: integrating the removable selection marker construct [C]-[Marker-2]-[C] at a predetermined target sequence ([A]);
- 2nd step: replacing a first polynucleotide of interest ([Delete]) flanked by target sequences [Mα] and [M
- Figure 9 Schematic representation of selection marker swapping while simultaneously introducing at least one additional modification in parallel with the second swapping step, using RGENs and DNA modification templates.
- Figure 7 Schematic representation of selection marker swapping while simultaneously introducing at least one additional modification in parallel with both swapping steps, using RGENs and DNA modification templates.
- 1st step: inserting a first polynucleotide of interest ([Insert-1]) at a first RGEN target sequence ([M1]) while simultaneously swapping a previously integrated Marker-2 with Marker-1, by integrating the removable selection marker construct [B]-[Marker-1]-[B] in place of [C]-[Marker-2]-[C];
- 2nd step: inserting a second polynucleotide of interest ([Insert-2]) at a second RGEN target sequence ([M2]) while simultaneously swapping the previously integrated Marker-1 with Marker-2, by integrating the removable selection marker construct [C]-[Marker-2]-[C] in place of [B]-[Marker-1]-[B].
- 1st step: inserting a first polynucleotide of interest ([In
- Figure 8 Schematic representation of selection marker swapping while simultaneously introducing at least two additional modifications in parallel with both swapping steps, using RGENs and DNA modification templates.
- 1st step: inserting a first polynucleotide of interest ([Insert-1]) at a first RGEN target sequence ([M1]) and a second polynucleotide of interest ([Insert-2]) at a second RGEN target sequence ([M2]) while simultaneously swapping a previously integrated Marker-2 with Marker-1, by integrating the removable selection marker construct [B]-[Marker-1]-[B] in place of [C]-[Marker-2]-[C];
- 2nd step: inserting a third polynucleotide of interest ([Insert-3]) at a third RGEN target sequence ([M3]) and a fourth polynucleotide of interest ([Insert-4]) at a fourth RGEN target sequence ([M4]) while simultaneously swap
- CRISPR loci refers to certain genetic loci encoding components of DNA cleavage systems, for example, those used by bacterial and archaeal cells to destroy foreign DNA (Horvath and Barrangou, 2010, Science 327:167-170; WO2007/025097, published March 1, 2007).
- a CRISPR locus can consist of a CRISPR array, comprising short direct repeats (CRISPR repeats) separated by short variable DNA sequences (called ‘spacers’), which can be flanked by diverse Cas (CRISPR-associated) genes. The number of CRISPR-associated genes at a given CRISPR locus can vary between species.
- CRISPR/Cas systems have been described including Class 1 systems, with multisubunit effector complexes (comprising type I, type III and type IV subtypes), and Class 2 systems, with single protein effectors (comprising type II and type V subtypes, such as, but not limited to, Cas9, Cpf1, C2c1, C2c2, C2c3).
- Class 1 systems (Makarova et al. 2015, Nature Reviews; Microbiology Vol.
- the type II CRISPR/Cas system from bacteria employs a crRNA (CRISPR RNA) and tracrRNA (trans-activating CRISPR RNA) to guide the Cas endonuclease to its DNA target.
- the crRNA contains a spacer region complementary to one strand of the double-stranded DNA target and a region that base pairs with the tracrRNA, forming an RNA duplex that directs the Cas endonuclease to cleave the DNA target. Spacers are acquired through a not fully understood process involving Cas1 and Cas2 proteins. All type II CRISPR/Cas loci contain cas1 and cas2 genes in addition to the cas9 gene (Chylinski et al., 2013, RNA Biology 10:726-737; Makarova et al. 2015, Nature Reviews Microbiology Vol. 13:1-15).
- Type II CRISPR-Cas loci can encode a tracrRNA, which is partially complementary to the repeats within the respective CRISPR array, and can comprise other proteins such as Csn1 and Csn2.
- Type I CRISPR-Cas (CRISPR-associated) systems consist of a complex of proteins, termed Cascade (CRISPR-associated complex for antiviral defense), which function together with a single CRISPR RNA (crRNA) and Cas3 to defend against invading viral DNA (Brouns, S.J.J. et al. Science 321:960-964; Makarova et al. 2015, Nature Reviews Microbiology Vol. 13:1-15, which are incorporated in their entirety herein).
- Cas gene herein refers to a gene that is generally coupled, associated or close to, or in the vicinity of flanking CRISPR loci.
- the terms “Cas gene”, “cas gene”, “CRISPR-associated (Cas) gene” and “Clustered Regularly Interspaced Short Palindromic Repeats-associated gene” are used interchangeably herein.
- the term “Cas protein” or “Cas polypeptide” refers to a polypeptide encoded by a Cas (CRISPR-associated) gene.
- a Cas protein includes a Cas endonuclease.
- a Cas protein may be a bacterial or archaeal protein.
- Type I-III CRISPR Cas proteins herein are typically prokaryotic in origin; type I and III Cas proteins can be derived from bacterial or archaeal species, whereas type II Cas proteins (i.e., a Cas9) can be derived from bacterial species, for example.
- Cas proteins include one or more of Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas6, Cas7, Cas8, Cas9, Cas10, Csy1, Csy2, Csy3, Cse1, Cse2, Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX, Csx3, Csx1, Csx15, Csf1, Csf2, Csf3, Csf4, homologs thereof, or modified versions thereof.
- a Cas protein includes a Cas9 protein, a Cpf1 protein, a C2c1 protein, a C2c2 protein, a C2c3 protein, Cas3, Cas3-HD, Cas5, Cas7, Cas8, Cas10, or combinations or complexes of these.
- Cas endonuclease refers to a Cas polypeptide (Cas protein) that, when in complex with a suitable polynucleotide component, is capable of recognizing, binding to, and optionally nicking or cleaving all or part of a specific DNA target sequence.
- a Cas endonuclease is guided by the guide polynucleotide to recognize, bind to, and optionally nick or cleave all or part of a specific target sequence in double stranded DNA (e.g., at a target sequence in the genome of a cell).
- a Cas endonuclease described herein comprises one or more nuclease domains.
- the Cas endonucleases employed in genome DNA modification methods described herein are endonucleases that introduce single or double-strand breaks into the DNA at the genome target sequence.
- a Cas endonuclease may lack DNA cleavage or nicking activity, but can still specifically bind to a DNA target sequence when complexed with a suitable RNA component.
- a polypeptide referred to as a “Cas9” (formerly referred to as Cas5, Csn1, or Csx12) or a “Cas9 endonuclease” or having “Cas9 endonuclease activity” refers to a Cas endonuclease that forms a complex with a crNucleotide and a tracrNucleotide, or with a single guide polynucleotide, for specifically binding to, and optionally nicking or cleaving all or part of a DNA target sequence.
- a Cas9 endonuclease comprises a RuvC nuclease domain and an HNH (H-N-H) nuclease domain, each of which can cleave a single DNA strand at a target sequence (the concerted action of both domains leads to DNA double-strand cleavage, whereas activity of one domain leads to a nick).
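The relationship between the two nuclease domains and the resulting cut can be written as a small truth table. The function below is a minimal sketch of that logic; the name `cleavage_outcome` and the returned labels are illustrative choices, with "binding only" standing for a Cas9 in which neither domain is catalytically active.

```python
def cleavage_outcome(ruvc_active: bool, hnh_active: bool) -> str:
    """Map the activity of the RuvC and HNH domains to the expected DNA lesion."""
    if ruvc_active and hnh_active:
        return "double-strand break"  # each domain cleaves one strand
    if ruvc_active or hnh_active:
        return "nick"                 # only one strand is cleaved
    return "binding only"             # no strand is cleaved


for ruvc in (True, False):
    for hnh in (True, False):
        print(f"RuvC={ruvc!s:5} HNH={hnh!s:5} -> {cleavage_outcome(ruvc, hnh)}")
```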
- the RuvC domain comprises subdomains I, II and III, where subdomain I is located near the N-terminus of Cas9 and subdomains II and III are located in the middle of the protein, flanking the HNH domain (Makarova et al. 2015, Nature Reviews Microbiology Vol. 13:1-15; Hsu et al., 2014, Cell 157:1262-1278).
- Cas9 endonucleases are typically derived from a type II CRISPR system, which includes a DNA cleavage system utilizing a Cas9 endonuclease in complex with at least one polynucleotide component.
- a Cas9 can be in complex with a CRISPR RNA (crRNA) and a trans-activating CRISPR RNA (tracrRNA).
- a Cas9 can be in complex with a single guide RNA (Makarova et al. 2015, Nature Reviews Microbiology Vol. 13:1-15).
- a “functional fragment”, “fragment that is functionally equivalent” and “functionally equivalent fragment” of a Cas endonuclease are used interchangeably herein, and refer to a portion or subsequence of the Cas endonuclease in which the ability to recognize, bind to, and optionally unwind, nick or cleave (introduce a single or double-strand break in) the target sequence is retained.
- Binding activity and/or endonucleolytic activity of a Cas protein herein toward a specific target DNA sequence may be assessed by any suitable assay known in the art, such as disclosed in U.S. Patent No. 8697359, which is incorporated herein by reference. A determination can be made, for example, by expressing a Cas protein and suitable RNA component in a host cell/organism, and then examining the predicted DNA target sequence for the presence of an indel (a Cas protein in this particular assay would have endonucleolytic activity [single or double-strand cleaving activity]).
- Examining for the presence of an indel at the predicted target sequence could be done via a DNA sequencing method or by inferring indel formation by assaying for loss of function of the target sequence, for example.
- Cas protein activity can be determined by expressing a Cas protein and suitable RNA component in a host cell/organism that has been provided a DNA modification template comprising a sequence homologous to a sequence at or near the target sequence. The presence of the DNA modification template at the target sequence (such as would be predicted by successful HR between the donor and target sequences) would indicate that targeting occurred.
- Cas endonucleases herein can be Cas endonucleases from any of the following genera: Aeropyrum, Pyrobaculum, Sulfolobus, Archaeoglobus, Haloarcula, Methanobacterium, Methanococcus, Methanosarcina, Methanopyrus, Pyrococcus, Picrophilus, Thermoplasma, Corynebacterium, Mycobacterium, Streptomyces, Aquifex, Porphyromonas, Chlorobium, Thermus, Bacillus, Listeria, Staphylococcus, Clostridium, Thermoanaerobacter, Mycoplasma, Fusobacterium, Azoarcus, Chromobacterium, Neisseria, Nitrosomonas, Desulfovibrio, Geobacter, Myxococcus, Campylobacter, Wolinella, Acinetobacter, Erwinia, Escherichia, Legionella, Me
- a Cas endonuclease herein can be encoded, for example, by any Cas endonuclease as disclosed in U.S. Appl. Publ. No. 2010/0093617, which is incorporated herein by reference.
- a Cas9 endonuclease herein may be derived from a Streptococcus (e.g., S. pyogenes, S. pneumoniae, S. thermophilus, S. agalactiae, S. parasanguinis, S. oralis, S. salivarius, S. macacae, S. dysgalactiae, S. anginosus, S. constellatus, S. pseudoporcinus, S. mutans), Listeria (e.g., L. innocua), Spiroplasma (e.g., S. apis, S.
- P. syrphidicola Peptostreptococcaceae
- Atopobium, Porphyromonas (e.g., P. catoniae), Prevotella (e.g., P. intermedia), Veillonella, Treponema (e.g., T. socranskii, T. denticola), Capnocytophaga, Finegoldia (e.g., F. magna), Coriobacteriaceae (e.g., C. bacterium), Olsenella (e.g., O. profusa), Haemophilus (e.g., H. sputorum, H. pittmaniae), Pasteurella (e.g., P.
- Olivibacter (e.g., O. sitiensis)
- Epilithonimonas (e.g., E. tenax)
- Mesonia (e.g., M. mobilis)
- Lactobacillus (e.g., L. plantarum)
- Bacillus (e.g., B. cereus)
- Aquimarina (e.g., A. muelleri)
- Chryseobacterium (e.g., C. palustre)
- Bacteroides (e.g., B. graminisolvens)
- Neisseria (e.g., N. meningitidis)
- Francisella (e.g., F. novicida)
- Flavobacterium (e.g., F.
- a S. pyogenes Cas9 endonuclease is described herein.
- a Cas9 endonuclease can be any of the Cas9 proteins disclosed in Chylinski et al. (RNA Biology 10:726-737), which is incorporated herein by reference.
- sequence of a Cas9 endonuclease herein can comprise, for example, any of the Cas9 amino acid sequences disclosed in GenBank Accession Nos. G3ECR1 (S. thermophilus), WP_026709422, WP_027202655, WP_027318179, WP_027347504, WP_027376815, WP_027414302, WP_027821588, WP_027886314, WP_027963583, WP_028123848, WP_028298935, Q03JI6 (S. thermophilus), EGP66723, EGS38969, EGV05092, EHI65578 (S.
- EIC75614 (S. oralis), EID22027 (S. constellatus), EIJ69711, EJP22331 (S. oralis), EJP26004 (S. anginosus), EJP30321, EPZ44001 (S. pyogenes), EPZ46028 (S. pyogenes), EQL78043 (S. pyogenes), EQL78548 (S. pyogenes), ERL10511, ERL12345, ERL19088 (S. pyogenes), ESA57807 (S. pyogenes), ESA59254 (S.
- ESU85303 S. pyogenes
- ETS96804 UC75522, EGR87316 (S. dysgalactiae), EGS33732, EGV01468 (S. oralis), EHJ52063 (S. macacae), EID26207 (S. oralis), EID33364, EIG27013 (S. parasanguinis), EJF37476, EJO19166 (Streptococcus sp. BS35b), EJU16049, EJU32481 , YP_006298249, ERF61304, ERK04546, ETJ95568 (S.
- a Cas9 protein herein can be encoded by any of SEQ ID NOs:462 (S. thermophilus), 474 (S. thermophilus), 489 (S. agalactiae), 494 (S. agalactiae), 499 (S. mutans), 505 (S. pyogenes), or 518 (S. pyogenes) as disclosed in U.S. Appl. Publ. No. 2010/0093617 (incorporated herein by reference), for example.
- amino acid at each position in a Cas9 can be as provided in the disclosed sequences or substituted with a conserved amino acid residue (“conservative amino acid substitution”) as follows:
- Fragments and variants can be obtained via methods such as site-directed mutagenesis and synthetic construction. Methods for measuring endonuclease activity are well known in the art, such as, but not limited to, those described in PCT/US13/39011, filed May 1, 2013, PCT/US16/32073, filed May 12, 2016, and PCT/US16/32028, filed May 12, 2016, incorporated by reference herein.
- the Cas endonuclease can comprise a modified form of the Cas polypeptide.
- the modified form of the Cas polypeptide can include an amino acid change (e.g., deletion, integration, or substitution) that reduces the naturally occurring nuclease activity of the Cas protein.
- the modified form of the Cas protein has less than 50%, less than 40%, less than 30%, less than 20%, less than 10%, less than 5%, or less than 1% of the nuclease activity of the corresponding wild-type Cas polypeptide (US patent application US20140068797 A1, published on March 6, 2014).
- the modified form of the Cas polypeptide has no substantial nuclease activity and is referred to as catalytically “inactivated Cas” or “deactivated Cas (dCas).”
- An inactivated Cas/deactivated Cas includes a deactivated Cas endonuclease (dCas).
- a catalytically inactive Cas can be fused to a heterologous sequence.
- Other Cas9 variants lack the activity of either the HNH or the RuvC nuclease domain and are thus able to cleave only one strand of the DNA (nickase variants).
- Recombinant DNA constructs expressing the Cas endonuclease described herein can be transiently integrated into a microbial cell or stably integrated into the genome of a microbial cell.
- a Cas endonuclease can be part of a fusion protein comprising one or more heterologous protein domains (e.g., 1, 2, 3, or more domains in addition to the Cas polypeptide).
- a fusion protein may comprise any additional protein sequence, and optionally a linker sequence between any two domains, such as between Cas polypeptide and a first heterologous domain.
- a Cas endonuclease can also be in fusion with a protein that binds DNA molecules or other molecules, such as maltose binding protein (MBP), S-tag, Lex A DNA binding domain (DBD), GAL4A DNA binding domain, and herpes simplex virus (HSV) VP16.
- a Cas endonuclease can comprise a heterologous regulatory element such as a nuclear localization sequence (NLS).
- a heterologous NLS amino acid sequence may be of sufficient strength to drive accumulation of a Cas endonuclease in a detectable amount in the nucleus of a cell herein.
- An NLS may comprise one (monopartite) or more (e.g., bipartite) short sequences (e.g., 2 to 20 residues) of basic, positively charged residues (e.g., lysine and/or arginine), and can be located anywhere in a Cas amino acid sequence but such that it is exposed on the protein surface.
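As a rough illustration of the "short sequences of basic residues" idea, the sketch below scans a protein sequence for uninterrupted runs of lysine (K) and arginine (R). It is a naive heuristic and not a validated NLS predictor: the function name `find_basic_runs`, the default minimum run length of four, and the example sequence are all assumptions made here. The example embeds PKKKRKV, the widely cited SV40 large T-antigen NLS, in otherwise arbitrary residues.

```python
import re


def find_basic_runs(protein: str, min_len: int = 4) -> list[tuple[int, str]]:
    """Return (start index, run) for each stretch of K/R at least `min_len` long."""
    pattern = re.compile(rf"[KR]{{{min_len},}}")
    return [(m.start(), m.group()) for m in pattern.finditer(protein.upper())]


# Arbitrary example sequence containing the SV40 motif PKKKRKV.
print(find_basic_runs("MAPKKKRKVGS"))
```

A bipartite signal would appear as two shorter runs separated by a spacer, which this single-run scan does not attempt to pair up.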
- An NLS may be operably linked to the N-terminus or C-terminus of a Cas protein herein, for example.
- Two or more NLS sequences can be linked to a Cas protein, for example, such as on both the N- and C-termini of a Cas protein.
- the Cas gene can be operably linked to a SV40 nuclear targeting signal upstream of the Cas codon region and a bipartite VirD2 nuclear localization signal (Tinland et al. (1992) Proc. Natl. Acad. Sci. USA 89:7442-6) downstream of the Cas codon region.
- suitable NLS sequences herein include those disclosed in U.S. Patent Nos. 6660830 and 7309576, which are both incorporated by reference herein.
- examples of heterologous NLS amino acid sequences include plant, viral and mammalian nuclear localization signals.
- a catalytically active and/or inactive Cas endonuclease can be fused to a heterologous sequence (US patent application US20140068797 A1, published on March 6, 2014).
- Suitable fusion partners include, but are not limited to, a polypeptide that provides an activity that indirectly increases transcription by acting directly on the target DNA or on a polypeptide (e.g., a histone or other DNA-binding protein) associated with the target DNA.
- Additional suitable fusion partners include, but are not limited to, a polypeptide that provides for methyltransferase activity, demethylase activity, acetyltransferase activity, deacetylase activity, kinase activity, phosphatase activity, ubiquitin ligase activity, deubiquitinating activity, adenylation activity, deadenylation activity, SUMOylating activity, deSUMOylating activity, ribosylation activity, deribosylation activity, myristoylation activity, or demyristoylation activity.
- fusion partners include, but are not limited to, a polypeptide that directly provides for increased transcription of the target nucleic acid (e.g., a transcription activator or a fragment thereof, a protein or fragment thereof that recruits a transcription activator, a small molecule/drug-responsive transcription regulator, etc.).
- a catalytically inactive Cas9 endonuclease can also be fused to a FokI nuclease to generate double-strand breaks (Guilinger et al. Nature Biotechnology, volume 32, number 6, June 2014).
- the term “guide polynucleotide”, relates to a polynucleotide sequence that can form a complex with a Cas endonuclease, and enables the Cas endonuclease to recognize, bind to, and optionally nick or cleave a DNA target sequence (also referred to as target sequence).
- the guide polynucleotide can be a single molecule or a double molecule.
- the guide polynucleotide sequence can be an RNA sequence, a DNA sequence, or a combination thereof (an RNA-DNA combination sequence).
- the guide polynucleotide can comprise at least one nucleotide, phosphodiester bond or linkage modification such as, but not limited to, Locked Nucleic Acid (LNA), 5-methyl dC, 2,6-Diaminopurine, 2’-Fluoro A, 2’-Fluoro U, 2’-O-Methyl RNA, phosphorothioate bond, linkage to a cholesterol molecule, linkage to a polyethylene glycol molecule, linkage to a spacer 18 (hexaethylene glycol chain) molecule, or 5’ to 3’ covalent linkage resulting in circularization.
- a guide polynucleotide that solely comprises ribonucleic acids is also referred to as a “guide RNA” or “gRNA”.
- the guide polynucleotide can be a double molecule (also referred to as duplex guide polynucleotide) comprising a crNucleotide sequence and a tracrNucleotide sequence.
- the crNucleotide includes a first nucleotide sequence domain (referred to as Variable Targeting domain or VT domain) that can hybridize to a nucleotide sequence in a target DNA and a second nucleotide sequence (also referred to as a tracr mate sequence) that is part of a Cas endonuclease recognition (CER) domain.
- the tracr mate sequence can hybridize to a tracrNucleotide along a region of complementarity and together form the Cas endonuclease recognition domain or CER domain.
- the CER domain is capable of interacting with a Cas endonuclease polypeptide.
- the crNucleotide and the tracrNucleotide of the duplex guide polynucleotide can be RNA, DNA, and/or RNA-DNA combination sequences.
- the crNucleotide molecule of the duplex guide polynucleotide is referred to as “crDNA” (when composed of a contiguous stretch of DNA nucleotides) or “crRNA” (when composed of a contiguous stretch of RNA nucleotides), or “crDNA-RNA” (when composed of a combination of DNA and RNA nucleotides).
- the crNucleotide can comprise a fragment of the crRNA naturally occurring in Bacteria and Archaea.
- the size of the fragment of the crRNA naturally occurring in Bacteria and Archaea that can be present in a crNucleotide disclosed herein can range from, but is not limited to, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or more nucleotides.
- the tracrNucleotide is referred to as “tracrRNA” (when composed of a contiguous stretch of RNA nucleotides) or “tracrDNA” (when composed of a contiguous stretch of DNA nucleotides) or “tracrDNA-RNA” (when composed of a combination of DNA and RNA nucleotides).
- the RNA that guides the RNA/Cas9 endonuclease complex is a duplexed RNA comprising a duplex crRNA-tracrRNA.
- the guide polynucleotide includes a dual RNA molecule comprising a chimeric non-naturally occurring crRNA (non-covalently) linked to at least one tracrRNA.
- a chimeric non-naturally occurring crRNA includes a crRNA that comprises regions that are not found together in nature (i.e., they are heterologous with each other).
- a non-naturally occurring crRNA is a crRNA wherein the naturally occurring spacer sequence is exchanged for a heterologous Variable Targeting domain.
- a non-naturally occurring crRNA comprises a first nucleotide sequence domain (referred to as Variable Targeting domain or VT domain) that can hybridize to a nucleotide sequence in a target DNA, linked to a second nucleotide sequence (also referred to as a tracr mate sequence) such that the first and second sequence are not found linked together in nature.
- tracr mate sequence a nucleotide sequence domain
- the guide polynucleotide can also be a single molecule (also referred to as single guide polynucleotide) comprising a crNucleotide sequence linked to a tracrNucleotide sequence.
- the single guide polynucleotide comprises a first nucleotide sequence domain (referred to as Variable Targeting domain or VT domain) that can hybridize to a nucleotide sequence in a target DNA and a Cas endonuclease recognition domain (CER domain), that interacts with a Cas endonuclease polypeptide.
- by “domain” it is meant a contiguous stretch of nucleotides that can be an RNA, DNA, and/or RNA-DNA-combination sequence.
- the VT domain and/or the CER domain of a single guide polynucleotide can comprise a RNA sequence, a DNA sequence, or a RNA-DNA-combination sequence.
- the single guide polynucleotide being comprised of sequences from the crNucleotide and the tracrNucleotide may be referred to as “single guide RNA” (when composed of a contiguous stretch of RNA nucleotides) or “single guide DNA” (when composed of a contiguous stretch of DNA nucleotides) or “single guide RNA-DNA” (when composed of a combination of RNA and DNA nucleotides).
- the single guide polynucleotide can form a complex with a Cas endonuclease, wherein said guide polynucleotide/Cas endonuclease complex (also referred to as a guide polynucleotide/Cas endonuclease system) can direct the Cas endonuclease to a genome target sequence, enabling the Cas endonuclease to recognize, bind to, and optionally nick or cleave (introduce a single or double-strand break) the target sequence.
- the terms “variable targeting domain” and “VT domain” are used interchangeably herein and include a nucleotide sequence that can hybridize (is complementary) to one strand (nucleotide sequence) of a double strand DNA target sequence.
- the % complementation between the first nucleotide sequence domain (VT domain) and the target sequence can be at least 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%,
- the variable targeting domain can be at least 12, 13, 14, 15, 16, 17, 18, 19,
- the variable targeting domain can comprise a contiguous stretch of 12 to 30, 12 to 29, 12 to 28, 12 to 27, 12 to 26, 12 to 25, 12 to 24, 12 to 23,
- variable targeting domain can be composed of a DNA sequence, a RNA sequence, a modified DNA sequence, a modified RNA sequence, or any combination thereof.
- the VT domain can be complementary to target sequences derived from prokaryotic or eukaryotic DNA.
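The percent complementation between a VT domain and a target strand can be illustrated with a short calculation. The following is a minimal sketch, assuming equal-length sequences written 5'->3', plain Watson-Crick pairing, and no modified bases; the function name `percent_complementarity` is hypothetical and is not taken from the disclosure.

```python
# Watson-Crick partners for DNA letters (assumption: unmodified A/C/G/T only).
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def percent_complementarity(vt_domain: str, target_strand: str) -> float:
    """Percentage of VT-domain positions that pair with the target strand.

    Both sequences are written 5'->3'. The VT domain is compared against the
    reversed target strand because the two strands pair antiparallel.
    """
    vt = vt_domain.upper().replace("U", "T")  # treat RNA guides as DNA letters
    target = target_strand.upper()
    if not vt or len(vt) != len(target):
        raise ValueError("sequences must be non-empty and of equal length")
    paired = sum(
        COMPLEMENT.get(base) == partner
        for base, partner in zip(vt, reversed(target))
    )
    return 100.0 * paired / len(vt)


if __name__ == "__main__":
    # One mismatched position out of four.
    print(percent_complementarity("AAAA", "TTTA"))
```

A result at or above a chosen threshold (for example, one of the percentages listed above) would indicate that the VT domain meets that complementation level under this simplified, ungapped model.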
- CER domain of a guide polynucleotide
- CER domain includes a nucleotide sequence that interacts with a Cas endonuclease polypeptide.
- a CER domain comprises a tracrNucleotide mate sequence followed by a tracrNucleotide sequence.
- the CER domain can be composed of a DNA sequence, a RNA sequence, a modified DNA sequence, a modified RNA sequence (see for example US 2015-0059010 A1, published on February 26, 2015, incorporated in its entirety by reference herein), or any combination thereof.
- the nucleotide sequence linking the crNucleotide and the tracrNucleotide of a single guide polynucleotide can comprise a RNA sequence, a DNA sequence, or a RNA-DNA combination sequence.
- the nucleotide sequence linking the crNucleotide and the tracrNucleotide of a single guide polynucleotide (also referred to as “loop”) can be at least 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37,
- the loop can be 3-4, 3-5, 3-6, 3-7, 3-8, 3-9, 3-10, 3-11, 3-12,
- the nucleotide sequence linking the crNucleotide and the tracrNucleotide of a single guide polynucleotide can comprise a tetraloop sequence, such as, but not limited to, a GAAA tetraloop sequence.
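The domain order of a single guide polynucleotide described above (VT domain, tracr mate sequence, loop, tracrNucleotide) can be sketched as a small data structure. This is an illustrative model only: the class name `SingleGuide`, the length checks (12 to 30 nt for the VT domain, at least 3 nt for the loop), and the decision to count the loop as part of the CER domain are assumptions made for the example, and all sequences other than the GAAA tetraloop are arbitrary placeholders rather than functional guide sequences.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SingleGuide:
    """Domains of a single guide polynucleotide, listed 5'->3'."""

    vt_domain: str   # variable targeting domain, hybridizes to the target DNA
    tracr_mate: str  # second part of the crNucleotide
    loop: str        # linker between crNucleotide and tracrNucleotide
    tracr: str       # tracrNucleotide

    def __post_init__(self) -> None:
        if not 12 <= len(self.vt_domain) <= 30:
            raise ValueError("VT domain outside the 12 to 30 nt range assumed here")
        if len(self.loop) < 3:
            raise ValueError("loop shorter than 3 nt")

    @property
    def cer_domain(self) -> str:
        # Tracr mate followed by the tracrNucleotide; including the loop here
        # is a modeling choice for the single-molecule case.
        return self.tracr_mate + self.loop + self.tracr

    def sequence(self) -> str:
        return self.vt_domain + self.cer_domain


if __name__ == "__main__":
    # Placeholder sequences, not functional guide sequences.
    guide = SingleGuide(
        vt_domain="ACGU" * 5,
        tracr_mate="GGGGCCCC",
        loop="GAAA",
        tracr="GGGGCCCCUUUU",
    )
    print(guide.sequence())
```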
- the single guide polynucleotide includes a chimeric non-naturally occurring single guide RNA.
- the terms “single guide RNA” and “sgRNA” are used interchangeably herein and relate to a synthetic fusion of two RNA molecules, a crRNA (CRISPR RNA) comprising a variable targeting domain (linked to a tracr mate sequence that hybridizes to a tracrRNA), fused to a tracrRNA (trans-activating CRISPR RNA).
- CRISPR RNA crRNA
- a chimeric non-naturally occurring guide RNA comprising regions that are not found together in nature (i.e., they are heterologous with each other).
- a chimeric non-naturally occurring guide RNA comprising a first nucleotide sequence domain (referred to as Variable Targeting domain or VT domain) that can hybridize to a nucleotide sequence in a target DNA, linked to a second nucleotide sequence that can recognize the Cas endonuclease, such that the first and second nucleotide sequence are not found linked together in nature.
- the chimeric non-naturally occurring guide RNA can comprise a crRNA and/or a tracrRNA of the type II CRISPR/Cas system that can form a complex with a type II Cas endonuclease, wherein said guide RNA/Cas endonuclease complex can direct the Cas endonuclease to a DNA target sequence, enabling the Cas endonuclease to recognize, bind to, and optionally nick or cleave (introduce a single or double-strand break) the DNA target sequence.
- the guide polynucleotide can be produced by any method known in the art, including chemically synthesizing guide polynucleotides (such as but not limiting to Hendel et al. 2015, Nature Biotechnology 33, 985-989), in vitro generated guide polynucleotides, and/or self-splicing guide RNAs (such as but not limiting to Xie et al. 2015, PNAS 112:3570-3575).
- RNA components such as guide RNA in prokaryotic cells for performing Cas9-mediated DNA targeting have been described (WO2016/099887 published on June 23, 2016 and WO2018/156705 published on August 30, 2018).
- a subject nucleic acid comprises a modification or sequence that provides for an additional desirable feature (e.g., modified or regulated stability; subcellular targeting; tracking, e.g., a fluorescent label; a binding site for a protein or protein complex; etc.).
- Nucleotide sequence modification of the guide polynucleotide, VT domain and/or CER domain can be selected from, but not limited to, the group consisting of a 5' cap, a 3' polyadenylated tail, a riboswitch sequence, a stability control sequence, a sequence that forms a dsRNA duplex, a modification or sequence that targets the guide polynucleotide to a subcellular location, a modification or sequence that provides for tracking, a modification or sequence that provides a binding site for proteins, a Locked Nucleic Acid (LNA), a 5-methyl dC nucleotide, a 2,6-Diaminopurine nucleotide, a 2’-Fluoro A nucleotide, a 2’-Fluoro U nucleotide, a 2'-O-Methyl RNA nucleotide, a phosphorothioate bond, linkage to a cholesterol molecule,
- the additional beneficial feature is selected from the group of a modified or regulated stability, a subcellular targeting, tracking, a fluorescent label, a binding site for a protein or protein complex, modified binding affinity to complementary target sequence, modified resistance to cellular degradation, and increased cellular permeability.
- RGEN RNA-guided endonuclease
- guide RNA/Cas endonuclease complex guide RNA/Cas endonuclease system
- guide RNA/Cas complex guide RNA/Cas system
- gRNA/Cas complex gRNA/Cas system
- RNP ribonucleoprotein
- an RNA component of an RGEN contains sequence that is complementary to a DNA sequence in a target sequence. Based on this complementarity, an RGEN can specifically recognize and cleave a particular DNA target sequence.
- An RGEN herein can comprise Cas protein(s) and suitable RNA component(s) of any of the four known CRISPR systems (Horvath and Barrangou, Science 327:167-170) such as a type I, II, or III CRISPR system.
- An RGEN in preferred embodiments comprises a Cas9 endonuclease (CRISPR II system) and at least one RNA component (e.g., a crRNA and tracrRNA, or a gRNA).
- Any guided endonuclease can be used in the methods disclosed herein.
- Such endonucleases include, but are not limited to Cas9 and Cpf1 endonucleases.
- Many endonucleases have been described to date that can recognize specific PAM sequences (see for example US patent application 14/772711 filed March 12, 2014 and Zetsche B et al. 2015. Cell 163, 1013) and cleave the target DNA at specific positions. It is understood that based on the methods and embodiments described herein utilizing a guided Cas system, one can now tailor these methods such that they can utilize any guided endonuclease system.
- the present disclosure further provides expression constructs for expressing in a microbial cell a guide RNA/Cas system that is capable of recognizing, binding to, and optionally nicking, unwinding, or cleaving all or part of a target sequence.
- Polynucleotides disclosed herein, such as a polynucleotide of interest, a synthetic sequence of interest, a heterologous sequence of interest, a homologous sequence of interest, or a gene of interest, can be provided in an expression cassette (also referred to as DNA construct) for expression in an organism of interest.
- expression refers to the production of a functional end-product (e.g., a crRNA, a tracrRNA, a mRNA, a guide RNA, sRNA, siRNA, antisense RNA, or a polypeptide (protein)) in either precursor or mature form.
- the term “expression” includes any step involved in the production of a polypeptide including, but not limited to, transcription, post-transcriptional modification, translation, post-translational modification, and secretion.
- the expression cassette can include 5' and 3' regulatory sequences and/or tags and synthetic sequences operably linked to a polynucleotide as disclosed herein.
- the expression cassettes disclosed herein may include in the 5'-3' direction of transcription, a transcriptional and translational initiation region (i.e., a promoter), a 5’ untranslated region, polynucleotides encoding various protein tags and sequences, a polynucleotide of interest, and a transcriptional and translational termination region (i.e., termination region) functional in the microbial (host) cell.
- Expression cassettes are also provided with a plurality of restriction sites and/or recombination sites for integration of the polynucleotide to be under the transcriptional regulation of the regulatory regions described elsewhere herein.
- the regulatory regions i.e., promoters, transcriptional regulatory regions, and translational termination regions
- the polynucleotide of interest may be native/analogous to the host cell or to each other.
- Other polynucleotide sequences encoding various protein sequences may be appended to either the 5’ or 3’ end of the polynucleotide of interest.
- the regulatory regions and/or the polynucleotide of interest may be heterologous to the host cell or to each other.
- polynucleotides disclosed herein can be stacked with any combination of polynucleotide sequences of interest or expression cassettes as disclosed elsewhere herein or known in the art.
- the stacked polynucleotides may be operably linked to the same promoter as the initial polynucleotide, or may be operably linked to a separate promoter polynucleotide.
- Expression cassettes may comprise a promoter operably linked to a polynucleotide of interest, along with a corresponding termination region.
- the termination region may be native to the transcriptional initiation region, may be native to the operably linked polynucleotide of interest or to the promoter sequences, may be native to the host organism, or may be derived from another source (i.e., foreign or heterologous).
- Convenient termination regions are available from phage sequences, e.g., the lambda phage t0 termination region, or from strong terminators of prokaryotic ribosomal RNA operons or of genes involved in the secretion of extracellular proteins (e.g., aprE from B. subtilis, aprL from B. licheniformis).
- Convenient termination regions are available from the Ti-plasmid of A. tumefaciens, such as the octopine synthase and nopaline synthase termination regions. See also Guerineau et al. (1991) Mol. Gen. Genet. 262:141-144; Proudfoot (1991) Cell 64:671-674; Sanfacon et al. (1991) Genes Dev. 5:141-149; Mogen et al. (1990) Plant Cell 2:1261-1272; Munroe et al. (1990) Gene 91:151-158; Ballas et al. (1989) Nucleic Acids Res. 17:7891-7903; and Joshi et al. (1987) Nucleic Acids Res. 15:9627-9639.
- the polynucleotides of interest may be optimized for increased expression in the transformed or targeted organism.
- the polynucleotides can be synthesized or altered to use organism-preferred codons for improved expression.
- Additional sequence modifications are known to enhance gene expression in a cellular host. These include elimination of sequences encoding spurious polyadenylation signals, exon-intron splice site signals, transposon-like repeats, and other such well-characterized sequences that may be deleterious to gene expression.
- the G-C content of the sequence may be adjusted to levels average for a given cellular host, as calculated by reference to known genes expressed in the host cell. When possible, the sequence is modified to avoid predicted hairpin secondary mRNA structures.
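The comparison of a sequence's G-C content with the average for a host can be sketched as follows. This is a minimal illustration, assuming the host average is pooled over a user-supplied list of reference gene sequences; the function names `gc_content` and `host_average_gc` are hypothetical, and the toy sequences stand in for real host genes.

```python
from typing import Iterable


def gc_content(sequence: str) -> float:
    """Percentage of G and C bases in a nucleotide sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def host_average_gc(reference_genes: Iterable[str]) -> float:
    """Pooled G-C percentage over reference genes expressed in the host."""
    pooled = "".join(reference_genes)
    return gc_content(pooled)


if __name__ == "__main__":
    candidate = "ATGCATAT"
    host_genes = ["ATGC", "GGCC"]  # toy stand-ins for known host genes
    difference = gc_content(candidate) - host_average_gc(host_genes)
    print(f"candidate deviates from host average by {difference:+.1f} points")
```

A large deviation would flag the sequence as a candidate for adjustment; how the sequence is then altered (for example, through synonymous codon choices) is outside this sketch.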
- the expression cassettes may additionally contain 5' leader sequences.
- leader sequences can act to enhance translation or the level of RNA stability.
- Translation leaders are known in the art and include: picornavirus leaders, for example, EMCV leader (Encephalomyocarditis 5' noncoding region) (Elroy-Stein et al. (1989) Proc. Natl. Acad. Sci. USA 86:6126-6130); potyvirus leaders, for example, TEV leader (Tobacco Etch Virus) (Gallie et al. (1995) Gene 165(2):233-238), MDMV leader (Maize Dwarf Mosaic Virus) (Johnson et al.
- the various DNA fragments may be modified so as to provide for the DNA sequences in the proper orientation and, as appropriate, in the proper reading frame.
- adapters or linkers may be employed to join the DNA fragments or other modifications may be involved to provide for convenient restriction sites, removal of superfluous DNA, removal of restriction sites, or the like.
- in vitro mutagenesis, primer repair, restriction, annealing, resubstitutions, e.g., transitions and transversions may be involved.
- a nucleotide sequence encoding a guide RNA and/or a Cas protein is operably linked to a control element, e.g., a transcriptional control element, such as a promoter.
- the transcriptional control element may be functional in either a eukaryotic cell or a prokaryotic cell.
- prokaryotic promoters promoter functional in a prokaryotic cell
- promoter sequence regions for use in the expression of genes, open reading frames (ORFs) thereof and/or variant sequences thereof in prokaryotic cells are generally known to one of skill in the art.
- Non-limiting examples of suitable eukaryotic promoters are generally known to one of skill in the art.
- recombinant refers to an artificial combination of two otherwise separated segments of sequence, e.g., by chemical synthesis or by the modification of isolated segments of nucleic acids by genetic engineering techniques.
- the term “recombinant,” when used in reference to a biological component or composition indicates that the biological component or composition is in a state that is not found in nature. In other words, the biological component or composition has been modified by human intervention from its natural state.
- a recombinant cell encompasses a cell that expresses one or more genes that are not found in its native (i.e., non-recombinant) cell, a cell that expresses one or more native genes in an amount that is different than its native cell, and/or a cell that expresses one or more native genes under different conditions than its native cell.
- Recombinant nucleic acids may differ from a native sequence by one or more nucleotides, be operably linked to heterologous sequences (e.g., a heterologous promoter, a sequence encoding a non-native or variant signal sequence, etc.), be devoid of intronic sequences, and/or be in an isolated form.
- Recombinant polypeptides/enzymes may differ from a native sequence by one or more amino acids, may be fused with heterologous sequences, may be truncated or have internal deletions of amino acids, may be expressed in a manner not found in a native cell (e.g., from a recombinant cell that over-expresses the polypeptide due to the presence in the cell of an expression vector encoding the polypeptide), and/or be in an isolated form. It is emphasized that in some embodiments, a recombinant polynucleotide or polypeptide/enzyme has a sequence that is identical to its wild-type counterpart but is in a non-native form (e.g., in an isolated or enriched form).
- recombinant DNA refers to a DNA sequence comprising at least one expression cassette comprising an artificial combination of nucleic acid fragments.
- the recombinant DNA construct can include 5' and 3' regulatory sequences operably linked to a polynucleotide of interest as disclosed herein.
- a recombinant DNA construct may comprise regulatory sequences and coding sequences that are derived from different sources.
- Such a recombinant DNA construct may be used by itself or it may be used in conjunction with a vector, which is referred to herein as a circular recombinant DNA construct.
- the choice of vector is dependent upon the method that will be used to introduce the vector into the host cells as is well known to those skilled in the art.
- a plasmid vector can be used.
- the skilled artisan is well aware of the genetic elements that must be present on the vector in order to successfully transform, select and propagate host cells.
- a recombinant DNA construct can be a "linear recombinant DNA construct” referring to a recombinant DNA construct that is linear, and/or a "circular recombinant DNA construct” or “circular recombinant DNA” referring to a recombinant DNA construct that is circular.
- the term “circular recombinant DNA construct” includes a circular extra chromosomal element comprising autonomously replicating sequences, genome integrating sequences (such as but not limited to single or multi-copy gene expression cassettes), phage, or nucleotide sequences, derived from any source, or synthetic (i.e.
- Target sequences (Target sites)
- target sequence refers to a polynucleotide sequence such as, but not limited to, a nucleotide sequence on a chromosome, episome, a transgenic locus, or any other DNA molecule in the genome (including chromosomal, plasmid DNA, or DNA modification templates introduced into the cell) of a cell, at which a guide polynucleotide/Cas endonuclease complex can recognize, bind to, and optionally nick or cleave all or part of the target sequence.
- the target sequence includes a polynucleotide sequence in the genome of a microbial cell at which a Cas endonuclease cleavage is desired to promote a genome modification, e.g., homologous recombination with a DNA modification template.
- the context in which this term is used can slightly alter its meaning.
- the target sequence for a Cas endonuclease is generally very specific and can often be defined to the exact nucleotide sequence/position, whereas in some cases the target sequence for a desired genome modification can be defined more broadly than merely the site at which DNA cleavage occurs, e.g., a genome locus or region where homologous recombination is desired.
- the genome modification that occurs via the activity of Cas/guide RNA DNA cleavage is described as occurring “at or near” the target sequence
- the target sequence can be an endogenous site in the genome of a cell, or alternatively, the target sequence can be heterologous to the cell and thereby not be naturally occurring in the genome of the cell, or the target sequence can be found in a heterologous genome location compared to where it occurs in nature.
- the terms “endogenous target sequence” and “native target sequence” are used interchangeably herein to refer to a target sequence that is endogenous or native to the genome of a cell and is at the endogenous or native position of that target sequence in the genome of the cell.
- the term “artificial target sequence” refers herein to a target sequence that has been introduced into the genome of a cell. Such an artificial target sequence can be identical in sequence to an endogenous or native target sequence in the genome of a cell but be located in a different position (i.e., a non-endogenous or non-native position) in the genome of a cell.
- the target sequence at which a guide polynucleotide/Cas endonuclease (RGEN) complex can recognize, bind to, and optionally nick or cleave all or part of the target sequence is referred to as a predetermined target sequence.
- identifying a predetermined site at which a single marker is to be introduced (e.g., a predetermined target sequence)
- identifying a predetermined target sequence of a microbial cell allows for introducing a selection marker in that predetermined target sequence while simultaneously modifying at least one target sequence in the genome of the microbial cell that is different from said predetermined target sequence.
- the choice of predetermined target sequences was guided by balanced GC content, the absence of repetitive sequences, distance to repetitive sequences such as telomeres, ORF tail-to-tail reading orientations, consistent and high gene of interest (GOI) expression with low cell-to-cell variability, and the availability of unique and active CRISPR sites.
- GOI gene of interest
- a “unique target sequence” is a target sequence that is not found in the genome of a microbial cell that one wants to modify (such as for example target sequence [B] in Figure 1) and as such is different from the predetermined target sequence and from any additional target sequences described herein.
- the target sequence is a unique target sequence that is used for marker flanking in a DNA modification template, such as shown in Figure 1, where the unique target sequence [B] flanks a selection marker located on a DNA modification template.
- the target sequence is an additional target sequence that occurs only once in the genome of a microbial cell (and is different from the predetermined target sequence) and that is used for additional genome modifications (co-modification) using RGEN and DNA modification templates (see also Figures 1-8 for examples of marker integration and excision while simultaneously modifying at least one additional target sequence).
- the modification at at least one additional target sequence is selected from the group consisting of an insertion of a polynucleotide of interest, a deletion of a polynucleotide of interest, a replacement of a polynucleotide of interest, and any one combination thereof.
- altered target site refers to a target sequence as disclosed herein that comprises at least one alteration when compared to non-altered target sequence.
- alterations include, for example:
- the target sequence for a Cas endonuclease can be very specific and can often be defined to the exact nucleotide position, whereas in some cases the target sequence for a desired genome modification can be defined more broadly than merely the site at which DNA cleavage occurs, e.g., a genome locus or region that is to be deleted from the genome. Thus, in certain cases, the genome modification that occurs via the activity of Cas/guide RNA DNA cleavage is described as occurring “at or near” the target sequence.
- Methods for “modifying a target sequence” and “altering a target sequence” are used interchangeably herein and refer to methods for producing an altered target sequence.
- the length of the target sequence can vary, and includes, for example, target sequences that are at least 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 or more nucleotides in length. It is further possible that the target sequence can be palindromic, that is, the sequence on one strand reads the same in the opposite direction on the complementary strand.
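The palindromic property described above is equivalent to a sequence being equal to its own reverse complement. The following is a minimal sketch of that check, assuming unmodified A/C/G/T bases; the function names are hypothetical.

```python
# Translation table mapping each DNA base to its Watson-Crick partner.
COMPLEMENT_TABLE = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence: str) -> str:
    """Return the complementary strand, written 5'->3'."""
    return sequence.upper().translate(COMPLEMENT_TABLE)[::-1]


def is_palindromic(sequence: str) -> bool:
    """True if the sequence reads the same 5'->3' on both strands."""
    seq = sequence.upper()
    return seq == reverse_complement(seq)


if __name__ == "__main__":
    print(is_palindromic("GAATTCGAATTC"))
    print(is_palindromic("AAACCCGGGTTA"))
```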
- the nick/cleavage site can be within the target sequence or the nick/cleavage site could be outside of the target sequence.
- the cleavage could occur at nucleotide positions immediately opposite each other to produce a blunt end cut or, in other cases, the incisions could be staggered to produce single-stranded overhangs, also called “sticky ends”, which can be either 5' overhangs, or 3' overhangs. Active variants of genome target sequences can also be used.
- Such active variants can comprise at least 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more sequence identity to the given target sequence, wherein the active variants retain biological activity and hence are capable of being recognized and cleaved by a Cas endonuclease.
- Assays to measure the single or double-strand break of a target sequence by an endonuclease are known in the art and generally measure the overall activity and specificity of the agent on DNA substrates containing recognition sites.
- the target sequence selected by a user of the disclosed methods can be located within a region of a gene of interest selected from the group consisting of an open reading frame, a promoter, a regulatory sequence, a terminator sequence, a regulatory element sequence, a splice site, a coding sequence, a polyubiquitination site, an intron site, and an intron enhancing motif.
- genes of interest include genes encoding acetyl esterases, aminopeptidases, amylases, arabinases, arabinofuranosidases, carboxypeptidases, catalases, cellulases, chitinases, cutinase, deoxyribonucleases, epimerases, esterases, α-galactosidases, β-galactosidases, α-glucanases, glucan lysases, endo-β-glucanases, glucoamylases, glucose oxidases, α-glucosidases, β-glucosidases, glucuronidases, hemicellulases, hexose oxidases, hydrolases, invertases, isomerases, laccases, lipases, lyases, mannosidases, oxidases, oxidoreductases, pec
- Target genes encoding regulatory proteins such as transcription factors, repressors, proteins that modify other proteins such as kinases, and proteins involved in post-translational modification (e.g., glycosylation) can be subjected to Cas mediated engineering, as well as genes involved in cell signaling, morphology, growth rate, and protein secretion. No limitation in this regard is intended.
- a “protospacer adjacent motif” (PAM) herein refers to a short nucleotide sequence adjacent to a target sequence (protospacer) that is recognized (targeted) by a guide polynucleotide/Cas endonuclease (PGEN) system.
- the Cas endonuclease may not successfully recognize a target DNA sequence if the target DNA sequence is not followed by a PAM sequence.
- the sequence and length of a PAM herein can differ depending on the Cas protein or Cas protein complex used.
- the PAM sequence can be of any length but is typically 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20 nucleotides long.
- a PAM herein is typically selected in view of the type of PGEN being employed.
- a PAM sequence herein may be one recognized by a PGEN comprising a Cas, such as the Cas9 variants described herein, derived from any of the species disclosed herein from which a Cas can be derived, for example.
- the PAM sequence may be one recognized by an RGEN comprising a Cas9 derived from S. pyogenes, S. thermophilus, S. agalactiae, N. meningitidis, T. denticola, or F. novicida.
- a Cas9 derived from S. pyogenes, including the Cas9 Y155 variants described herein, could be used to target genome sequences having a PAM sequence of NGG (N can be A, C, T, or G).
- a suitable Cas9 could be derived from any of the following species when targeting DNA sequences having the following PAM sequences: S. thermophilus (NNAGAA), S. agalactiae (NGG, NNAGAAW [W is A or T], NGGNG), N. meningitidis (NNNNGATT), T. denticola (NAAAAC), or F. novicida (NG) (where N’s in all these particular PAM sequences are A, C, T, or G).
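Locating candidate target sequences that are followed by a given PAM can be sketched with a regular expression built from the degenerate PAM notation used above. This is a simplified illustration under stated assumptions: it scans only the strand supplied, places the PAM immediately 3' of a fixed-length protospacer, and supports only the degenerate symbols N and W; the function names and the default protospacer length of 20 are hypothetical choices, not values prescribed by the disclosure.

```python
import re
from typing import List, Tuple

# Degenerate symbols used in the PAM examples: N is any base, W is A or T.
DEGENERATE = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]", "W": "[AT]"}


def pam_to_regex(pam: str) -> str:
    """Translate a PAM such as 'NGG' into a regular-expression fragment."""
    return "".join(DEGENERATE[symbol] for symbol in pam.upper())


def find_protospacers(
    dna: str, pam: str = "NGG", length: int = 20
) -> List[Tuple[int, str, str]]:
    """Return (start, protospacer, pam_match) for each site on the given strand.

    A lookahead is used so that overlapping candidate sites are all reported.
    """
    pattern = re.compile(f"(?=([ACGT]{{{length}}})({pam_to_regex(pam)}))")
    return [
        (match.start(), match.group(1), match.group(2))
        for match in pattern.finditer(dna.upper())
    ]


if __name__ == "__main__":
    toy_dna = "ACGT" * 5 + "AGGTGG"  # placeholder sequence, not a real locus
    for start, protospacer, pam_match in find_protospacers(toy_dna):
        print(start, protospacer, pam_match)
```

Scanning the reverse complement of the same sequence with the same function would cover the opposite strand.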
- Cas9/PAMs useful herein include those disclosed in Shah et al. (RNA Biology 10:891-899) and Esvelt et al. (Nature Methods 10:1116-1121), which are incorporated herein by reference.
- DNA Modification Templates
- the present disclosure includes methods and compositions for marker swapping in microbial cells. Specifically, this disclosure pertains to compositions and methods for replacing a first selection marker construct integrated at a predetermined target sequence of a microbial cell with a second selection marker construct, using uniquely designed DNA modification templates in combination with RGENs.
- DNA modification template refers to a DNA sequence that comprises, at a minimum, a first region of homology (referred to as Upstream Homology Arm, UHA) and a second region of homology (referred to as Downstream Homology Arm, DHA), wherein said UHA is homologous to a genome region of a microbial cell referred to as the Upstream Genome Region (Upstream Genome Arm, UA), and the DHA is homologous to a genome region of a microbial cell referred to as the Downstream Genome Region (Downstream Genome Arm, DA) and wherein said DNA modification template in combination with an RNA-guided endonuclease (RGEN) can modify at least one genome target sequence in a microbial cell through homology directed repair (homologous recombination).
- UHA Upstream Homology Arm
- DHA Downstream Homology Arm
- the DNA modification template further comprises a DNA sequence (referred to as donor DNA) located in between said UHA and DHA, wherein said DNA modification template in combination with an RNA-guided endonuclease (RGEN) can modify at least one additional genome target sequence in a microbial cell through homology directed repair (homologous recombination), wherein said modifications can be, but are not limited to, a DNA integration, a DNA deletion, a DNA replacement/substitution, or any one combination thereof.
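The outcome of homology directed repair with a DNA modification template (UHA, donor DNA, DHA) can be illustrated with a toy string model. This sketch makes strong simplifying assumptions that are not part of the disclosure: homology arms must match the genome exactly, each arm is located by its first occurrence, and RGEN cleavage is not modeled. The function name `apply_modification_template` and all sequences are hypothetical placeholders.

```python
def apply_modification_template(genome: str, uha: str, donor: str, dha: str) -> str:
    """Replace whatever lies between the upstream and downstream genome arms.

    - donor inserted between adjacent arms: integration
    - empty donor with arms spanning a region: deletion
    - non-empty donor with arms spanning a region: replacement
    """
    upstream = genome.find(uha)
    if upstream == -1:
        raise ValueError("upstream homology arm not found in genome")
    donor_start = upstream + len(uha)
    downstream = genome.find(dha, donor_start)
    if downstream == -1:
        raise ValueError("downstream homology arm not found after upstream arm")
    return genome[:donor_start] + donor + genome[downstream:]


if __name__ == "__main__":
    genome = "AAAACCCCGGGGTTTT"  # placeholder genome region
    print(apply_modification_template(genome, "CCCC", "AT", "GGGG"))    # integration
    print(apply_modification_template(genome, "AAAA", "", "TTTT"))      # deletion
    print(apply_modification_template(genome, "AAAA", "GATC", "TTTT"))  # replacement
```

In this model a donor such as a selection marker flanked by target sequences would simply be passed as the `donor` string, already assembled in the desired order.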
- RGEN RNA-guided endonuclease
- the “DNA modification template” comprises a first region of homology (referred to as Upstream Homology Arm, UHA) and a second region of homology (referred to as Downstream Homology Arm, DHA), wherein said UHA is homologous to a genome region referred to as the Upstream Genome Region (Upstream Genome Arm, UA), and the DHA is homologous to a genome region referred to as the Downstream Genome Region (Downstream Genome Arm, DA) wherein said UHA and DHA flank a DNA sequence (referred to as a donor DNA), wherein said donor DNA comprises a first selection marker ([Marker-1]) flanked by an upstream target sequence ([B]) and an identical downstream target sequence ([B]) that is different from a predetermined target sequence ([A]) present in the genome of a microbial cell (such as but not limiting to Figure 1 [UHA-A]-[B]-[Marker-
- the “DNA modification template” comprises a first region of homology (referred to as Upstream Homology Arm, UHA) and a second region of homology (referred to as Downstream Homology Arm, DHA), wherein said UHA is homologous to a genome region referred to as the Upstream Genome Region (Upstream Genome Arm, UA), and the DHA is homologous to a genome region referred to as the Downstream Genome Region (Downstream Genome Arm, DA) wherein said UHA and DHA flank a DNA sequence (referred to as a donor DNA), wherein said donor DNA comprises a first selection marker ([Marker-1]) flanked by a unique upstream target sequence ([B1]) and a different but unique downstream target sequence ([B2]) that are different from a predetermined target sequence ([A]) present in the genome of a microbial cell (such as, but not limited to, [UHA-A]-[Bα]-[Marker-1]-[Bβ]-[DHA-A
- the “DNA modification template” comprises a first region of homology (referred to as Upstream Homology Arm, UHA) and a second region of homology (referred to as Downstream Homology Arm, DHA), wherein said UHA is homologous to a genome region referred to as the Upstream Genome Region (Upstream Genome Arm, UA), and the DHA is homologous to a genome region referred to as the Downstream Genome Region (Downstream Genome Arm, DA) wherein said UHA and DHA flank a DNA sequence (referred to as a donor DNA), wherein said donor DNA comprises a second selection marker ([Marker-2]) flanked by an upstream target sequence ([C]) and an identical downstream target sequence ([C]) that is different from a predetermined target sequence ([A]) present in the genome of a microbial cell (such as but not limiting to Figure 1 [UHA-A]-[C]-[Marker-
- the “DNA modification template” comprises a first region of homology (referred to as Upstream Homology Arm, UHA) and a second region of homology (referred to as Downstream Homology Arm, DHA), wherein said UHA is homologous to a genome region referred to as the Upstream Genome Region (Upstream Genome Arm, UA), and the DHA is homologous to a genome region referred to as the Downstream Genome Region (Downstream Genome Arm, DA) wherein said UHA and DHA flank a DNA sequence (referred to as a donor DNA), wherein said donor DNA comprises a second selection marker ([Marker-2]) flanked by a unique upstream target sequence ([C1]) and a different but unique downstream target sequence ([C2]) that are different from a predetermined target sequence ([A]) present in the genome of a microbial cell (such as, but not limited to, [UHA-A]-[Cα]-[Marker-1]-[Cβ]-[DHA-A]
- the first selection marker [Marker-1]
- a second selection marker [Marker-2]
- the “DNA modification template” comprises a first region of homology (referred to as Upstream Homology Arm, UHA) and a second region of homology (referred to as Downstream Homology Arm, DHA), wherein said UHA is homologous to a genome region referred to as the Upstream Genome Region (Upstream Genome Arm, UA), and the DHA is homologous to a genome region referred to as the Downstream Genome Region (Downstream Genome Arm, DA) wherein said upstream and downstream homology regions flank a DNA sequence (DNA template), wherein said DNA modification template in combination with an RNA-guided endonuclease (RGEN) can result in homologous recombination (HDR) of said DNA template with a target region in the genome of a microbial cell, wherein said homologous recombination results in a genome modification selected from the group consisting of a DNA integration, a DNA deletion, a DNA replacement/substitution, or any one combination thereof.
- the “DNA modification template” comprises a first region of homology (referred to as Upstream Homology Arm, UHA) and a second region of homology (referred to as Downstream Homology Arm, DHA), wherein said UHA is homologous to a genome region referred to as the Upstream Genome Region (Upstream Genome Arm, UA), and the DHA is homologous to a genome region referred to as the Downstream Genome Region (Downstream Genome Arm, DA) wherein the DNA modification template further comprises a DNA sequence (referred to as donor DNA) located in between said UHA and DHA, wherein said donor DNA comprises a DNA sequence to be inserted into said genome (such as but not limiting to Figure 1 [UHA-M]-[insert]-[DHA-M]).
- the “DNA modification template” comprises a first region of homology (referred to as Upstream Homology Arm, UHA) and a second region of homology (referred to as Downstream Homology Arm, DHA), wherein said UHA is homologous to a genome region referred to as the Upstream Genome Region (Upstream Genome Arm, UA), and the DHA is homologous to a genome region referred to as the Downstream Genome Region (Downstream Genome Arm, DA) wherein the DNA modification template further comprises a DNA sequence (referred to as donor DNA) located in between said UHA and DHA, wherein said donor DNA comprises a first polynucleotide of interest (Insert) that upon integration into the genome will replace a second polynucleotide of interest in said genome (see also Figures 5-6).
- the “DNA modification template” comprises a first region of homology (referred to as Upstream Homology Arm, UHA) and a second region of homology (referred to as Downstream Homology Arm, DHA), wherein said UHA is homologous to a genome region referred to as the Upstream Genome Region (Upstream Genome Arm, UA), and the DHA is homologous to a genome region referred to as the Downstream Genome Region (Downstream Genome Arm, DA) wherein said upstream and downstream homology region flank a DNA sequence to be deleted from said genome.
- the DNA sequence to be deleted from the microbial genome can comprise a polynucleotide of interest by itself, or comprise a polynucleotide of interest flanked by at least one target sequence that can be recognized by at least one RGEN.
- the nucleotide sequence of interest to be integrated into the microbial genome is selected from the group consisting of a polynucleotide of interest, a selection marker, a selection marker DNA flanked by target sequence DNA, a DNA sequence capable of self-excising, a gene of interest, a transcriptional regulatory sequence, a translational regulatory sequence, a promoter sequence, a terminator sequence, a transgenic nucleic acid sequence, an antisense sequence complementary to at least a portion of the messenger RNA, a heterologous sequence, or any one combination thereof.
- the “DNA modification template” comprises a DNA sequence flanked by a first region of homology (referred to as Upstream Homology Arm, UHA) and a second region of homology (referred to as Downstream Homology Arm, DHA), wherein said UHA is homologous to a genome region referred to as the Upstream Genome Region (Upstream Genome Arm, UA), and the DHA is homologous to a genome region referred to as the Downstream Genome Region (Downstream Genome Arm, DA), wherein the DNA sequence comprises at least one nucleotide modification when compared to a genome nucleotide sequence to be edited.
- a nucleotide modification can be at least one nucleotide substitution, addition or deletion.
- the homology arms of the present disclosure, flanking a double-stranded DNA sequence, are each between about 1001 base pairs (bps) and 2000 bps; between 2000 bps and 3000 bps; between 2000 bps and 4000 bps; between 2000 bps and 5000 bps; between 2000 bps and 6000 bps; between 3000 bps and 4000 bps; between 3000 bps and 5000 bps; between 3000 bps and 6000 bps; between 4000 bps and 5000 bps; between 4000 bps and 6000 bps; or between 5000 bps and 6000 bps in length.
- the 5' and 3' ends of a gene of interest are flanked by a homology arm wherein the homology arm comprises nucleic acid sequences immediately flanking the targeted genome locus of the microbial cell.
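The template layout described above (UHA, optional donor DNA, DHA, with arms taken from the sequence immediately flanking the targeted locus) can be sketched in Python. This is a minimal illustration only, assuming a linear genome held as a plain string; the names `ModificationTemplate` and `build_template` are hypothetical and do not come from the disclosure. The arm-length bounds are the 1001 to 6000 bp range quoted in the text.

```python
from dataclasses import dataclass

# Arm-length bounds taken from the range quoted in the text (1001-6000 bp).
MIN_ARM_BP, MAX_ARM_BP = 1001, 6000


@dataclass(frozen=True)
class ModificationTemplate:
    """Hypothetical container: [UHA]-[donor]-[DHA]."""
    uha: str    # Upstream Homology Arm
    donor: str  # donor DNA; empty string models a deletion template
    dha: str    # Downstream Homology Arm

    def sequence(self) -> str:
        return self.uha + self.donor + self.dha


def build_template(genome: str, start: int, end: int,
                   donor: str, arm_bp: int = MIN_ARM_BP) -> ModificationTemplate:
    """Build a template whose arms immediately flank genome[start:end].

    Assumption: the genome is linear, so arms cannot wrap around the ends.
    """
    if not MIN_ARM_BP <= arm_bp <= MAX_ARM_BP:
        raise ValueError("arm length outside the 1001-6000 bp range")
    if not 0 <= start <= end <= len(genome):
        raise ValueError("target coordinates fall outside the genome")
    if start < arm_bp or len(genome) - end < arm_bp:
        raise ValueError("not enough flanking sequence for the requested arms")
    uha = genome[start - arm_bp:start]   # sequence just upstream of the locus
    dha = genome[end:end + arm_bp]       # sequence just downstream of the locus
    return ModificationTemplate(uha, donor, dha)
```

Passing an empty `donor` gives a UHA joined directly to a DHA, which corresponds to the deletion-type template; a non-empty `donor` corresponds to an integration or replacement template.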
- Selection markers for marker swapping by replacing a first selection marker construct integrated at a predetermined target sequence of a microbial cell with a second selection marker construct
- the present disclosure includes methods and compositions for selection marker swapping in a microbial cell by replacing a first selection marker construct integrated at a predetermined target sequence of a microbial cell with a second selection marker construct using DNA modification templates in combination with RGENs.
- Disclosed herein are replaceable selection marker constructs comprising a selection marker (shown as [Marker-1] or [Marker-2] in the Figures) flanked by a unique RGEN target sequence (for example, target sequence [B] or [C]; see Figures), wherein said construct is part of a DNA modification template comprising homologous recombination arms.
- Use of these DNA modification templates together with the specific RGENs that can recognize and cleave the RGEN target sequence allows for the replacement of a first selection marker construct ([B]-[Marker-1]-[B]) with a second selection marker construct ([C]-[Marker-2]-[C]) at a predetermined target sequence of a microbial cell.
- selection marker constructs comprising a selection marker (shown as [Marker-1] or [Marker-2] in Figures) flanked by unique but different RGEN target sequences (say for example target sequences [Bα] and [B
- selection markers include, but are not limited to, pyr4 (Smith et al., Curr. Genet., 1991, 19(1):27-33), pyr2 (Jorgensen et al., Microbial Cell Factories, 2014, 13(1):33), hph (Mach et al., Curr. Genet., 1994, 25(6):567-570), amdS (Penttila et al., Gene, 1987, (2):155-164), and alS (WO2008039370A1; Ouedraogo et al., Appl. Microbiol. Biotechnol., 2015, 99(23):10083-95).
- the method comprises a method for replacing a first selection marker construct integrated at a predetermined target sequence of a microbial cell with a second selection marker construct, the method comprising: a) providing one or more microbial cells having a first selection marker construct ([B]-[Marker-1]-[B]) integrated at a predetermined target sequence ([A]), wherein said first selection marker construct comprises a first selection marker ([Marker-1]) flanked by a first unique target sequence ([B]); b) introducing into the microbial cells of (a) a first RGEN (RGEN-B) and a first DNA modification template, wherein said first DNA modification template comprises a second selection marker construct ([C]-[Marker-2]-[C]) comprising a second selection marker ([Marker-2]) flanked by a second unique target sequence ([C]), wherein said first RGEN in combination with said first DNA modification template enables the replacement of said first selection marker construct with said second selection marker construct via homo
- the method comprises a method for reestablishing a first selection marker construct integrated at a predetermined target sequence of a microbial cell, the method comprising: a) providing one or more microbial cells having a second selection marker construct ([C]-[Marker-2]-[C]) integrated at a predetermined target sequence ([A]), wherein said second selection marker construct comprises a second selection marker ([Marker-2]) flanked by a first unique target sequence ([C]); b) introducing into the microbial cells of (a) a first RGEN (RGEN-C) and a first DNA modification template, wherein said first DNA modification template comprises a first selection marker construct ([B]-[Marker-1]- [B]) comprising a first selection marker ([Marker-1]) flanked by a second unique target sequence ([B]), wherein said first RGEN in combination with said DNA modification template enables the replacement of said second selection marker construct with said first selection marker construct via homologous recombination
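The swap and re-establishment steps described above can be pictured as a small state model. The sketch below is a toy illustration, not the disclosed method: it assumes every cut is repaired from the supplied template (real efficiencies are below 100%), and the names `MarkerConstruct` and `swap_marker` are hypothetical. The check that the incoming construct carries a different target sequence reflects the requirement that the second construct be flanked by a second unique target sequence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MarkerConstruct:
    """Hypothetical model of a construct such as [B]-[Marker-1]-[B]."""
    target: str  # RGEN target sequence flanking the marker, e.g. "B"
    marker: str  # selection marker name, e.g. "Marker-1"

    def __str__(self) -> str:
        return f"[{self.target}]-[{self.marker}]-[{self.target}]"


def swap_marker(resident: MarkerConstruct, rgen_target: str,
                incoming: MarkerConstruct) -> MarkerConstruct:
    """Return the construct at locus [A] after one idealized swap round.

    Assumption: a matching RGEN always cuts and the break is always
    repaired from the template carrying `incoming`.
    """
    if rgen_target != resident.target:
        # The RGEN does not recognize the resident construct: no change.
        return resident
    if incoming.target == resident.target:
        # The same RGEN would also cut the incoming construct.
        raise ValueError("incoming construct must use a different target sequence")
    return incoming


marker1 = MarkerConstruct("B", "Marker-1")
marker2 = MarkerConstruct("C", "Marker-2")

locus = swap_marker(marker1, "B", marker2)  # RGEN-B replaces Marker-1
print(locus)                                # [C]-[Marker-2]-[C]
locus = swap_marker(locus, "C", marker1)    # RGEN-C re-establishes Marker-1
print(locus)                                # [B]-[Marker-1]-[B]
```

Alternating between the two constructs in this way is what allows the same locus to be reused across successive engineering rounds.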
- Selection markers for marker swapping by replacing a first selection marker construct integrated at a predetermined target sequence of a microbial cell with a second selection marker construct while simultaneously modifying at least one additional target sequence in a microbial genome.
- the present disclosure further includes methods and compositions in which the selection marker swapping system described herein is combined with simultaneously modifying at least one additional target sequence at a different genome target sequence.
- the methods and compositions employ homologous recombination-based selection marker swapping at a predetermined target sequence of a microbial cell while simultaneously modifying at least one target sequence that is different from said predetermined DNA sequence in the genome of a microbial cell using RNA-guided endonucleases (RGENs) mediated and DNA modification template based methods.
- marker swapping refers to a process of integrating a (first) selection marker at a predetermined target sequence in the genome of a microbial cell which is later replaced (swapped) by a second selection marker at the site where the first marker was integrated.
- marker swapping also refers to a process of replacing (swapping) a selection marker integrated at a predetermined target sequence in the genome of a microbial cell with a second selection marker at the site where the first marker was integrated.
- the method comprises a method for replacing a first selection marker construct integrated at a predetermined target sequence of a microbial cell with a second selection marker construct while simultaneously modifying at least one additional target sequence, the method comprising: a) providing one or more microbial cells having a first selection marker construct ([B]-[Marker-1]-[B]) integrated at a predetermined target sequence ([A]), wherein said first selection marker construct comprises a first selection marker ([Marker-1]) flanked by a first unique target sequence ([B]), wherein said cells have at least one additional target sequence ([M]); b) introducing into the microbial cells of (a) a first RGEN (RGEN-B) and a first DNA modification template, wherein said first DNA modification template comprises a second selection marker construct ([C]-[Marker-2]-[C]) comprising a second selection marker ([Marker-2]) flanked by a second unique target sequence ([C]), wherein said first RGEN
- the method comprises a method for reestablishing a first selection marker construct integrated at a predetermined target sequence of a microbial cell, the method comprising: a) providing one or more microbial cells having a second selection marker construct (referred to as [C]-[Marker-2]-[C]) integrated at a predetermined target sequence ([A]), wherein said second selection marker construct comprises a second selection marker ([Marker-2]) flanked by a first unique target sequence ([C]), wherein said cells have at least one additional target sequence ([M]); b) introducing into the microbial cells of (a) a first RGEN (RGEN-C) and a first DNA modification template, wherein said first DNA modification template comprises a first selection marker construct ([B]-[Marker-1]-[B]) comprising a first selection marker ([Marker-1]) flanked by a second unique target sequence ([B]), wherein said first RGEN in combination with said DNA modification template enables the
- the modification at said at least one additional target sequence is selected from the group consisting of an insertion of a polynucleotide of interest, a deletion of a polynucleotide of interest, a replacement of a polynucleotide of interest, and any one combination thereof.
- the microbial cells of (a) have at least one additional target sequence ([M]), and simultaneously introducing a modification comprises introducing at least a second RGEN (RGEN-M) and at least a second DNA modification template ([UHA-M]-[Insert]-[DHA-M]) comprising a polynucleotide of interest ([Insert]), wherein said second RGEN in combination with said second DNA modification template enables the integration of said polynucleotide of interest at said at least one additional target sequence ([M]).
- the microbial cells of (a) have at least a first additional target sequence ([Mα]) and a second additional target sequence ([Mβ]) flanking a polynucleotide of interest to be deleted, and wherein said simultaneously introducing a modification comprises introducing at least a second RGEN (RGEN-Mα), a third RGEN (RGEN-Mβ) and at least a third DNA modification template ([UHA-D]-[DHA-D]) comprising an Upstream Homology Arm ([UHA-D]) directly linked to a Downstream Homology Arm ([DHA-D]), wherein said UHA-D and DHA-D are homologous to a genomic region of said microbial cell flanking said polynucleotide sequence of interest to be deleted, wherein said second RGEN (RGEN-Mα) and third RGEN (RGEN-Mβ) in combination with said third DNA modification template enable the deletion of said polynucleotide of interest.
- an “allele” or “allelic variant” is one of several alternative forms of a gene occupying a given locus on a chromosome. When all the alleles present at a given locus on a chromosome are the same, that organism is homozygous at that locus. If the alleles present at a given locus on a chromosome differ, that organism is heterozygous at that locus.
- An allelic variant of a polypeptide is a polypeptide encoded by an allelic variant of a gene.
- host cell refers to a cell that has the capacity to act as a host or expression vehicle for a newly introduced DNA sequence.
- the host cells are microbial cells.
- cell herein refers to any type of cell such as a prokaryotic or eukaryotic cell.
- a eukaryotic cell has a nucleus and other membrane-enclosed structures (organelles), whereas a prokaryotic cell lacks a nucleus.
- a microbial cell herein can refer to a fungal cell (e.g., yeast cell), prokaryotic cell, protist cell (e.g., algal cell), euglenoid cell, stramenopile cell, or oomycete cell, for example.
- a prokaryotic cell herein can refer to a bacterial cell or archaeal cell, for example.
- Fungal cells (e.g., yeast cells), protist cells (e.g., algal cells), euglenoid cells, stramenopile cells, and oomycete cells represent examples of eukaryotic microbial cells.
- a eukaryotic microbial cell has a nucleus and other membrane-enclosed structures (organelles), whereas a prokaryotic cell lacks a nucleus.
- Fungal cells that find use in the subject methods can be filamentous fungal cell species.
- “Fungal cell”, “fungi”, “fungal host cell”, and the like, as used herein includes the phyla Ascomycota, Basidiomycota, Chytridiomycota, and Zygomycota (as defined by Hawksworth et al., In, Ainsworth and Bisby's Dictionary of The Fungi, 8th edition, 1995, CAB International, University Press, Cambridge, UK) as well as the Oomycota (as cited in Hawksworth et al., supra) and all mitosporic fungi (Hawksworth et al., supra).
- the fungal host cell is a yeast cell, where by “yeast” is meant ascosporogenous yeast (Endomycetales), basidiosporogenous yeast, and yeast belonging to the Fungi Imperfecti (Blastomycetes).
- a yeast host cell includes a Candida, Hansenula, Kluyveromyces, Pichia, Saccharomyces, Schizosaccharomyces, or Yarrowia cell.
- yeast examples include, but are not limited to, the following: Saccharomyces carlsbergensis, Saccharomyces cerevisiae, Saccharomyces diastaticus, Saccharomyces douglasii, Saccharomyces kluyveri, Saccharomyces norbensis, Saccharomyces oviformis, Kluyveromyces lactis, and Yarrowia lipolytica cells.
- filamentous fungal cell includes all filamentous forms of the subdivision Eumycotina or Pezizomycotina.
- Suitable cells of filamentous fungal genera include, but are not limited to, cells of Acremonium, Aspergillus, Chrysosporium, Corynascus, Chaetomium, Fusarium, Gibberella, Humicola, Magnaporthe, Myceliophthora, Neurospora, Paecilomyces, Penicillium, Scytalidium, Talaromyces, Thermoascus, Thielavia, Tolypocladium, Hypocrea, and Trichoderma.
- Suitable cells of filamentous fungal species include, but are not limited to, cells of Aspergillus awamori, Aspergillus fumigatus, Aspergillus foetidus, Aspergillus japonicus, Aspergillus nidulans, Aspergillus niger, Aspergillus oryzae, Chrysosporium lucknowense, Fusarium bactridioides, Fusarium cerealis, Fusarium crookwellense, Fusarium culmorum, Fusarium graminearum, Fusarium graminum, Fusarium heterosporum, Fusarium negundi, Fusarium oxysporum, Fusarium reticulatum, Fusarium roseum, Fusarium sambucinum, Fusarium sarcochroum, Fusarium sporotrichioides, Fusarium sulphureum, Fusarium torulosum, Fus
- yeast herein refers to fungal species that predominantly exist in unicellular form. Yeast can alternatively be referred to as “yeast cells”. A yeast herein can be characterized as either a conventional yeast or non-conventional yeast, for example.
- model yeast herein generally refers to Saccharomyces or Schizosaccharomyces yeast species.
- Conventional yeast in certain embodiments are yeast that favor homologous recombination (HR) DNA repair processes over repair processes mediated by non-homologous end-joining (NHEJ).
- HR homologous recombination
- NHEJ non-homologous end-joining
- non-conventional yeast refers to any yeast that is not a Saccharomyces or Schizosaccharomyces yeast species.
- Non-conventional yeast are described in Non-Conventional Yeasts in Genetics, Biochemistry and Biotechnology: Practical Protocols (K. Wolf, K.D. Breunig, G. Barth, Eds., Springer-Verlag, Berlin, Germany, 2003) and Spencer et al. (Appl. Microbiol. Biotechnol. 58:147-156), which are incorporated herein by reference.
- Non-conventional yeast in certain embodiments may additionally (or alternatively) be yeast that favor NHEJ DNA repair processes over repair processes mediated by HR.
- non-conventional yeast are those of the genus Yarrowia (e.g., Yarrowia lipolytica).
- a “recombinant host cell” (also referred to as a “genetically modified host cell”) is a host cell into which has been introduced a heterologous nucleic acid, e.g., a recombinant DNA construct, or into which has been introduced a genome modification system such as the guide RNA/Cas endonuclease system described herein.
- a subject microbial host cell includes a genetically modified microbial cell by virtue of introduction into a suitable microbial cell of an exogenous nucleic acid (e.g., a plasmid or circular recombinant DNA construct).
- a “parental cell” or a “parental (host) cell” may be used interchangeably and refer to “unmodified” parental cells.
- a “parental” cell refers to any cell or strain of microorganism in which the genome of the “parental” cell is altered (e.g., via one or more mutations/modifications introduced into the parental cell) to generate a modified “daughter” cell thereof.
- a “modified cell” or a “modified (host) cell” may be used interchangeably and refer to recombinant (host) cells that comprise at least one genetic modification which is not present in the “parental” host cell from which the modified cells are derived.
- a “genome region” or “genomic region” is a segment of a chromosome in the genome of a cell. In one aspect the genome region is present on either side of the target sequence or, alternatively, also comprises a portion of the target sequence.
- the genome region can comprise at least 5-10, 5-15, 5-20, 5-25, 5-30, 5-35, 5-40, 5-45, 5-50, 5-55, 5-60, 5-65, 5-70, 5-75, 5-80, 5-85, 5-90, 5-95, 5-100, 5-200, 5-300, 5-400, 5-500, 5-600, 5-700, 5-800, 5-900, 5-1000, 5-1100, 5-1200, 5-1300, 5-1400, 5-1500, 5-1600, 5-1700, 5-1800, 5-1900, 5-2000, 5-2100, 5-2200, 5-2300, 5-2400, 5-2500, 5-2600, 5-2700, 5-2800, 5-2900, 5-3000, 5-3100 or more bases such that the genome region has sufficient homology to undergo homologous recombination with the corresponding region of homology.
- the structural similarity between a given genome region and the corresponding region of homology found on the DNA modification template can be any degree of sequence identity that allows for homologous recombination to occur.
- the amount of homology or sequence identity shared by the “region of homology” of the DNA modification template and the “genome region” of the organism genome can be at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity, such that the sequences undergo homologous recombination.
- the region of homology on the DNA modification template can have homology to any sequence flanking the target sequence. While in some instances the regions of homology share significant sequence homology to the genome sequence immediately flanking the target sequence, it is recognized that the regions of homology can be designed to have sufficient homology to regions that may be further 5' or 3' to the target sequence. The regions of homology can also have homology with a fragment of the target sequence along with downstream genome regions.
- the first region of homology further comprises a first fragment of the target sequence and the second region of homology comprises a second fragment of the target sequence, wherein the first and second fragments are dissimilar.
- the DNA modification template sequence comprises an upstream homology arm (HR1) and a downstream homology arm (HR2), wherein each homology arm is greater than 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, 2000, 2100, 2200, 2300, 2400, 2500, 2600, 2700, 2800, 2900, 3000, 3100, 3200, 3300, 3400, 3500, 3600, 3700, 3800, 3900, 4000, 5000 and up to 6000 nucleotides in length and comprises sequence homology to said target sequence on the genome of the microbial cell.
- HR1 upstream homology arm
- HR2 downstream homology arm
- homologous recombination includes the exchange of DNA fragments between two DNA molecules at the sites of homology.
- the frequency of homologous recombination is influenced by a number of factors. Different organisms vary with respect to the amount of homologous recombination and the relative proportion of homologous to non-homologous recombination.
- the length of the homology region (homology arm) needed to observe homologous recombination varies among organisms.
- Homologous recombination has also been observed in many organisms. For example, at least 150-200 bp of homology was required for homologous recombination in the parasitic protozoan Leishmania (Papadopoulou and Dumas, (1997) Nucleic Acids Res 25:4278-86) and 150-200 bp of homology is required for efficient recombination in the proteobacterium E. coli (Lovett et al. (2002) Genetics 160:851-859).
- Homology-directed repair is a mechanism in cells to repair double-stranded and single-stranded DNA breaks.
- Homology-directed repair includes homologous recombination (HR) and single-strand annealing (SSA) (Lieber, 2010, Annu. Rev. Biochem. 79:181-211).
- SSA single-strand annealing
- Other forms of HDR include single-stranded annealing (SSA) and breakage-induced replication, and these require shorter sequence homology relative to HR.
- “homology” means DNA sequences that are similar.
- a “region of homology to a genome region” that is found on the DNA modification template is a region of DNA that has a similar sequence to a given “genome region” in the cell or organism genome.
- a region of homology can be of any length that is sufficient to promote homologous recombination at the cleaved target sequence.
- the region of homology can comprise at least 5-10, 5-15, 5-20, 5-25, 5-30, 5-35, 5-40, 5-45, 5-50, 5-55, 5-60, 5-65, 5-70, 5-75, 5-80, 5-85, 5-90, 5-95, 5-100, 5-200, 5-300, 5-400, 5-500, 5-600, 5-700, 5-800, 5-900, 5-1000, 5-1100, 5-1200, 5-1300, 5-1400, 5-1500, 5-1600, 5-1700, 5-1800, 5-1900, 5-2000, 5-2100, 5-2200, 5-2300, 5-2400, 5-2500, 5-2600, 5-2700, 5-2800, 5-2900, 5-3000, 5-3100 or more bases in length such that the region of homology has sufficient homology to undergo homologous recombination with the corresponding genome region.
- “Sufficient homology” indicates that two polynucleotide sequences have sufficient structural similarity to act as substrates for a homologous recombination reaction.
- the structural similarity includes overall length of each polynucleotide fragment, as well as the sequence similarity of the polynucleotides. Sequence similarity can be described by the percent sequence identity over the whole length of the sequences, and/or by conserved regions comprising localized similarities such as contiguous nucleotides having 100% sequence identity, and percent sequence identity over a portion of the length of the sequences.
- the amount of homology or sequence identity shared by a target and a DNA modification template can vary and includes total lengths and/or regions having unit integral values in the ranges of about 1-20 bp, 20-50 bp, 50-100 bp, 75-150 bp, 100-250 bp, 150-300 bp, 200-400 bp, 250-500 bp, 300-600 bp, 350-750 bp, 400-800 bp, 450-900 bp, 500-1000 bp, 600-1250 bp, 700-1500 bp, 800-1750 bp, 900-2000 bp, 1-2.5 kb, 1.5-3 kb, 2-4 kb, 2.5-5 kb, 3-6 kb, 3.5-7 kb, 4-8 kb, 5-10 kb, or up to and including the total length of the target sequence.
- ranges include every integer within the range, for example, the range of 1-20 bp includes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 and 20 bps.
- the amount of homology can also be described by percent sequence identity over the full aligned length of the two polynucleotides which includes percent sequence identity of about at least 50%, 55%, 60%, 65%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100%.
- Sufficient homology includes any combination of polynucleotide length, global percent sequence identity, and optionally conserved regions of contiguous nucleotides or local percent sequence identity, for example sufficient homology can be described as a region of 75-150 bp having at least 80% sequence identity to a region of the target locus. Sufficient homology can also be described by the predicted ability of two polynucleotides to specifically hybridize under high stringency conditions, see, for example, Sambrook et al., (1989) Molecular Cloning: A Laboratory Manual, (Cold Spring Harbor Laboratory Press, NY); Current Protocols in Molecular Biology, Ausubel et al., Eds (1994) Current Protocols, (Greene Publishing Associates, Inc. and John Wiley & Sons, Inc.); and, Tijssen (1993) Laboratory Techniques in Biochemistry and Molecular Biology- Hybridization with Nucleic Acid Probes, (Elsevier, New York).
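The length-plus-identity description of sufficient homology can be illustrated with a short Python sketch. It assumes the two sequences have already been aligned to equal length (gaps written as `-`), and it uses the 75 bp and 80% figures from the example in the text as default thresholds. The function names are hypothetical, and the thresholds that actually permit recombination vary between organisms.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over two pre-aligned sequences of equal length.

    Gap characters ('-') are counted as mismatches.
    """
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal aligned length")
    matches = sum(x == y and x != "-" for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def has_sufficient_homology(arm: str, genome_region: str,
                            min_len: int = 75,
                            min_identity: float = 80.0) -> bool:
    """Combine a length test with a global percent-identity test.

    Defaults follow the worked example in the text (75-150 bp region with
    at least 80% identity); they are illustrative, not universal.
    """
    return (len(arm) >= min_len
            and percent_identity(arm, genome_region) >= min_identity)
```

A fuller treatment would also look for conserved contiguous stretches or local identity, which the text lists as optional additional criteria.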
- the term “increased” as used herein may refer to a quantity or activity that is at least 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 100%, or at least about 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350,
- the term “integration efficiency” is defined by dividing the number of transformed cells having the desired gene of interest integrated into their genome by the total number of transformed cells. This number can be multiplied by 100 to express it as a percentage.
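The ratio defined above can be sketched in a few lines of Python. The function name and the colony counts are hypothetical and chosen only for illustration; they do not come from the disclosure.

```python
def integration_efficiency(integrated: int, total_transformed: int) -> float:
    """Return integration efficiency as a percentage.

    integrated: transformed cells carrying the gene of interest in their genome.
    total_transformed: all transformed cells that were examined.
    """
    if total_transformed <= 0:
        raise ValueError("total_transformed must be positive")
    if not 0 <= integrated <= total_transformed:
        raise ValueError("integrated must lie between 0 and total_transformed")
    # Divide integrants by all transformants, then scale to a percentage.
    return 100.0 * integrated / total_transformed


# Hypothetical screen: 18 of 24 transformants carry the integration.
print(integration_efficiency(18, 24))  # 75.0
```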
- conserved domain or “motif” means a set of amino acids conserved at specific positions along an aligned sequence of evolutionarily related proteins. While amino acids at other positions can vary between homologous proteins, amino acids that are highly conserved at specific positions indicate amino acids that are essential to the structure, the stability, or the activity of a protein. Because they are identified by their high degree of conservation in aligned sequences of a family of protein homologues, they can be used as identifiers, or “signatures”, to determine if a protein with a newly determined sequence belongs to a previously identified protein family.
- knock-in represents the replacement or integration of a DNA sequence at a specific DNA sequence in a cell by targeting with a Cas protein (for example by homologous recombination (HR), wherein a suitable donor DNA polynucleotide is also used).
- HR homologous recombination
- knock-ins are a specific integration of a heterologous amino acid coding sequence in a coding region of a gene, or a specific integration of a transcriptional regulatory element in a genetic locus.
- nucleic acid means a polynucleotide and includes a single or a double-stranded polymer of deoxyribonucleotide or ribonucleotide bases. Nucleic acids may also include fragments and modified nucleotides. Thus, the terms “polynucleotide”, “nucleic acid sequence”, “nucleotide sequence” and “nucleic acid fragment” are used interchangeably to denote a polymer of RNA and/or DNA and/or RNA-DNA that is single- or double-stranded, optionally containing synthetic, nonnatural, or altered nucleotide bases.
- Nucleotides are referred to by their single letter designation as follows: “A” for adenosine or deoxyadenosine (for RNA or DNA, respectively), “C” for cytosine or deoxycytosine, “G” for guanosine or deoxyguanosine, “U” for uridine, “T” for deoxythymidine, “R” for purines (A or G), “Y” for pyrimidines (C or T), “K” for G or T, “H” for A or C or T, “I” for inosine, and “N” for any nucleotide (e.g., N can be A, C, T, or G, if referring to a DNA sequence; N can be A, C, U, or G, if referring to an RNA sequence).
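As an illustration of how the ambiguity codes listed above can be applied to DNA, the following minimal sketch checks whether a concrete sequence is consistent with a pattern written in those codes. Only the codes named in the definition are included; the names `DNA_CODES` and `matches` are hypothetical.

```python
# Ambiguity codes for DNA, limited to those named in the definition above.
DNA_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",    # purines
    "Y": "CT",    # pyrimidines
    "K": "GT",
    "H": "ACT",
    "N": "ACGT",  # any nucleotide
}


def matches(pattern: str, sequence: str) -> bool:
    """Return True if every base of `sequence` is allowed by `pattern`."""
    if len(pattern) != len(sequence):
        return False
    # Codes outside the table map to an empty set and therefore never match.
    return all(
        base in DNA_CODES.get(code, "")
        for code, base in zip(pattern.upper(), sequence.upper())
    )


print(matches("ARYN", "AGCT"))  # True: G is a purine, C is a pyrimidine
print(matches("ARYN", "ACGT"))  # False: C is not a purine
```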
- polynucleotides or nucleic acid molecules described herein include “genes”, “vectors” and “plasmids”.
- gene refers to a polynucleotide that codes for a functional molecule such as, but not limited to, a particular sequence of amino acids, which comprise all, or part of a protein coding sequence, and may include regulatory (non-transcribed) sequences, such as promoter sequences, which determine for example the conditions under which the gene is expressed.
- the transcribed region of the gene may include untranslated regions (UTRs), including introns, 5'-untranslated regions (UTRs), and 3'-UTRs, as well as the coding sequence.
- “Native gene” refers to a gene as found in nature with its own regulatory sequences.
- a “codon-modified gene” or “codon-preferred gene” or “codon-optimized gene” is a gene having its frequency of codon usage designed to mimic the frequency of preferred codon usage of the host cell.
- the nucleic acid changes made to codon-optimize a gene are “synonymous”, meaning that they do not alter the amino acid sequence of the encoded polypeptide of the parent gene.
- both native and variant genes can be codon-optimized for a particular host cell, and as such no limitation in this regard is intended. Methods are available in the art for synthesizing codon-preferred genes. See, for example, U.S. Patent Nos. 5,380,831, and 5,436,391, and Murray et al. (1989) Nucleic Acids Res. 17:477-498, herein incorporated by reference.
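The requirement that codon-optimizing changes be synonymous can be checked computationally. The sketch below assumes the standard genetic code and DNA coding sequences supplied in frame; the function names are hypothetical, and this is a consistency check, not a codon-optimization method.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate an in-frame DNA coding sequence into one-letter amino acids."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def is_synonymous(parent_cds: str, variant_cds: str) -> bool:
    """True if both coding sequences encode the same amino acid sequence."""
    return translate(parent_cds) == translate(variant_cds)


print(is_synonymous("ATGGCTAAA", "ATGGCCAAG"))  # True: both encode M-A-K
print(is_synonymous("ATGGCTAAA", "ATGGATAAA"))  # False: Ala changed to Asp
```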
- Additional sequence modifications are known to enhance gene expression in a host organism. These include, for example, elimination of: one or more sequences encoding spurious polyadenylation signals, one or more exon-intron splice site signals, one or more transposon-like repeats, and other such well-characterized sequences that may be deleterious to gene expression.
- the G-C content of the sequence may be adjusted to levels average for a given host organism, as calculated by reference to known genes expressed in the host cell. When possible, the sequence is modified to avoid one or more predicted hairpin secondary mRNA structures.
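For the G-C content adjustment described above, a first step is measuring the G-C fraction of a candidate sequence so it can be compared with the average for the host. A minimal sketch follows; the function name and example sequences are hypothetical.

```python
def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return (seq.count("G") + seq.count("C")) / len(seq)


print(gc_fraction("ATGC"))      # 0.5
print(gc_fraction("GGCCGGCC"))  # 1.0
```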
- coding sequence refers to a nucleotide sequence, which directly specifies the amino acid sequence of its (encoded) protein product.
- the boundaries of the coding sequence are generally determined by an open reading frame (hereinafter, “ORF”), which usually begins with an ATG start codon.
- ORF open reading frame
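To make the ORF definition concrete, the sketch below scans the forward strand of a DNA string for the first ATG that is followed by an in-frame stop codon. It is a simplification under stated assumptions: only ATG is treated as a start codon (the text says an ORF "usually" begins with ATG), the reverse strand is ignored, and the function name is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_orf(dna: str):
    """Return the first ATG...stop open reading frame on the forward strand,
    including the stop codon, or None if no complete ORF is found."""
    dna = dna.upper()
    start = dna.find("ATG")
    while start != -1:
        # Walk codon by codon, staying in frame with the start codon.
        for i in range(start + 3, len(dna) - 2, 3):
            if dna[i:i + 3] in STOP_CODONS:
                return dna[start:i + 3]
        # No in-frame stop for this ATG; try the next one.
        start = dna.find("ATG", start + 1)
    return None


print(first_orf("CCATGGCTAAATAGGG"))  # ATGGCTAAATAG
print(first_orf("CCCGGG"))            # None
```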
- the coding sequence typically includes DNA, cDNA, and recombinant nucleotide sequences.
- chromosomal integration refers to a process where a polynucleotide of interest is integrated into a microbial chromosome.
- the homology arms of the DNA modification template will align with homologous regions of the microbial chromosome. Subsequently, the sequence between the homology arms is replaced by the polynucleotide of interest in a double crossover (i.e., homologous recombination).
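The outcome of the double crossover described above can be modeled on plain strings. This is a toy sketch, not a model of the recombination machinery: it assumes each homology arm matches the chromosome exactly, whereas real homology does not require 100% identity, and all names and sequences are hypothetical.

```python
def double_crossover(chromosome: str, upstream_arm: str,
                     downstream_arm: str, insert: str) -> str:
    """Replace the chromosomal region between two homology arms with `insert`.

    Assumes exact matches between the arms and the chromosome (a
    simplification made only for this illustration).
    """
    up = chromosome.find(upstream_arm)
    if up == -1:
        raise ValueError("upstream homology arm not found")
    up_end = up + len(upstream_arm)
    down = chromosome.find(downstream_arm, up_end)
    if down == -1:
        raise ValueError("downstream homology arm not found after upstream arm")
    # Keep everything through the upstream arm, add the insert, then resume
    # at the start of the downstream arm.
    return chromosome[:up_end] + insert + chromosome[down:]


edited = double_crossover(
    chromosome="TTTACGTACAAAAAAGGATCCTTT",
    upstream_arm="ACGTAC",
    downstream_arm="GGATCC",
    insert="CTCTCT",
)
print(edited)  # TTTACGTACCTCTCTGGATCCTTT
```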
- Regulatory sequences refer to nucleotide sequences located upstream (5’ non-coding sequences), within, or downstream (3’ non-coding sequences) of a coding sequence, and which influence the transcription, RNA processing or stability, or translation of the associated coding sequence.
- Regulatory sequences include, but are not limited to, promoters, translation leader sequences, 5’ untranslated sequences, 3’ untranslated sequences, introns, polyadenylation target sequences, RNA processing sites, effector binding sites, and stem-loop structures.
- promoter refers to a nucleic acid sequence capable of controlling the expression of a coding sequence or functional RNA.
- a coding sequence is located 3' (downstream) to a promoter sequence. Promoters may be derived in their entirety from a native gene, or be composed of different elements derived from different promoters found in nature, or even comprise synthetic nucleic acid segments. It is understood by those skilled in the art that different promoters may direct the expression of a gene in different cell types, or at different stages of development, or in response to different environmental or physiological conditions. Promoters which cause a gene to be expressed in most cell types at most times are commonly referred to as “constitutive promoters”. It is further recognized that since in most cases the exact boundaries of regulatory sequences have not been completely defined, DNA fragments of different lengths may have identical promoter activity.
- operably linked is intended to mean a functional linkage between two or more elements.
- an operable linkage between a polynucleotide of interest and a regulatory sequence is a functional link that allows for expression of the polynucleotide of interest (i.e., the polynucleotide of interest is under transcriptional control of the promoter).
- Operably linked elements may be contiguous or non-contiguous. Coding sequences (e.g., an ORF) can be operably linked to regulatory sequences in sense or antisense orientation. When used to refer to the joining of two protein coding regions, by operably linked is intended that the coding regions are in the same reading frame.
- a nucleic acid is “operably linked” when it is placed into a functional relationship with another nucleic acid sequence.
- DNA encoding a secretory leader i.e., a signal peptide
- a promoter or enhancer is operably linked to a coding sequence if it affects the transcription of the sequence
- a ribosome binding site is operably linked to a coding sequence if it is positioned so as to facilitate translation.
- operably linked means that the DNA sequences being linked are contiguous, and, in the case of a secretory leader, contiguous and in reading phase. However, enhancers do not have to be contiguous. Linking is accomplished by ligation at convenient restriction sites. If such sites do not exist, synthetic oligonucleotide adaptors or linkers are used in accordance with conventional practice.
- a functional promoter sequence controlling the expression of a gene of interest (or open reading frame thereof) linked to the gene of interest’s protein coding sequence refers to a promoter sequence which controls the transcription and translation of the coding sequence in Bacillus.
- the present disclosure is directed to a polynucleotide comprising a 5' promoter (or 5' promoter region, or tandem 5' promoters and the like), wherein the promoter region is operably linked to a nucleic acid sequence encoding a protein of interest.
- a functional promoter sequence controls the expression of a gene of interest encoding a protein of interest.
- a functional promoter sequence controls the expression of a heterologous gene or an endogenous gene encoding a protein of interest in a microbial cell.
- the promoter sequence consists of proximal and more distal upstream elements, the latter elements often referred to as enhancers.
- An “enhancer” is a DNA sequence that can stimulate promoter activity, and may be an innate element of the promoter or a heterologous element inserted to enhance the level or tissue-specificity of a promoter.
- the recombinant DNAs (such as, but not limited to, DNA modification templates) disclosed herein can be introduced into a microbial cell using any method known in the art.
- introducing includes methods known in the art for introducing polynucleotides into a cell, including, but not limited to protoplast fusion, natural or artificial transformation (e.g., calcium chloride, electroporation, heat shock), transduction, transfection, conjugation and the like (e.g., see Ferrari et al., 1989).
- “Introducing” is intended to mean presenting to the organism, such as a cell or organism, DNAs disclosed herein (such as, but not limited to, a DNA modification template, a donor DNA, a recombinant DNA construct/expression construct), in such a manner that the component(s) gain access to the interior of a cell of the organism or to the cell itself.
- the methods and compositions do not depend on a particular method for introducing a sequence into an organism or cell, only that the DNAs disclosed herein gain access to the interior of at least one cell of the organism.
- Introducing includes reference to the incorporation of a nucleic acid into a microbial cell where the nucleic acid may be incorporated (integrated) into the genome of the cell, and includes reference to the transient (direct) provision of a nucleic acid to the cell.
- Stable transformation is intended to mean that the nucleotide construct introduced into an organism integrates into a genome of the organism and is capable of being inherited by the progeny thereof.
- Transient transformation is intended to mean that a polynucleotide is introduced (directly or indirectly) into the organism and does not integrate into a genome of the organism or a polypeptide is introduced into an organism. Transient transformation indicates that the introduced composition is only temporarily expressed or present in the organism.
- transient introduction includes situations in which the introduced DNA does not integrate into the chromosome of the host cell and thus is not transmitted to all daughter cells during growth, as well as situations in which an introduced DNA molecule that may have integrated into the chromosome is removed at a desired time using any convenient method (e.g., employing a Cre-lox system, by removing positive selective pressure for an episomal DNA construct, by promoting looping out of all or part of the integrated polynucleotide from the chromosome using a selection media, etc.). No limitation in this regard is intended.
- a variety of methods are available for identifying those cells with integration into the genome at or near to the target sequence. Such methods can be viewed as directly analyzing a target sequence to detect any change in the target sequence, including but not limited to PCR methods, sequencing methods, nuclease digestion, Southern blots, and any combination thereof. See, for example, US Patent Application 12/147,834, herein incorporated by reference to the extent necessary for the methods described herein.
- the method also comprises recovering an organism from the cell comprising a polynucleotide of interest integrated into its genome.
- a microbial (host) cell “genome” includes not only chromosomal DNA found within the nucleus, but also organelle DNA found within subcellular components of the cell (extrachromosomal DNA).
- plasmid refers to extrachromosomal elements, often carrying genes which are typically not part of the central metabolism of the cell, and usually in the form of double-stranded DNA molecules.
- Such elements may be autonomously replicating sequences, genome integrating sequences, phage or nucleotide sequences, linear or circular, of a single-stranded or double-stranded DNA or RNA, derived from any source, in which a number of nucleotide sequences have been joined or recombined into a unique construction which is capable of introducing a promoter fragment and DNA sequence for a selected gene product along with appropriate 3' untranslated sequence into a cell.
- vector includes any nucleic acid that can be replicated (propagated) in cells and can carry new genes or DNA segments into cells.
- Vectors include viruses, bacteriophage, pro-viruses, plasmids, phagemids, transposons, and artificial chromosomes such as BACs (bacterial artificial chromosomes), and the like, that are “episomes” (i.e., replicate autonomously or can integrate into a chromosome of a host organism).
- expression cassette and “expression vector” refer to a nucleic acid construct generated recombinantly or synthetically, with a series of specified nucleic acid elements that permit transcription of a particular nucleic acid in a cell.
- the recombinant expression cassette can be incorporated into a plasmid, chromosome, mitochondrial DNA, plastid DNA, virus, or nucleic acid fragment.
- the recombinant expression cassette portion of an expression vector includes, among other sequences, a nucleic acid sequence to be transcribed and a promoter.
- DNA constructs also include a series of specified nucleic acid elements that permit transcription of a particular nucleic acid in a target cell.
- a DNA construct of the disclosure comprises a selective marker and an inactivating chromosomal or gene or DNA segment as defined herein.
- Many prokaryotic expression vectors are commercially available and known to one skilled in the art. Selection of appropriate expression vectors is within the knowledge of one skilled in the art.
- a “targeting vector” is a vector that includes polynucleotide sequences that are homologous to a region in the chromosome of a host cell into which the targeting vector is transformed and that can drive homologous recombination at that region.
- targeting vectors find use in introducing mutations into the chromosome of a host cell through homologous recombination.
- the targeting vector comprises other non-homologous sequences, e.g., added to the ends (i.e., stuffer sequences or flanking sequences). The ends can be closed such that the targeting vector forms a closed circle, such as, for example, integration into a vector. Selection and/or construction of appropriate vectors is well within the knowledge of those having skill in the art.
- plasmid refers to a circular double-stranded (ds) DNA construct used as a cloning vector, and which forms an extrachromosomal self-replicating genetic element in many bacteria and some eukaryotes. In some embodiments, plasmids become incorporated into the genome of the host cell. Polynucleotides of interest are further described herein and include polynucleotides reflective of the commercial markets and interests of those involved in the production of enzymes (such as, but not limited to, through fermentation of bacteria, thereby producing the enzymes).
- a polynucleotide of interest can code for one or more proteins of interest. It can have other biological functions.
- the polynucleotide of interest may or may not already be present in the genome of the host cell to be transformed, i.e., either a homologous or heterologous sequence.
- Nucleotides of interest may comprise antisense sequences complementary to at least a portion of the messenger RNA (mRNA) for a targeted gene sequence of interest.
- Antisense nucleotides are constructed to hybridize with the corresponding mRNA. Modifications of the antisense sequences may be made as long as the sequences hybridize to and interfere with expression of the corresponding mRNA. In this manner, antisense constructions having 70%, 80%, or 85% sequence identity to the corresponding antisense sequences may be used. Furthermore, portions of the antisense nucleotides may be used to disrupt the expression of the target gene. Generally, sequences of at least 50 nucleotides, 100 nucleotides, 200 nucleotides, or greater may be used.
- the polynucleotide of interest may also be used in the sense orientation to suppress the expression of endogenous genes in organisms.
- Methods for suppressing gene expression in organisms using polynucleotides in the sense orientation are known in the art.
- the methods generally involve transforming an organism with a DNA construct comprising a promoter that drives expression in an organism operably linked to at least a portion of a nucleotide sequence that corresponds to the transcript of the endogenous gene.
- a nucleotide sequence has substantial sequence identity to the sequence of the transcript of the endogenous gene, generally greater than about 65% sequence identity, about 85% sequence identity, or greater than about 95% sequence identity. See, U.S. Patent Nos. 5,283,184 and 5,034,323; herein incorporated by reference.
- a phenotypic marker is a screenable or a selection marker that includes visual markers and selection markers whether it is a positive or negative selection marker. Any phenotypic marker can be used.
- a selection or screenable marker comprises a DNA segment that allows one to identify, or select for or against a molecule or a cell that contains it, often under particular conditions. These markers can encode an activity, such as, but not limited to, production of RNA, peptide, or protein, or can provide a binding site for RNA, peptides, proteins, inorganic and organic compounds or compositions and the like.
- selection marker refers to a nucleotide sequence which is capable of expression in (host) cells and where expression of the selection marker confers to cells containing the expressed gene the ability to grow in the presence of a corresponding selective agent or lack of an essential nutrient.
- the selective marker refers to a nucleic acid (e.g., a gene) capable of expression in host cell which allows for ease of selection of those hosts containing the vector
- selection markers include, but are not limited to pyr4 (Smith et al., Curr Genet 1991, 19(1):27-33), pyr2 (Jorgensen et al., 2014, Microbial Cell Factories, 13(1):33), hph (Mach et al., Curr. Genet., 1994, 25(6):567-570), amdS (Penttila et al., Gene, 1987, (2):155-164), alS (WO2008039370A1; Ouedraogo et al., Appl. Microbiol. Biotechnol., 2015, 99(23):10083-95).
- selection marker includes genes that provide an indication that a host cell has taken up an incoming DNA of interest or some other reaction has occurred.
- selection markers are genes that confer antimicrobial resistance or a metabolic advantage on the host cell to allow cells containing the exogenous DNA to be distinguished from cells that have not received any exogenous sequence during the transformation.
- a “residing selection marker” is one that is located on the chromosome of the microorganism to be transformed.
- a residing selection marker encodes a gene that is different from the selection marker on the transforming DNA construct.
- Selective markers are well known to those of skill in the art.
- the marker can be an antimicrobial resistance marker (e.g., ampR, phleoR, specR, kanR, eryR, tetR, cmpR and neoR) (see e.g., Guerot-Fleury, 1995; Palmeros et al., 2000; and Trieu-Cuot et al., 1983).
- the present invention provides a chloramphenicol resistance gene (e.g., the gene present on pC194, as well as the resistance gene present in the Bacillus licheniformis genome).
- This resistance gene is particularly useful in the present invention, as well as in embodiments involving chromosomal amplification of chromosomally integrated cassettes and integrative plasmids (See e.g., Albertini and Galizzi, 1985; Stahl and Ferrari, 1984).
- Other markers useful in accordance with the invention include, but are not limited to auxotrophic markers, such as serine, lysine, tryptophan; and detection markers, such as β-galactosidase.
- Polynucleotides of interest include genes that can be stacked or used in combination with other traits.
- polypeptide and “protein” are used interchangeably, and refer to polymers of any length comprising amino acid residues linked by peptide bonds.
- the conventional one (1) letter or three (3) letter codes for amino acid residues are used herein.
- the polypeptide may be linear or branched, it may comprise modified amino acids, and it may be interrupted by non-amino acids.
- the term polypeptide also encompasses an amino acid polymer that has been modified naturally or by intervention; for example, disulfide bond formation, glycosylation, lipidation, acetylation, phosphorylation, or any other manipulation or modification, such as conjugation with a labeling component.
- polypeptides containing one or more analogs of an amino acid including, for example, unnatural amino acids, etc.
- POI protein of interest
- a POI may be an enzyme, a substrate-binding protein, a surface-active protein, a structural protein, a receptor protein, an antibody and the like
- a “gene of interest” or “GOI” refers to a nucleic acid sequence (e.g., a polynucleotide, a gene or an ORF) which encodes a POI.
- a “gene of interest” encoding a “protein of interest” may be a naturally occurring gene, a mutated gene or a synthetic gene.
- a gene of interest of the instant disclosure encodes a commercially relevant industrial protein of interest, such as an enzyme (e.g., acetyl esterases, aminopeptidases, amylases, arabinases, arabinofuranosidases, carbonic anhydrases, carboxypeptidases, catalases, cellulases, chitinases, chymosins, cutinases, deoxyribonucleases, epimerases, esterases, α-galactosidases, β-galactosidases, α-glucanases, glucan lysases, endo-β-glucanases, glucoamylases, glucose oxidases, α-glucosidases, β-glucosidases, glucuronidases, glycosyl hydrolases, hemicellulases, hexose oxidases,
- a “mutation” refers to any change or alteration in a nucleic acid sequence.
- mutations include point mutations, deletion mutations, silent mutations, frame shift mutations, splicing mutations and the like. Mutations may be performed specifically (e.g., via site directed mutagenesis) or randomly (e.g., via chemical agents, passage through repair minus bacterial strains).
- a “mutated gene” is a gene that has been altered through human intervention. Such a “mutated gene” has a sequence that differs from the sequence of the corresponding non-mutated gene by at least one nucleotide addition, deletion, or substitution. In certain embodiments of the disclosure, the mutated gene comprises an alteration that results from a guide polynucleotide/Cas protein system as disclosed herein.
- a mutated cell or organism is a cell or organism comprising a mutated gene.
- a “targeted mutation” is a mutation in a gene (referred to as the target gene), including a native gene, that was made by altering a target sequence within the target gene using any method known to one skilled in the art, including a method involving a guided Cas protein system.
- the Cas protein is a cas endonuclease
- a guide polynucleotide/Cas endonuclease induced targeted mutation can occur in a nucleotide sequence that is located within or outside a genome target sequence that is recognized and cleaved by the Cas endonuclease.
- substitution means the replacement (i.e., substitution) of one amino acid with another amino acid.
- an “endogenous gene” refers to a gene in its natural location in the genome of an organism.
- heterologous in reference to a polynucleotide or polypeptide sequence is a sequence that originates from a foreign species, or, if from the same species, is substantially modified from its native form in composition and/or genome locus by deliberate human intervention.
- a promoter operably linked to a heterologous polynucleotide is from a species different from the species from which the polynucleotide was derived, or, if from the same/analogous species, one or both are substantially modified from their original form and/or genome locus, or the promoter is not the native promoter for the operably linked polynucleotide.
- a chimeric polynucleotide comprises a coding sequence operably linked to a transcription initiation region that is heterologous to the coding sequence.
- a “heterologous” gene, a “non-endogenous” gene, or a “foreign” gene refer to a gene (or ORF) not normally found in the host organism, but that is introduced into the host organism by gene transfer.
- the term “foreign” gene(s) comprises native genes (or ORFs) inserted into a non-native organism and/or chimeric genes inserted into a native or non-native organism.
- a “heterologous” nucleic acid construct or a “heterologous” nucleic acid sequence has a portion of the sequence which is not native to the cell in which it is expressed.
- heterologous control sequence refers to a gene expression control sequence (e.g., a promoter or enhancer) which does not function in nature to regulate (control) the expression of the gene of interest.
- heterologous nucleic acid sequences are not endogenous (native) to the cell, or a part of the genome in which they are present, and have been added to the cell, by infection, transfection, transformation, microinjection, electroporation, and the like.
- a “heterologous” nucleic acid construct may contain a control sequence/DNA coding (ORF) sequence combination that is the same as, or different from, a control sequence/DNA coding sequence combination found in the native host cell.
- signal sequence and “signal peptide” refer to a sequence of amino acid residues that may participate in the secretion or direct transport of a mature protein or precursor form of a protein.
- the signal sequence is typically located N-terminal to the precursor or mature protein sequence.
- the signal sequence may be endogenous or exogenous.
- a signal sequence is normally absent from the mature protein.
- a signal sequence is typically cleaved from the protein by a signal peptidase after the protein is transported.
- a “flanking sequence” refers to any sequence that is either upstream or downstream of the sequence being discussed (e.g., for genes A-B-C, gene B is flanked by the A and C gene sequences).
- the incoming sequence is flanked by a homology arm on each side.
- a flanking sequence is present on only a single side (either 3' or 5'), while in other embodiments, it is on each side of the sequence being flanked.
- the sequence of each homology arm is homologous to a sequence in the host cell genome (such as the microbial chromosome).
- stuffer sequence refers to any extra DNA that flanks homology arms (typically vector sequences). However, the term encompasses any non-homologous DNA sequence. Not to be limited by any theory, a stuffer sequence provides a non-critical target for a cell to initiate DNA uptake.
- Sequence identity or “identity” in the context of nucleic acid or polypeptide sequences refers to the nucleic acid bases or amino acid residues in two sequences that are the same when aligned for maximum correspondence over a specified comparison window.
- percentage of sequence identity refers to the value determined by comparing two optimally aligned sequences over a comparison window, wherein the portion of the polynucleotide or polypeptide sequence in the comparison window may comprise additions or deletions (i.e., gaps) as compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two sequences. The percentage is calculated by determining the number of positions at which the identical nucleic acid base or amino acid residue occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the window of comparison and multiplying the results by 100 to yield the percentage of sequence identity.
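The calculation described above can be sketched for two sequences that have already been aligned, with `-` marking gap positions. The function name and the example alignment are hypothetical; the comparison window is taken to be the full aligned length.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of window positions where both aligned sequences carry
    the same residue. Gap characters ('-') never count as matches."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment must not be empty")
    matched = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    # Matched positions divided by total positions in the window, times 100.
    return 100.0 * matched / len(aligned_a)


# Six of eight positions match: one gap and one mismatch.
print(percent_identity("ACGT-ACG", "ACGTTACC"))  # 75.0
```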
- percent sequence identities include, but are not limited to, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90% or 95%, or any integer percentage from 50% to 100%. These identities can be determined using any of the programs described herein.
- Sequence alignments and percent identity or similarity calculations may be determined using a variety of comparison methods designed to detect homologous sequences including, but not limited to, the MegAlign™ program of the LASERGENE bioinformatics computing suite (DNASTAR Inc., Madison, WI).
- where sequence analysis software is used for analysis, the results of the analysis will be based on the “default values” of the program referenced, unless otherwise specified.
- default values will mean any set of values or parameters that originally load with the software when first initialized.
- Clustal V method of alignment corresponds to the alignment method labeled Clustal V (described by Higgins and Sharp, (1989) CABIOS 5:151-153; Higgins et al., (1992) Comput Appl Biosci 8:189-191) and found in the MegAlign™ program of the LASERGENE bioinformatics computing suite (DNASTAR Inc., Madison, WI).
- sequence identity/similarity values provided herein refer to the value obtained using GAP Version 10 (GCG, Accelrys, San Diego, CA) using the following parameters: % identity and % similarity for a nucleotide sequence using a gap creation penalty weight of 50 and a gap length extension penalty weight of 3, and the nwsgapdna.cmp scoring matrix; % identity and % similarity for an amino acid sequence using a gap creation penalty weight of 8 and a gap length extension penalty of 2, and the BLOSUM62 scoring matrix (Henikoff and Henikoff, (1992) Proc. Natl. Acad. Sci. USA 89:10915).
- GAP uses the algorithm of Needleman and Wunsch, (1970) J Mol Biol 48:443-53, to find an alignment of two complete sequences that maximizes the number of matches and minimizes the number of gaps. GAP considers all possible alignments and gap positions and creates the alignment with the largest number of matched bases and the fewest gaps, using a gap creation penalty and a gap extension penalty in units of matched bases.
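For orientation, a minimal Needleman-Wunsch global alignment can be written as follows. This is a teaching sketch and not the GAP program: it assumes a simple scoring scheme (match +1, mismatch -1, a single linear gap penalty of -1) instead of GAP's separate gap creation and extension penalties and scoring matrices, and the function name is hypothetical.

```python
def needleman_wunsch(a: str, b: str, match: int = 1,
                     mismatch: int = -1, gap: int = -1):
    """Globally align `a` and `b`; return (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,  # align residues
                              score[i - 1][j] + gap,      # gap in b
                              score[i][j - 1] + gap)      # gap in a
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("ACGT", "AGT"))  # (2, 'ACGT', 'A-GT')
```

With these assumed penalties the single gap costs less than any alignment that forces a mismatch, so the three shared bases stay paired.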
- BLAST is a searching algorithm provided by the National Center for Biotechnology Information (NCBI) used to find regions of similarity between biological sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches to identify sequences having sufficient similarity to a query sequence such that the similarity would not be predicted to have occurred randomly. BLAST reports the identified sequences and their local alignment to the query sequence.
- Translation leader sequence refers to a polynucleotide sequence located between the promoter sequence of a gene and the coding sequence.
- the translation leader sequence is present in the mRNA upstream of the translation start sequence.
- the translation leader sequence may affect processing of the primary transcript to mRNA, mRNA stability or translation efficiency. Examples of translation leader sequences have been described (e.g., Turner and Foster, (1995) Mol Biotechnol 3:225-236).
- “3’ non-coding sequences”, “transcription terminator” or “termination sequences” refer to DNA sequences located downstream of a coding sequence and include polyadenylation recognition sequences and other sequences encoding regulatory signals capable of affecting mRNA processing or gene expression.
- the polyadenylation signal is usually characterized by affecting the addition of polyadenylic acid tracts to the 3’ end of the mRNA precursor.
- the use of different 3’ non-coding sequences is exemplified by Ingelbrecht et al., (1989) Plant Cell 1:671-680.
- RNA transcript refers to the product resulting from RNA polymerase-catalyzed transcription of a DNA sequence. When the RNA transcript is a perfect complementary copy of the DNA sequence, it is referred to as the primary transcript or pre-mRNA. An RNA transcript is referred to as the mature RNA or mRNA when it is an RNA sequence derived from post-transcriptional processing of the primary transcript pre-mRNA. “Messenger RNA” or “mRNA” refers to the RNA that is without introns and that can be translated into protein by the cell. “cDNA” refers to a DNA that is complementary to, and synthesized from, an mRNA template using the enzyme reverse transcriptase.
- Sense RNA refers to an RNA transcript that includes the mRNA and can be translated into protein within a cell or in vitro.
- Antisense RNA refers to an RNA transcript that is complementary to all or part of a target primary transcript or mRNA, and that blocks the expression of a target gene (see, e.g., U.S. Patent No. 5,107,065). The complementarity of an antisense RNA may be with any part of the specific gene transcript, i.e., at the 5’ non-coding sequence, 3’ non-coding sequence, introns, or the coding sequence.
- Functional RNA refers to antisense RNA, ribozyme RNA, or other RNA that may not be translated but yet has an effect on cellular processes.
- “complement” and “reverse complement” are used interchangeably herein with respect to mRNA transcripts, and are meant to define the antisense RNA of the message.
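As a small illustration of the reverse complement of an mRNA sequence, the following sketch pairs each base with its Watson-Crick partner and reverses the result. The function name and the example sequence are hypothetical and chosen only for illustration.

```python
# Watson-Crick pairing for RNA bases.
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement_rna(seq):
    """Return the reverse complement of an RNA sequence written 5' to 3'."""
    seq = seq.upper()
    unknown = set(seq) - set("ACGU")
    if unknown:
        raise ValueError(f"unexpected characters: {sorted(unknown)}")
    # Complement every base, then reverse so the result also reads 5' to 3'.
    return seq.translate(RNA_COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(reverse_complement_rna("AUGGCC"))
```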
- “Mature” protein refers to a post-translationally processed polypeptide (i.e., one from which any pre- or propeptides present in the primary translation product have been removed).
- “Precursor” protein refers to the primary product of translation of mRNA (i.e., with pre- and propeptides still present). Pre- and propeptides may be but are not limited to intracellular localization signals. Proteins may be altered in various ways including amino acid substitutions, deletions, truncations, and integrations. Methods for such modifications are generally known. For example, amino acid sequence variants of the protein(s) can be prepared by mutations in the DNA. Methods for mutagenesis and nucleotide sequence alterations include, for example, Kunkel, (1985) Proc.
- Conservative deletions, integrations, and amino acid substitutions are not expected to produce radical changes in the characteristics of the protein, and the effect of any substitution, deletion, integration, or combination thereof can be evaluated by routine screening assays.
- Assays for double-strand-break-inducing activity are known and generally measure the overall activity and specificity of the agent on DNA substrates containing target sequences.
- Vectors and constructs include circular plasmids, and linear polynucleotides, comprising a polynucleotide of interest and optionally other components including linkers, adapters, regulatory or analysis.
- a recognition site and/or target sequence can be contained within an intron, coding sequence, 5' UTRs, 3' UTRs, and/or regulatory regions.
- a method for replacing a first selection marker construct integrated at a predetermined target sequence of a microbial cell with a second selection marker construct comprising: a) providing one or more microbial cells having a first selection marker construct ([B]-[Marker-1]-[B]) integrated at a predetermined target sequence ([A]), wherein said first selection marker construct comprises a first selection marker ([Marker-1]) flanked by a first unique target sequence ([B]); b) introducing into the microbial cells of (a) a first RGEN (RGEN-B) and a first DNA modification template, wherein said first DNA modification template comprises a second selection marker construct ([C]-[Marker-2]-[C]) comprising a second selection marker ([Marker-2]) flanked by a second unique target sequence ([C]), wherein said first RGEN in combination with said first DNA modification template enables the replacement of said first selection marker construct with said second selection marker construct via homologous recombination;
- a method for replacing a first selection marker construct integrated at a predetermined target sequence of a microbial cell with a second selection marker construct while simultaneously modifying at least one additional target sequence comprising: a) providing one or more microbial cells having a first selection marker construct ([B]-[Marker-1]-[B]) integrated at a predetermined target sequence ([A]), wherein said first selection marker construct comprises a first selection marker ([Marker-1]) flanked by a first unique target sequence ([B]), wherein said cells have at least one additional target sequence ([M]); b) introducing into the microbial cells of (a) a first RGEN (RGEN-B) and a first DNA modification template, wherein said first DNA modification template comprises a second selection marker construct ([C]-[Marker-2]-[C]) comprising a second selection marker ([Marker-2]) flanked by a second unique target sequence ([C]), wherein said first RGEN in combination with said first DNA modification template enables
- a method for reestablishing a first selection marker construct integrated at a predetermined target sequence of a microbial cell comprising: a) providing one or more microbial cells having a second selection marker construct ([C]-[Marker-2]-[C]) integrated at a predetermined target sequence ([A]), wherein said second selection marker construct comprises a second selection marker ([Marker-2]) flanked by a first unique target sequence ([C]); b) introducing into the microbial cells of (a) a first RGEN (RGEN-C) and a first DNA modification template, wherein said first DNA modification template comprises a first selection marker construct ([B]-[Marker-1]-[B]) comprising a first selection marker ([Marker-1]) flanked by a second unique target sequence ([B]), wherein said first RGEN in combination with said DNA modification template enables the replacement of said second selection marker construct with said first selection marker construct via homologous recombination, thereby reestablishing said first selection marker
- a method for reestablishing a first selection marker construct integrated at a predetermined target sequence of a microbial cell comprising: a) providing one or more microbial cells having a second selection marker construct (referred to as [C]-[Marker-2]-[C]) integrated at a predetermined target sequence ([A]), wherein said second selection marker construct comprises a second selection marker ([Marker-2]) flanked by a first unique target sequence ([C]), wherein said cells have at least one additional target sequence ([M]); b) introducing into the microbial cells of (a) a first RGEN (RGEN-C) and a first DNA modification template, wherein said first DNA modification template comprises a first selection marker construct ([B]-[Marker-1]-[B]) comprising a first selection marker ([Marker-1]) flanked by a second unique target sequence ([B]), wherein said first RGEN in combination with said DNA modification template enables the replacement of said second selection marker construct with said first selection
- This example discloses the usage of two selection markers in a marker swapping system while simultaneously editing a gene of interest (GOI).
- In a first genome modification assay (Figure 1, 1st step), the first marker is integrated at a predetermined genome sequence.
- In a subsequent genome modification assay (Figure 1, 2nd step), the previously integrated marker is excised again and replaced by a second marker while simultaneously inserting a polynucleotide sequence into a GOI at another genome sequence (multiplex genome engineering).
- the method described herein allows for iterative rounds of multiplex genome engineering using the same marker swapping system.
- RGENs and DNA modification templates were designed to replace a previously integrated removable construct comprising the marker pyr2 (Jorgensen et al., Microbial Cell Factories 2014) with a removable construct comprising the marker hph (Mach et al., Curr Genet 1994) at a predetermined target sequence within the genome of Trichoderma reesei QM6a.
- ade2, which codes for a phosphoribosyl aminoimidazole carboxylase necessary for the synthesis of purines (Jorgensen et al., Microbial Cell Factories 2014), is edited.
- The modification of ade2 was designed to insert a stop codon, giving rise to a Δade2 phenotype showing red-colored colonies when supplemented with adenine (due to the accumulation and polymerization of the purine precursor 5-aminoimidazole ribonucleotide).
- Figure 1 illustrates RGENs, DNA modification templates and the genome composition before and after editing, in this case RGEN-A targeting a predetermined target sequence [A], allowing for homologous recombination with the DNA modification template [UHA-A]-[B]-[Marker-1]-[B]-[DHA-A] (SEQ ID NO: 1), harboring the pyr2 cassette [Marker-1] flanked by the unique target sequence [B] and framed by the upstream and downstream homology arms [UHA-A] and [DHA-A] (1st step).
- the pyr2 cassette [Marker-1] is excised again by targeting the two previously introduced flanking target sequences [B] with RGEN-B, allowing for homologous recombination with the DNA modification template [UHA-A]-[C]-[Marker-2]-[C]-[DHA-A] (SEQ ID NO: 2), harboring the hph cassette [Marker-2] flanked by the unique target sequences [C] and framed by the upstream and downstream homology arms [UHA-A] and [DHA-A]; simultaneously, the additional target sequence [M] within the coding sequence of ade2 (SEQ ID NO: 3) is targeted by RGEN-M, resulting in the in-frame insertion of a stop codon via homologous recombination with the DNA modification template [UHA-M]-[Insert]-[DHA-M] (SEQ ID NO: 4), harboring the polynucleotide sequence of interest to be
- Table 1 illustrates targeting DNA sequences of RGENs, including their respective PAM sequences, in this case target sequence [A] (SEQ ID NO: 5) for RGEN-A, cutting T. reesei QM6a chromosome 3 after bp position 3,895,292 (Li et al., Biotechnology for Biofuels 2017), framed by the upstream arm [UA-A] (QM6a chromosome 2, bp position 3,894,161 - 3,895,222) and the downstream arm [DA-A] (QM6a chromosome 2, bp position 3,895,298 - 3,896,303); target sequence [B] (SEQ ID NO: 6) for RGEN-B, cutting the pyr2 flanking sequences within DNA modification template [UHA-A]-[B]-[Marker-1]-[B]-[DHA-A]; target sequence [C] (SEQ ID NO: 7) for RGEN-C, cutting the hph flanking sequences within DNA
- Table 2 illustrates oligonucleotides used to assemble subcloning vectors and to PCR-amplify DNA modification templates. All vectors are based on pUC18 (Yanisch-Perron et al., Gene 1985), and vector construction was carried out via seamless assembly (Thermo Fisher Scientific: GeneArt™ Seamless Cloning and Assembly Kit) and subcloning in E. coli (Thermo Fisher Scientific: One Shot™ TOP10 Chemically Competent E. coli).
- the DNA modification template [UHA-A]-[B]-[Marker-1]-[B]-[DHA-A] was PCR-amplified using the oligonucleotide pair RAS210/RAS213 (SEQ ID NO: 9/10) from a vector constructed by assembling the PCR products RAS210/RAS415X (SEQ ID NO: 9/11), RAS301X/RAS414 (SEQ ID NO: 12/13) and RAS303/RAS213 (SEQ ID NO: 14/10) amplified from QM6a genomic DNA together with RAS234/RAS233 (SEQ ID NO: 15/16) amplified from pUC18.
- the DNA modification template [UHA-A]-[C]-[Marker-2]-[C]-[DHA-A] was PCR-amplified using the oligonucleotide pair RAS210/RAS213 (SEQ ID NO: 9/10), from a vector constructed by assembling the PCR products RAS210/RAS417X (SEQ ID NO: 9/17) and RAS303/RAS213 (SEQ ID NO: 14/10) amplified from QM6a genomic DNA together with RAS304X/RAS416 (SEQ ID NO: 18/19) amplified from a synthetic construct together with RAS234/RAS233 (SEQ ID NO: 15/16) amplified from pUC18.
- the DNA modification template [UHA-M]-[Insert]-[DHA-M] was PCR-amplified using the oligonucleotide pair RAS532/RAS535 (SEQ ID NO: 20/21), from a vector constructed by assembling the PCR products RAS531/RAS533 (SEQ ID NO: 22/23) and RAS534/RAS536 (SEQ ID NO: 24/25) amplified from QM6a genomic DNA together with RAS537/RAS538 (SEQ ID NO: 26/27) amplified from pUC18.
- T. reesei QM6a genome editing was carried out via protoplast transformation according to standard protocol (Penttilä et al., Gene 1987). In a volume of 150 µL, approximately 5 × 10^6 protoplasts, 10 pmol per RGEN, and 0.2, 0.5 or 1 pmol per DNA modification template were used.
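To give a feel for the quantities in the transformation mix, the following sketch converts the stated amounts into molar concentrations and molecules per protoplast. It assumes the reaction volume is 150 microliters (the unit is garbled in some renderings of the text) and that all material is evenly distributed, which is an idealization; the function names are hypothetical.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def nanomolar(amount_pmol, volume_ul):
    """Concentration in nM; 1 pmol per microliter equals 1 micromolar."""
    return amount_pmol / volume_ul * 1000.0


def molecules_per_protoplast(amount_pmol, protoplasts):
    """Average number of molecules available per protoplast."""
    return amount_pmol * 1e-12 * AVOGADRO / protoplasts


if __name__ == "__main__":
    volume_ul = 150.0      # assumed microliters
    protoplasts = 5e6      # approximate count stated in the text
    print("RGEN:", round(nanomolar(10, volume_ul), 1), "nM")
    for pmol in (0.2, 0.5, 1.0):
        print("template", pmol, "pmol:",
              round(nanomolar(pmol, volume_ul), 2), "nM,",
              f"{molecules_per_protoplast(pmol, protoplasts):.2e}",
              "molecules per protoplast")
```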
- Table 3 illustrates results of editing ade2 while swapping pyr2 with hph (2nd step), using different amounts of DNA modification templates.
- the number of red colonies indicative of Δade2, the number of white colonies indicative of no ade2 editing, and the fraction of 12 selected red colonies with colony-PCR product EagI restriction patterns indicative of homologous recombination are shown.
- red colonies were observed for all tested concentrations of the DNA modification template [UHA-M]-[Insert]-[DHA-M] (0.2, 0.5 and 1.0 pmol), and RAS531/RAS536 colony-PCR product EagI restriction patterns indicated high frequency of homologous recombination.
- This example discloses the usage of two selection markers in a marker swapping system while simultaneously editing a gene of interest (GOI).
- In a first genome modification assay (Figure 2, 1st step), the first marker is integrated at a predetermined genome sequence.
- In a subsequent genome modification assay (Figure 2, 2nd step), the previously integrated marker is excised again and replaced by a second marker while simultaneously inserting a polynucleotide sequence into a GOI at another genome sequence (multiplex genome engineering).
- the method described herein allows for iterative rounds of multiplex genome engineering using the same marker swapping system.
- RGENs and DNA modification templates were designed to replace a previously integrated removable construct comprising the marker hph (Mach et al., Curr Genet 1994) with a removable construct comprising the marker pyr2 (Jorgensen et al., Microbial Cell Factories 2014) at a predetermined target sequence within the genome of Trichoderma reesei QM6a.
- ade2, which codes for a phosphoribosyl aminoimidazole carboxylase necessary for the synthesis of purines (Jorgensen et al., Microbial Cell Factories 2014), is edited.
- The modification of ade2 was designed to insert a stop codon, giving rise to a Δade2 phenotype showing red-colored colonies when supplemented with adenine (due to the accumulation and polymerization of the purine precursor 5-aminoimidazole ribonucleotide).
- Figure 2 illustrates RGENs, DNA modification templates and the genome composition before and after editing, in this case RGEN-A targeting a predetermined target sequence [A], allowing for homologous recombination with the DNA modification template [UHA-A]-[C]-[Marker-2]-[C]-[DHA-A] (SEQ ID NO: 2), harboring the hph cassette [Marker-2] flanked by the unique target sequence [C] and framed by the upstream and downstream homology arms [UHA-A] and [DHA-A] (1st step).
- the hph cassette [Marker-2] is excised again by targeting the two previously introduced flanking target sequences [C] with RGEN-C, allowing for homologous recombination with the DNA modification template [UHA-A]-[B]-[Marker-1]-[B]-[DHA-A] (SEQ ID NO: 1), harboring the pyr2 cassette [Marker-1] flanked by the unique target sequence [B] and framed by the upstream and downstream homology arms [UHA-A] and [DHA-A]; simultaneously, the additional target sequence [M] within the coding sequence of ade2 (SEQ ID NO: 3) is targeted by RGEN-M, resulting in the in-frame insertion of a stop codon via homologous recombination with the DNA modification template [UHA-M]-[Insert]-[DHA-M] (SEQ ID NO: 4), harboring the polynucleotide sequence of interest
- Table 1 illustrates targeting DNA sequences of RGENs, including their respective PAM sequences, in this case target sequence [A] (SEQ ID NO: 5) for RGEN-A, cutting T. reesei QM6a chromosome 3 after bp position 3,895,292 (Li et al., Biotechnology for Biofuels 2017), framed by the upstream arm [UA-A] (QM6a chromosome 2, bp position 3,894,161 - 3,895,222) and the downstream arm [DA-A] (QM6a chromosome 2, bp position 3,895,298 - 3,896,303); target sequence [B] (SEQ ID NO: 6) for RGEN-B, cutting the pyr2 flanking sequences within DNA modification template [UHA-A]-[B]-[Marker-1]-[B]-[DHA-A]; target sequence [C] (SEQ ID NO: 7) for RGEN-C, cutting the hph flanking sequence
- Table 2 (see Example 1) illustrates oligonucleotides used to assemble subcloning vectors and to PCR-amplify DNA modification templates. All vectors are based on pUC18 (Yanisch-Perron et al., Gene 1985), and vector construction was carried out via seamless assembly (Thermo Fisher Scientific: GeneArt™ Seamless Cloning and Assembly Kit) and subcloning in E. coli (Thermo Fisher Scientific: One Shot™ TOP10 Chemically Competent E. coli).
- the DNA modification template [UHA-A]-[B]-[Marker-1]-[B]-[DHA-A] was PCR-amplified using the oligonucleotide pair RAS210/RAS213 (SEQ ID NO: 9/10) from a vector constructed by assembling the PCR products RAS210/RAS415X (SEQ ID NO: 9/11), RAS301X/RAS414 (SEQ ID NO: 12/13) and RAS303/RAS213 (SEQ ID NO: 14/10) amplified from QM6a genomic DNA together with RAS234/RAS233 (SEQ ID NO: 15/16) amplified from pUC18.
- the DNA modification template [UHA-A]-[C]-[Marker-2]-[C]-[DHA-A] was PCR-amplified using the oligonucleotide pair RAS210/RAS213 (SEQ ID NO: 9/10), from a vector constructed by assembling the PCR products RAS210/RAS417X (SEQ ID NO: 9/17) and RAS303/RAS213 (SEQ ID NO: 14/10) amplified from QM6a genomic DNA together with RAS304X/RAS416 (SEQ ID NO: 18/19) amplified from a synthetic construct together with RAS234/RAS233 (SEQ ID NO: 15/16) amplified from pUC18.
- the DNA modification template [UHA-M]-[Insert]-[DHA-M] was PCR-amplified using the oligonucleotide pair RAS532/RAS535 (SEQ ID NO: 20/21), from a vector constructed by assembling the PCR products RAS531/RAS533 (SEQ ID NO: 22/23) and RAS534/RAS536 (SEQ ID NO: 24/25) amplified from QM6a genomic DNA together with RAS537/RAS538 (SEQ ID NO: 26/27) amplified from pUC18.
- T. reesei QM6a genome editing was carried out via protoplast transformation according to standard protocol (Penttilä et al., Gene 1987). In a volume of 150 µL, approximately 5 × 10^6 protoplasts, 10 pmol per RGEN, and 0.2, 0.5 or 1 pmol per DNA modification template were used.
- Editing of ade2 in emerging red colonies was analyzed by colony-PCR using the oligonucleotide pair RAS531/RAS536 (SEQ ID NO: 22/25); successful homologous recombination with the DNA modification template [UHA-M]-[Insert]-[DHA-M] was designed to result in distinguishable EagI restriction patterns of colony-PCR products (86 bps + 941 bps + 1036 bps), compared with patterns from wild-type ade2 (86 bps + 1968 bps) or editing events by non-homologous end joining (NHEJ).
- Table 4. Multiplex genome engineering: stop codon insertion into ade2 while swapping hph with pyr2.
- Table 4 illustrates results of editing ade2 while swapping hph with pyr2 (2nd step), using different amounts of DNA modification templates.
- the number of red colonies indicative of Δade2, the number of white colonies indicative of no ade2 editing, and the fraction of 12 selected red colonies with colony-PCR product EagI restriction patterns indicative of homologous recombination are shown.
- red colonies were observed for all tested concentrations of the DNA modification template [UHA-M]-[Insert]-[DHA-M] (0.2, 0.5 and 1.0 pmol), and RAS531/RAS536 colony-PCR product EagI restriction patterns indicated high frequency of homologous recombination.
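The restriction-pattern readout described for the ade2 colony-PCR products can be sketched as a simple fragment-size comparison. The expected fragment sizes below are the ones stated in the text; the amplicon lengths (the sums of those fragments), the cut positions, the order of fragments along the amplicon, and the function names are assumptions made only for illustration.

```python
# Expected fragment sizes in bp, as stated for the colony-PCR readout.
EXPECTED_PATTERNS = {
    "homologous recombination": [86, 941, 1036],
    "wild type": [86, 1968],
}


def fragment_sizes(amplicon_length, cut_positions):
    """Sorted fragment sizes for a linear amplicon cut at the given positions."""
    cuts = sorted(set(cut_positions))
    if any(c <= 0 or c >= amplicon_length for c in cuts):
        raise ValueError("cut positions must lie inside the amplicon")
    edges = [0] + cuts + [amplicon_length]
    return sorted(right - left for left, right in zip(edges, edges[1:]))


def classify(observed, tolerance=0):
    """Match observed fragment sizes against the expected patterns.

    `tolerance` (bp) allows for imprecise size estimates from a gel.
    Anything unmatched is reported as a different pattern, which may
    (but need not) reflect an NHEJ event.
    """
    obs = sorted(observed)
    for label, pattern in EXPECTED_PATTERNS.items():
        if len(obs) == len(pattern) and all(
            abs(o - p) <= tolerance for o, p in zip(obs, pattern)
        ):
            return label
    return "other pattern (possibly NHEJ)"


if __name__ == "__main__":
    print(fragment_sizes(2054, [86]))        # assumed wild-type amplicon
    print(fragment_sizes(2063, [86, 1027]))  # assumed edited amplicon
    print(classify([1036, 86, 941]))
```

A nonzero tolerance is likely needed in practice, since fragment sizes read from a gel are estimates rather than exact lengths.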
Landscapes
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Genetics & Genomics (AREA)
- Engineering & Computer Science (AREA)
- Chemical & Material Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Biomedical Technology (AREA)
- Organic Chemistry (AREA)
- Zoology (AREA)
- Wood Science & Technology (AREA)
- Biotechnology (AREA)
- Molecular Biology (AREA)
- General Engineering & Computer Science (AREA)
- Microbiology (AREA)
- Biochemistry (AREA)
- General Health & Medical Sciences (AREA)
- Plant Pathology (AREA)
- Physics & Mathematics (AREA)
- Biophysics (AREA)
- Crystallography & Structural Chemistry (AREA)
- Medicinal Chemistry (AREA)
- Mycology (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
The disclosure relates to the field of molecular biology, to compositions and methods for the usage of selection marker swapping systems in microbial cells. Specifically, this disclosure pertains to compositions and methods for swapping between two selection marker constructs at a predetermined target sequence within a microbial genome, by replacing a first removable selection marker construct with a second removable selection marker construct, followed by the reverse replacement in consecutive transformation steps. Methods and compositions are also disclosed in which selection marker swapping systems are used in multiplex genome engineering, by combining selection marker swapping with simultaneously modifying at least one additional target sequence at a different genome sequence.
Description
TITLE
ITERATIVE MULTIPLEX GENOME ENGINEERING IN MICROBIAL CELLS USING A SELECTION MARKER SWAPPING SYSTEM
CROSS-REFERENCE TO RELATED APPLICATIONS
The present application claims the benefit of U.S. Provisional Patent Application Serial No. 63/385,663, filed December 1, 2022, which is hereby incorporated in its entirety by reference.
FIELD OF INVENTION
The disclosure relates to the field of molecular biology, to compositions and methods for the usage of selection marker swapping systems in microbial cells. Specifically, this disclosure pertains to compositions and methods for swapping between two selection marker constructs at a predetermined target sequence within a microbial genome, by replacing a first removable selection marker construct with a second removable selection marker construct, followed by the reverse replacement in consecutive transformation steps. Methods and compositions are also disclosed in which selection marker swapping systems are used in multiplex genome engineering, by combining selection marker swapping with simultaneously modifying at least one additional target sequence at a different genome sequence.
REFERENCE TO SEQUENCE LISTING SUBMITTED ELECTRONICALLY
The official copy of the sequence listing is submitted electronically via Patent Center as an XML-formatted sequence listing with a file named 20231107_NB42141 PCT_sequencelisting.xml created on November ?, 2023, and having a size of 39,290 bytes and is filed concurrently with the specification. The sequence listing contained in this XML-formatted document is part of the specification and is herein incorporated by reference in its entirety.
BACKGROUND
Genetic engineering of microbial cells, such as, but not limited to, filamentous fungi, is a cumbersome process (Li et al., 2017, Microb. Cell Fact. 16:168). Genetic engineering, specifically transformation, requires a method to create access to the genome, as well as a method to introduce a desired genome modification. Since transformation is usually only achieved in a small fraction of cells within a cell population, it is necessary to introduce genetic markers, with the goal of conferring a growth advantage on successfully modified cells under selection conditions (Botstein et al., 1979, Gene, Volume 8, Issue 1, pg. 17-24). Each modification requires the availability of a selection marker, and consecutive modifications within a strain lineage consequently require the availability of multiple selection markers. Alternatively, marker-recycling strategies can be applied, which is particularly important when the number of readily available selection markers is limited (Hartl and Seiboth, 2005, Curr. Genet. 48:204-211).
Standard methods for marker recycling include the use of so-called bidirectional selection markers, referring to marker systems that allow for both selection (positive selection after integration) and counter-selection (negative selection after inactivation or excision); bidirectional selection marker systems are often integrated with flanking repeat sequences, allowing for spontaneous looping out of the marker cassette via homologous recombination (Alani et al., 1987, Genetics 116:541-545). The described constructs are usually integrated at the genome sequence intended for modification, and the marker cassette is subsequently excised by challenging the progeny of successfully modified cells with counter-selection conditions (Alani et al., 1987, Genetics 116:541-545). Marker excision rates, however, can be slow in the case of microorganisms with a low frequency of spontaneous homologous recombination, such as most filamentous fungi (van den Hondel and Punt, 1991, Applied Molecular Genetics of Fungi, Cambridge University Press, Cambridge, UK, pp. 1-28).
Accumulating multiple genome modifications within a cell may require sequential transformation steps that are labor-intensive and time-consuming. Both the selection process and the marker recycling, in the case of limited selection marker availability, require single cells to grow out and form colonies, and cells within colonies must be isolated and analyzed. In the case of slow-growing microbial cells, this process can take up to several weeks. It is therefore desirable to establish methods that allow for multiple parallel modifications within a single transformation assay, and furthermore for more straightforward marker-recycling strategies.
Recent advances in CRISPR/Cas-based genome engineering technology enable targeting of a wide range of sequences within a microbial genome and, via the introduction of double-strand breaks, also allow for increased rates of homologous recombination by the cellular homology-directed repair machinery (Schuster and Kahmann, 2019, Fungal Genet Biol 130:43-53; Song et al., 2019, Appl Microbiol Biotechnol, 103:6919-6932). This is achieved by employing RNA-guided endonucleases (RGENs) consisting of a Cas endonuclease together with a guide RNA that harbors a specific DNA-recognition region (i.e., the variable targeting domain).
There remains a need for developing more efficient RGEN-based methods, and compositions thereof, allowing for iterative rounds of modifying one or multiple target sequences in the genome of microbial cells with limited availability of selection markers.
BRIEF SUMMARY
Compositions and methods for the usage of selection marker swapping systems in microbial cells are disclosed herein. Specifically, this disclosure pertains to compositions and methods for swapping between two selection marker constructs at a predetermined target sequence within a microbial genome, by replacing a first removable selection marker construct with a second removable selection marker construct, followed by the reverse replacement in consecutive transformation steps. Methods and compositions are also disclosed in which selection marker swapping systems are used in multiplex genome engineering, by combining selection marker swapping with simultaneously modifying at least one additional target sequence at a different genome sequence.
Described herein are genetic modification methods that do not rely on the laborious and time-consuming two-step marker-recycling process currently used in the art. Instead, a first selection marker cassette flanked by unique RNA-guided endonuclease (RGEN) target sequences integrated at a predetermined target sequence of a microbial cell is replaced by a second selection marker cassette flanked by different unique RGEN target sequences (referred to as selection marker swapping), wherein said flanking unique RGEN target sequences enable the excision of the previously integrated selection marker cassette in consecutive transformation steps. Concomitantly with the described selection marker swapping at a predetermined target sequence, parallel modifications are performed at other target sequences, without the requirement to integrate additional selection markers.
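The swapping logic can be sketched as a small state model of the predetermined target sequence. The marker names pyr2 and hph and the flank labels B and C follow the examples of this disclosure; the class and function names are hypothetical, and the model deliberately ignores homology arms, efficiencies and selection conditions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MarkerConstruct:
    flank: str   # label of the unique RGEN target sequence on both sides
    marker: str  # selection marker carried by the construct


def swap_marker(integrated, rgen_target, incoming):
    """Return the construct present at the locus after one swapping step.

    The RGEN must cut the flanks of the integrated construct, and the
    incoming construct must carry different flanks so that it is not cut
    by the same RGEN and can itself be excised in a later step.
    """
    if rgen_target != integrated.flank:
        raise ValueError("RGEN does not target the integrated construct")
    if incoming.flank == integrated.flank:
        raise ValueError("incoming construct must carry different flanks")
    return incoming


if __name__ == "__main__":
    construct_1 = MarkerConstruct(flank="B", marker="pyr2")
    construct_2 = MarkerConstruct(flank="C", marker="hph")
    locus = construct_1
    # Iterative rounds: B-flanked pyr2 -> C-flanked hph -> B-flanked pyr2.
    locus = swap_marker(locus, "B", construct_2)
    print(locus)
    locus = swap_marker(locus, "C", construct_1)
    print(locus)
```

Because each step leaves the locus carrying one of the two constructs, the same pair of constructs and RGENs can in principle be alternated for as many rounds as needed.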
In one embodiment of the disclosure, the method comprises a method for replacing a first selection marker construct integrated at a predetermined target sequence of a microbial cell with a second selection marker construct, the method comprising: a) providing one or more microbial cells having a first selection marker construct ([B]-[Marker-1]-[B]) integrated at a predetermined target sequence ([A]), wherein said first selection marker construct comprises a first selection marker ([Marker-1]) flanked by a first unique target sequence ([B]); b) introducing into the microbial cells of (a) a first RGEN (RGEN-B) and a first DNA modification template, wherein said first DNA modification template comprises a second selection marker construct ([C]-[Marker-2]-[C]) comprising a second selection marker ([Marker-2]) flanked by a second unique target sequence ([C]), wherein said first RGEN in combination with said first DNA modification template enables the replacement of said first selection marker construct with said second selection marker construct via homologous recombination; and, c) identifying one or more microbial cells from (b) that has said second selection marker construct integrated at said predetermined target sequence.
In one embodiment of the disclosure, the method comprises a method for replacing a first selection marker construct integrated at a predetermined target sequence of a microbial cell with a second selection marker construct while simultaneously modifying at least one additional target sequence, the method comprising: a) providing one or more microbial cells having a first selection marker construct ([B]-[Marker-1]-[B]) integrated at a predetermined target sequence ([A]), wherein said first selection marker construct comprises a first selection marker ([Marker-1]) flanked by a first unique target sequence ([B]), wherein said cells have at least one additional target sequence ([M]); b) introducing into the microbial cells of (a) a first RGEN (RGEN-B) and a first DNA modification template, wherein said first DNA modification template comprises a second selection marker construct ([C]-[Marker-2]-[C]) comprising a second selection marker ([Marker-2]) flanked by a second unique target sequence ([C]), wherein said first RGEN in combination with said first DNA modification template enables the replacement of said first selection marker construct with said second selection marker construct via homologous recombination; c) simultaneously with step (b), introducing into the microbial cells of (a) a modification at said at least one additional target sequence; and, d) identifying one or more microbial cells from (c) that has said second selection marker construct replacing said first marker construct, and that has said modification at said at least one additional target sequence.
In one embodiment of the disclosure, the method comprises a method for reestablishing a first selection marker construct integrated at a predetermined target sequence of a microbial cell, the method comprising: a) providing one or more microbial cells having a second selection marker construct ([C]-[Marker-2]-[C]) integrated at a predetermined target sequence ([A]), wherein said second selection marker construct comprises a second selection marker ([Marker-2]) flanked by a first unique target sequence ([C]); b) introducing into the microbial cells of (a) a first RGEN (RGEN-C) and a first DNA modification template, wherein said first DNA modification template comprises a first selection marker construct ([B]-[Marker-1]-[B]) comprising a first selection marker ([Marker-1]) flanked by a second unique target sequence ([B]), wherein said first RGEN in combination with said DNA modification template enables the replacement of said second selection marker construct with said first selection marker construct via homologous recombination, thereby reestablishing said first selection marker construct at said predetermined target sequence; and, c) identifying one or more microbial cells from (b) that has said first selection marker construct reestablished at said predetermined target sequence.
In one embodiment of the disclosure, the method comprises a method for reestablishing a first selection marker construct integrated at a predetermined target sequence of a microbial cell, the method comprising: a) providing one or more microbial cells having a second selection marker construct (referred to as [C]-[Marker-2]-[C]) integrated at a predetermined target sequence ([A]), wherein said second selection marker construct comprises a second selection marker ([Marker-2]) flanked by a first unique target sequence ([C]), wherein said cells have at least one additional target sequence ([M]); b) introducing into the microbial cells of (a) a first RGEN (RGEN-C) and a first DNA modification template, wherein said first DNA modification template comprises a first selection marker construct ([B]-[Marker-1]-[B]) comprising a first selection marker ([Marker-1]) flanked by a second unique target sequence ([B]), wherein said first RGEN in combination with said DNA modification template enables the replacement of said second selection marker construct with said first selection marker construct via homologous recombination, thereby reestablishing said first selection marker construct at said predetermined target sequence; c) simultaneously with step (b), introducing into the microbial cells of (a) a modification at said at least one additional target sequence; and, d) identifying one or more microbial cells from (c) that has said first selection marker construct reestablished at said predetermined target sequence and that has said modification at said at least one additional target sequence.
In one aspect, the modification at said at least one additional target sequence is selected from the group consisting of an insertion of a polynucleotide of interest, a deletion of a polynucleotide of interest, a replacement of a polynucleotide of interest, and any one combination thereof.
In one aspect, the microbial cells of (a) have at least one additional target sequence ([M]), and simultaneously introducing a modification comprises introducing at least a second RGEN (RGEN-M) and at least a second DNA modification template ([UHA-M]-[Insert]-[DHA-M]) comprising a polynucleotide of interest ([Insert]), wherein said second RGEN in combination with said second DNA modification template enables the integration of said polynucleotide of interest at said at least one additional target sequence ([M]).
In one aspect, the microbial cells of (a) have at least a first additional target sequence ([Mα]) and a second additional target sequence ([Mβ]) flanking a polynucleotide of interest to be deleted, and wherein said simultaneously introducing a modification comprises introducing at least a second RGEN (RGEN-Mα), a third RGEN (RGEN-Mβ) and at least a third DNA modification template ([UHA-D]-[DHA-D]) comprising an Upstream Homology Arm ([UHA-D]) directly linked to a Downstream Homology Arm ([DHA-D]), wherein said UHA-D and DHA-D are homologous to a genomic region of said microbial cell flanking said polynucleotide sequence of interest to be deleted, wherein said RGEN-Mα and RGEN-Mβ in combination with said third DNA modification template enable the deletion of said polynucleotide of interest.
In one aspect, the microbial cells of (a) have at least a first additional target sequence ([Mα]) and a second additional target sequence ([Mβ]) flanking a first polynucleotide of interest to be replaced, and wherein said simultaneously introducing a modification comprises introducing at least a third RNA-guided endonuclease (RGEN-Mα), a fourth RNA-guided endonuclease (RGEN-Mβ) and at least a third DNA modification template ([UHA-M]-[Insert]-[DHA-M]) comprising a second polynucleotide of interest, wherein said RGEN-Mα and RGEN-Mβ in combination with said third DNA modification template enable the replacement of said first polynucleotide sequence of interest with said second polynucleotide of interest.
BRIEF DESCRIPTION OF THE DRAWINGS AND SEQUENCES
Figure 1. Schematic representation of selection marker swapping while simultaneously introducing at least one additional modification in parallel with the first swapping step, using RGENs and DNA modification templates. 1st step: integrating the removable selection marker construct [B]-[Marker-1]-[B] at a predetermined target sequence ([A]); 2nd step: inserting a polynucleotide of interest ([Insert]) at an RGEN target sequence ([M]) while simultaneously swapping the previously integrated Marker-1 with Marker-2, by integrating the removable selection marker construct [C]-[Marker-2]-[C] in place of [B]-[Marker-1]-[B]. For symbol explanation see Figure 9.
Figure 2. Schematic representation of selection marker swapping while simultaneously introducing at least one additional modification in parallel with the second swapping step, using RGENs and DNA modification templates. 1st step: integrating the removable selection marker construct [C]-[Marker-2]-[C] at a predetermined target sequence ([A]); 2nd step: inserting a polynucleotide of interest ([Insert]) at an RGEN target sequence ([M]) while simultaneously swapping the previously integrated Marker-2 with Marker-1, by integrating the removable selection marker construct [B]-[Marker-1]-[B] in place of [C]-[Marker-2]-[C]. For symbol explanation see Figure 9.
Figure 3. Schematic representation of selection marker swapping while simultaneously introducing at least one additional modification in parallel with the first swapping step, using RGENs and DNA modification templates. 1st step: integrating the removable selection marker construct [B]-[Marker-1]-[B] at a predetermined target sequence ([A]); 2nd step: deleting a polynucleotide of interest ([Delete]) flanked by target sequences [Mα] and [Mβ] while simultaneously swapping the previously integrated Marker-1 with Marker-2, by integrating the removable selection marker construct [C]-[Marker-2]-[C] in place of [B]-[Marker-1]-[B]. For symbol explanation see Figure 9.
Figure 4. Schematic representation of selection marker swapping while simultaneously introducing at least one additional modification in parallel with the second swapping step, using RGENs and DNA modification templates. 1st step: integrating the removable selection marker construct [C]-[Marker-2]-[C] at a predetermined target sequence ([A]); 2nd step: deleting a polynucleotide of interest ([Delete]) flanked by target sequences [Mα] and [Mβ] while simultaneously swapping the previously integrated Marker-2 with Marker-1, by integrating the removable selection marker construct [B]-[Marker-1]-[B] in place of [C]-[Marker-2]-[C]. For symbol explanation see Figure 9.
Figure 5. Schematic representation of selection marker swapping while simultaneously introducing at least one additional modification in parallel with the first swapping step, using RGENs and DNA modification templates. 1st step: integrating the removable selection marker construct [B]-[Marker-1]-[B] at a predetermined target sequence ([A]); 2nd step: replacing a first polynucleotide of interest ([Delete]) flanked by target sequences [Mα] and [Mβ] with a second polynucleotide of interest ([Insert]) while simultaneously swapping the previously integrated Marker-1 with Marker-2, by integrating the removable selection marker construct [C]-[Marker-2]-[C] in place of [B]-[Marker-1]-[B]. For symbol explanation see Figure 9.
Figure 6. Schematic representation of selection marker swapping while simultaneously introducing at least one additional modification in parallel with the second swapping step, using RGENs and DNA modification templates. 1st step: integrating the removable selection marker construct [C]-[Marker-2]-[C] at a predetermined target sequence ([A]); 2nd step: replacing a first polynucleotide of interest ([Delete]) flanked by target sequences [Mα] and [Mβ] with a second polynucleotide of interest ([Insert]) while simultaneously swapping the previously integrated Marker-2 with Marker-1, by integrating the removable selection marker construct [B]-[Marker-1]-[B] in place of [C]-[Marker-2]-[C]. For symbol explanation see Figure 9.
Figure 7. Schematic representation of selection marker swapping while simultaneously introducing at least one additional modification in parallel with both swapping steps, using RGENs and DNA modification templates. 1st step: inserting a first polynucleotide of interest ([Insert-1]) at a first RGEN target sequence ([M1]) while simultaneously swapping a previously integrated Marker-2 with Marker-1, by integrating the removable selection marker construct [B]-[Marker-1]-[B] in place of [C]-[Marker-2]-[C]; 2nd step: inserting a second polynucleotide of interest ([Insert-2]) at a second RGEN target sequence ([M2]) while simultaneously swapping the previously integrated Marker-1 with Marker-2, by integrating the removable selection marker construct [C]-[Marker-2]-[C] in place of [B]-[Marker-1]-[B]. For symbol explanation see Figure 9.
Figure 8. Schematic representation of selection marker swapping while simultaneously introducing at least two additional modifications in parallel with both swapping steps, using RGENs and DNA modification templates. 1st step: inserting a first polynucleotide of interest ([Insert-1]) at a first RGEN target sequence ([M1]) and a second polynucleotide of interest ([Insert-2]) at a second RGEN target sequence ([M2]) while simultaneously swapping a previously integrated Marker-2 with Marker-1, by integrating the removable selection marker construct [B]-[Marker-1]-[B] in place of [C]-[Marker-2]-[C]; 2nd step: inserting a third polynucleotide of interest ([Insert-3]) at a third RGEN target sequence ([M3]) and a fourth polynucleotide of interest ([Insert-4]) at a fourth RGEN target sequence ([M4]) while simultaneously swapping the previously integrated Marker-1 with Marker-2, by integrating the removable selection marker construct [C]-[Marker-2]-[C] in place of [B]-[Marker-1]-[B]. For symbol explanation see Figure 9.
Figure 9. Explanation of symbols used in Figures 1-8.
DETAILED DESCRIPTION
Compositions and methods for the usage of selection marker swapping systems in microbial cells are disclosed herein. Specifically, this disclosure pertains to compositions and methods for swapping between two selection marker constructs at a predetermined target sequence within a microbial genome, by replacing a first removable selection marker construct with a second removable selection marker construct, followed by the reverse replacement in consecutive transformation steps. Methods and compositions are also disclosed in which selection marker swapping systems are used in multiplex genome engineering, by combining selection marker swapping with simultaneously modifying at least one additional target sequence at a different genome sequence.
Described herein are genetic modification methods that do not rely on the laborious and time-consuming two-step marker-recycling process currently used in the art. Instead, a first selection marker cassette flanked by unique RNA-guided endonuclease (RGEN) target sequences integrated at a predetermined target sequence of a microbial cell is replaced by a second selection marker cassette flanked by different unique RGEN target sequences (referred to as selection marker swapping), wherein said flanking unique RGEN target sequences enable the excision of the previously integrated selection marker cassette in consecutive transformation steps. Concomitantly with the described selection marker swapping at a predetermined target sequence, parallel modifications are performed at other target sequences, without the requirement to integrate additional selection markers.
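For illustration only, the alternation between the two marker constructs described above can be tracked with a small bookkeeping model. The Python sketch below is not part of the disclosed methods and does not simulate homologous recombination; the names `CONSTRUCTS`, `swap_marker` and the dictionary-based genome state are hypothetical simplifications that record only which construct occupies target sequence [A] and which parallel modifications have accumulated.

```python
# Toy bookkeeping model of selection marker swapping (illustrative assumption,
# not a simulation of homologous recombination).

# Each removable construct is flanked by its own unique RGEN target sequence.
CONSTRUCTS = {
    "Marker-1": "B",  # [B]-[Marker-1]-[B], targeted by RGEN-B
    "Marker-2": "C",  # [C]-[Marker-2]-[C], targeted by RGEN-C
}


def swap_marker(genome, modifications=()):
    """Return (RGEN used, new genome state) for one transformation step.

    genome: dict with keys "A" (marker currently integrated at target
            sequence [A]) and "edits" (tuple of modifications made so far).
    modifications: edits introduced in parallel at additional target sequences.
    """
    current = genome["A"]
    # The RGEN used in this step targets the flanks of the resident construct.
    rgen = "RGEN-" + CONSTRUCTS[current]
    # The incoming DNA modification template carries the other construct.
    incoming = "Marker-2" if current == "Marker-1" else "Marker-1"
    new_state = {"A": incoming, "edits": genome["edits"] + tuple(modifications)}
    return rgen, new_state


genome = {"A": "Marker-1", "edits": ()}
rgen, genome = swap_marker(genome, ["Insert-1 at M1"])
print(rgen, genome["A"])  # RGEN-B Marker-2
rgen, genome = swap_marker(genome, ["Insert-2 at M2"])
print(rgen, genome["A"])  # RGEN-C Marker-1
```

In this simplified view, each transformation step selects for the incoming marker only, so the parallel modifications accumulate without any additional selection marker being recorded in the state.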
The present document is organized into a number of sections for ease of reading; however, the reader will appreciate that statements made in one section may apply to other sections. In this manner, the headings used for different sections of the disclosure should not be construed as limiting.
The headings provided herein are not limitations of the various aspects or embodiments of the present compositions and methods which can be had by reference to the specification as a whole. Accordingly, the terms defined immediately below are more fully defined by reference to the specification as a whole.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the present compositions and methods belong. Although any methods and materials similar or equivalent to those described herein can also be used in the practice or testing of the present compositions and methods, representative illustrative methods and materials are now described.
All publications and patents cited in this specification are herein incorporated by reference as if each individual publication or patent were specifically and individually indicated to be incorporated by reference and are incorporated herein by reference to disclose and describe the methods and/or materials in connection with which the publications are cited.
As used herein, the term “disclosure” or “disclosed disclosure” is not meant to be limiting, but applies generally to any of the disclosures defined in the claims or described herein. These terms are used interchangeably herein.
Cas genes and proteins
CRISPR (clustered regularly interspaced short palindromic repeats) loci refer to certain genetic loci encoding components of DNA cleavage systems, for example, used by bacterial and archaeal cells to destroy foreign DNA (Horvath and Barrangou, 2010, Science 327:167-170; WO2007/025097, published March 1, 2007). A CRISPR locus can consist of a CRISPR array, comprising short direct repeats (CRISPR repeats) separated by short variable DNA sequences (called ‘spacers’), which can be flanked by diverse Cas (CRISPR-associated) genes. The number of CRISPR-associated genes at a given CRISPR locus can vary between species. Multiple CRISPR/Cas systems have been described including Class 1 systems, with multisubunit effector complexes (comprising type I, type III and type IV subtypes), and Class 2 systems, with single protein effectors (comprising type II and type V subtypes, such as but not limited to Cas9, Cpf1, C2c1, C2c2, C2c3) (Makarova et al. 2015, Nature Reviews Microbiology Vol. 13:1-15; Zetsche et al., 2015, Cell 163, 1-13; Shmakov et al., 2015, Molecular Cell 60, 1-13; Haft et al., 2005, Computational Biology, PLoS Comput Biol 1(6): e60. doi:10.1371/journal.pcbi.0010060; and WO 2013/176772 A1 published on November 23, 2013, incorporated by reference herein). The type II CRISPR/Cas system from bacteria employs a crRNA (CRISPR RNA) and tracrRNA (trans-activating CRISPR RNA) to guide the Cas endonuclease to its DNA target. The crRNA contains a spacer region complementary to one strand of the double strand DNA target and a region that base pairs with the tracrRNA, forming an RNA duplex that directs the Cas endonuclease to cleave the DNA target. Spacers are acquired through a not fully understood process involving Cas1 and Cas2 proteins. All type II CRISPR/Cas loci contain cas1 and cas2 genes in addition to the cas9 gene (Chylinski et al., 2013, RNA Biology 10:726-737; Makarova et al. 2015, Nature Reviews Microbiology Vol. 13:1-15). Type II CRISPR-Cas loci can encode a tracrRNA, which is partially complementary to the repeats within the respective CRISPR array, and can comprise other proteins such as Csn1 and Csn2. The presence of cas9 in the vicinity of cas1 and cas2 genes is the hallmark of type II loci (Makarova et al. 2015, Nature Reviews Microbiology Vol. 13:1-15).
Type I CRISPR-Cas (CRISPR-associated) systems consist of a complex of proteins, termed Cascade (CRISPR-associated complex for antiviral defense), which function together with a single CRISPR RNA (crRNA) and Cas3 to defend against invading viral DNA (Brouns, S.J.J. et al. Science 321:960-964; Makarova et al. 2015, Nature Reviews Microbiology Vol. 13:1-15, which are incorporated in their entirety herein).
The term “Cas gene” herein refers to a gene that is generally coupled, associated or close to, or in the vicinity of flanking CRISPR loci. The terms “Cas gene”, “cas gene”, “CRISPR-associated (Cas) gene” and “Clustered Regularly Interspaced Short Palindromic Repeats-associated gene” are used interchangeably herein.
The term “Cas protein” or “Cas polypeptide” refers to a polypeptide encoded by a Cas (CRISPR-associated) gene. A Cas protein includes a Cas endonuclease.
A Cas protein may be a bacterial or archaeal protein. Type I-III CRISPR Cas proteins herein are typically prokaryotic in origin; type I and III Cas proteins can be derived from bacterial or archaeal species, whereas type II Cas proteins (i.e., a Cas9) can be derived from bacterial species, for example. In other aspects, Cas proteins include one or more of Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas6, Cas7, Cas8, Cas9, Cas10, Csy1, Csy2, Csy3, Cse1, Cse2, Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX, Csx3, Csx1, Csx15, Csf1, Csf2, Csf3, Csf4, homologs thereof, or modified versions thereof. A Cas protein includes a Cas9 protein, a Cpf1 protein, a C2c1 protein, a C2c2 protein, a C2c3 protein, Cas3, Cas3-HD, Cas5, Cas7, Cas8, Cas10, or combinations or complexes of these.
The term “Cas endonuclease” refers to a Cas polypeptide (Cas protein) that, when in complex with a suitable polynucleotide component, is capable of recognizing, binding to, and optionally nicking or cleaving all or part of a specific DNA target sequence. A Cas endonuclease is guided by the guide polynucleotide to recognize, bind to, and optionally nick or cleave all or part of a specific target sequence in double stranded DNA (e.g., at a target sequence in the genome of a cell). A Cas endonuclease described herein comprises one or more nuclease domains. The Cas endonucleases employed in genome DNA modification methods described herein are endonucleases that introduce single or double-strand breaks into the DNA at the genome target sequence. Alternatively, a Cas endonuclease may lack DNA cleavage or nicking activity, but can still specifically bind to a DNA target sequence when complexed with a suitable RNA component.
As used herein, a polypeptide referred to as a “Cas9” (formerly referred to as Cas5, Csn1, or Csx12) or a “Cas9 endonuclease” or having “Cas9 endonuclease activity” refers to a Cas endonuclease that forms a complex with a crNucleotide and a tracrNucleotide, or with a single guide polynucleotide, for specifically binding to, and optionally nicking or cleaving all or part of a DNA target sequence. A Cas9 endonuclease comprises a RuvC nuclease domain and an HNH (H-N-H) nuclease domain, each of which can cleave a single DNA strand at a target sequence (the concerted action of both domains leads to DNA double-strand cleavage, whereas activity of one domain leads to a nick). In general, the RuvC domain comprises subdomains I, II and III, where subdomain I is located near the N-terminus of Cas9 and subdomains II and III are located in the middle of the protein, flanking the HNH domain (Makarova et al. 2015, Nature Reviews Microbiology Vol. 13:1-15; Hsu et al., 2014, Cell 157:1262-1278). Cas9 endonucleases are typically derived from a type II CRISPR system, which includes a DNA cleavage system utilizing a Cas9 endonuclease in complex with at least one polynucleotide component. For example, a Cas9 can be in complex with a CRISPR RNA (crRNA) and a trans-activating CRISPR RNA (tracrRNA). In another example, a Cas9 can be in complex with a single guide RNA (Makarova et al. 2015, Nature Reviews Microbiology Vol. 13:1-15).
The terms “functional fragment”, “fragment that is functionally equivalent” and “functionally equivalent fragment” of a Cas endonuclease are used interchangeably herein, and refer to a portion or subsequence of the Cas endonuclease in which the ability to recognize, bind to, and optionally unwind, nick or cleave (introduce a single or double-strand break in) the target sequence is retained.
The terms “functional variant”, “variant that is functionally equivalent” and “functionally equivalent variant” of a Cas endonuclease of the present disclosure are used interchangeably herein, and refer to a variant of the Cas endonuclease of the present disclosure in which the ability to recognize, bind to, and optionally unwind, nick or cleave all or part of a target sequence is retained.
Binding activity and/or endonucleolytic activity of a Cas protein herein toward a specific target DNA sequence may be assessed by any suitable assay known in the art, such as disclosed in U.S. Patent No. 8697359, which is incorporated herein by reference. A determination can be made, for example, by expressing a Cas protein and suitable RNA component in a host cell/organism, and then examining the predicted DNA target sequence for the presence of an indel (a Cas protein in this particular assay would have endonucleolytic activity [single or double-strand cleaving activity]). Examining for the presence of an indel at the predicted target sequence could be done via a DNA sequencing method or by inferring indel formation by assaying for loss of function of the target sequence, for example. In another example, Cas protein activity can be determined by expressing a Cas protein and suitable RNA component in a host cell/organism that has been provided a DNA modification template comprising a sequence homologous to a sequence at or near the target sequence. The presence of the DNA modification template at the target sequence (such as would be predicted by successful HR between the donor and target sequences) would indicate that targeting occurred.
Non-limiting examples of Cas endonucleases herein can be Cas endonucleases from any of the following genera: Aeropyrum, Pyrobaculum, Sulfolobus, Archaeoglobus, Haloarcula, Methanobacterium, Methanococcus, Methanosarcina, Methanopyrus, Pyrococcus, Picrophilus, Thermoplasma, Corynebacterium, Mycobacterium, Streptomyces, Aquifex, Porphyromonas, Chlorobium, Thermus, Bacillus, Listeria, Staphylococcus, Clostridium, Thermoanaerobacter, Mycoplasma, Fusobacterium, Azoarcus, Chromobacterium, Neisseria, Nitrosomonas, Desulfovibrio, Geobacter, Myxococcus, Campylobacter, Wolinella, Acinetobacter, Erwinia, Escherichia, Legionella, Methylococcus, Pasteurella, Photobacterium, Salmonella, Xanthomonas, Yersinia, Streptococcus, Treponema, Francisella, or Thermotoga. Furthermore, a Cas endonuclease herein can be encoded, for example, by any Cas endonuclease as disclosed in U.S. Appl. Publ. No. 2010/0093617, which is incorporated herein by reference.
Furthermore, a Cas9 endonuclease herein may be derived from a Streptococcus (e.g., S. pyogenes, S. pneumoniae, S. thermophilus, S. agalactiae, S. parasanguinis, S. oralis, S. salivarius, S. macacae, S. dysgalactiae, S. anginosus, S. constellatus, S. pseudoporcinus, S. mutans), Listeria (e.g., L. innocua), Spiroplasma (e.g., S. apis, S. syrphidicola), Peptostreptococcaceae, Atopobium, Porphyromonas (e.g., P. catoniae), Prevotella (e.g., P. intermedia), Veillonella, Treponema (e.g., T. socranskii, T. denticola), Capnocytophaga, Finegoldia (e.g., F. magna), Coriobacteriaceae (e.g., C. bacterium), Olsenella (e.g., O. profusa), Haemophilus (e.g., H. sputorum, H. pittmaniae), Pasteurella (e.g., P. bettyae), Olivibacter (e.g., O. sifiensis), Epilithonimonas (e.g., E. tenax), Mesonia (e.g., M. mobilis), Lactobacillus (e.g., L. plantarum), Bacillus (e.g., B. cereus), Aquimarina (e.g., A. muelleri), Chryseobacterium (e.g., C. palustre), Bacteroides (e.g., B. graminisolvens), Neisseria (e.g., N. meningitidis), Francisella (e.g., F. novicida), or Flavobacterium (e.g., F. frigidarium, F. soli) species, for example. In one aspect a S. pyogenes Cas9 endonuclease is described herein. As another example, a Cas9 endonuclease can be any of the Cas9 proteins disclosed in Chylinski et al. (RNA Biology 10:726-737), which is incorporated herein by reference.
The sequence of a Cas9 endonuclease herein can comprise, for example, any of the Cas9 amino acid sequences disclosed in GenBank Accession Nos. G3ECR1 (S. thermophilus), WP_026709422, WP_027202655, WP_027318179, WP_027347504, WP_027376815, WP_027414302, WP_027821588, WP_027886314, WP_027963583, WP_028123848, WP_028298935, Q03JI6 (S. thermophilus), EGP66723, EGS38969, EGV05092, EHI65578 (S. pseudoporcinus), EIC75614 (S. oralis), EID22027 (S. constellatus), EIJ69711, EJP22331 (S. oralis), EJP26004 (S. anginosus), EJP30321, EPZ44001 (S. pyogenes), EPZ46028 (S. pyogenes), EQL78043 (S. pyogenes), EQL78548 (S. pyogenes), ERL10511, ERL12345, ERL19088 (S. pyogenes), ESA57807 (S. pyogenes), ESA59254 (S. pyogenes), ESU85303 (S. pyogenes), ETS96804, UC75522, EGR87316 (S. dysgalactiae), EGS33732, EGV01468 (S. oralis), EHJ52063 (S. macacae), EID26207 (S. oralis), EID33364, EIG27013 (S. parasanguinis), EJF37476, EJO19166 (Streptococcus sp. BS35b), EJU16049, EJU32481, YP_006298249, ERF61304, ERK04546, ETJ95568 (S. agalactiae), TS89875, ETS90967 (Streptococcus sp. SR4), ETS92439, EUB27844 (Streptococcus sp. BS21), AFJ08616, EUC82735 (Streptococcus sp. CM6), EWC92088, EWC94390, EJP25691, YP_008027038, YP_008868573, AGM26527, AHK22391, AHB36273, Q927P4, G3ECR1, or Q99ZW2 (S. pyogenes), which are incorporated by reference. Alternatively, a Cas9 protein herein can be encoded by any of SEQ ID NOs:462 (S. thermophilus), 474 (S. thermophilus), 489 (S. agalactiae), 494 (S. agalactiae), 499 (S. mutans), 505 (S. pyogenes), or 518 (S. pyogenes) as disclosed in U.S. Appl. Publ. No. 2010/0093617 (incorporated herein by reference), for example.
Given that certain amino acids share similar structural and/or charge features with each other (i.e., conserved), the amino acid at each position in a Cas9 can be as provided in the disclosed sequences or substituted with a conserved amino acid residue (“conservative amino acid substitution”) as follows:
1. The following small aliphatic, nonpolar or slightly polar residues can substitute for each other: Ala (A), Ser (S), Thr (T), Pro (P), Gly (G);
2. The following polar, negatively charged residues and their amides can substitute for each other: Asp (D), Asn (N), Glu (E), Gln (Q);
3. The following polar, positively charged residues can substitute for each other: His (H), Arg (R), Lys (K);
4. The following aliphatic, nonpolar residues can substitute for each other: Ala (A), Leu (L), Ile (I), Val (V), Cys (C), Met (M); and
5. The following large aromatic residues can substitute for each other: Phe (F), Tyr (Y), Trp (W).
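The five substitution groups listed above can be encoded directly as sets of one-letter amino acid codes. The following Python sketch is illustrative only; the names `CONSERVATIVE_GROUPS`, `is_conservative` and `nonconservative_positions` are hypothetical helpers, and treating two residues as interchangeable whenever they share at least one group is an assumption about how the list is meant to be applied.

```python
# The five groups listed above, written with one-letter amino acid codes.
CONSERVATIVE_GROUPS = [
    set("ASTPG"),   # 1. small aliphatic, nonpolar or slightly polar
    set("DNEQ"),    # 2. polar, negatively charged residues and their amides
    set("HRK"),     # 3. polar, positively charged
    set("ALIVCM"),  # 4. aliphatic, nonpolar
    set("FYW"),     # 5. large aromatic
]


def is_conservative(original, substitute):
    """True if both residues share at least one of the groups above."""
    return any(original in g and substitute in g for g in CONSERVATIVE_GROUPS)


def nonconservative_positions(reference, variant):
    """List 1-based positions where variant differs non-conservatively."""
    if len(reference) != len(variant):
        raise ValueError("sequences must have equal length")
    return [
        i
        for i, (ref, var) in enumerate(zip(reference, variant), start=1)
        if ref != var and not is_conservative(ref, var)
    ]


# K->R at position 2 stays within group 3; L->W at position 4 crosses groups.
print(nonconservative_positions("MKDL", "MRDW"))  # [4]
```

Note that Ala (A) appears in both group 1 and group 4, so under this reading it can be exchanged with members of either group, while Ser (S) and Leu (L), for example, cannot be exchanged with each other.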
Fragments and variants can be obtained via methods such as site-directed mutagenesis and synthetic construction. Methods for measuring endonuclease activity are well known in the art (such as, but not limited to, those described in PCT/US13/39011, filed May 1, 2013, PCT/US16/32073, filed May 12, 2016, and PCT/US16/32028, filed May 12, 2016, incorporated by reference herein).
The Cas endonuclease can comprise a modified form of the Cas polypeptide. The modified form of the Cas polypeptide can include an amino acid change (e.g., deletion, integration, or substitution) that reduces the naturally occurring nuclease activity of the Cas protein. For example, in some instances, the modified form of the Cas protein has less than 50%, less than 40%, less than 30%, less than 20%, less than 10%, less than 5%, or less than 1% of the nuclease activity of the corresponding wild-type Cas polypeptide (US patent application US20140068797 A1, published on March 6, 2014). In some cases, the modified form of the Cas polypeptide has no substantial nuclease activity and is referred to as catalytically “inactivated Cas” or “deactivated Cas (dCas).” An inactivated Cas/deactivated Cas includes a deactivated Cas endonuclease (dCas). A catalytically inactive Cas can be fused to a heterologous sequence. Other Cas9 variants lack the activity of either the HNH or the RuvC nuclease domain and are thus able to cleave only one strand of the DNA (nickase variants).
Recombinant DNA constructs expressing the Cas endonuclease described herein can be transiently integrated into a microbial cell or stably integrated into the genome of a microbial cell.
Cas protein fusions
A Cas endonuclease can be part of a fusion protein comprising one or more heterologous protein domains (e.g., 1, 2, 3, or more domains in addition to the Cas polypeptide). Such a fusion protein may comprise any additional protein sequence, and optionally a linker sequence between any two domains, such as between Cas polypeptide and a first heterologous domain. Examples of protein domains that may be fused to a Cas polypeptide include, without limitation, epitope tags (e.g., histidine [His], V5, FLAG, influenza hemagglutinin [HA], myc, VSV-G, thioredoxin [Trx]), reporters (e.g., glutathione-S-transferase [GST], horseradish peroxidase [HRP], chloramphenicol acetyltransferase [CAT], beta-galactosidase, beta-glucuronidase [GUS], luciferase, green fluorescent protein [GFP], HcRed, DsRed, cyan fluorescent protein [CFP], yellow fluorescent protein [YFP], blue fluorescent protein [BFP]), and domains having one or more of the following activities: methylase activity, demethylase activity, transcription activation activity (e.g., VP16 or VP64), transcription repression activity, transcription release factor activity, histone modification activity, RNA cleavage activity and nucleic acid binding activity. A Cas endonuclease can also be in fusion with a protein that binds DNA molecules or other molecules, such as maltose binding protein (MBP), S-tag, Lex A DNA binding domain (DBD), GAL4A DNA binding domain, and herpes simplex virus (HSV) VP16.
A Cas endonuclease can comprise a heterologous regulatory element such as a nuclear localization sequence (NLS). A heterologous NLS amino acid sequence may be of sufficient strength to drive accumulation of a Cas endonuclease in a detectable amount in the nucleus of a cell herein. An NLS may comprise one (monopartite) or more (e.g., bipartite) short sequences (e.g., 2 to 20 residues) of basic, positively charged residues (e.g., lysine and/or arginine), and can be located anywhere in a Cas amino acid sequence but such that it is exposed on the protein surface. An NLS may be operably linked to the N-terminus or C-terminus of a Cas protein herein, for example. Two or more NLS sequences can be linked to a Cas protein, for example, such as on both the N- and C-termini of a Cas protein. The Cas gene can be operably linked to a SV40 nuclear targeting signal upstream of the Cas codon region and a bipartite VirD2 nuclear localization signal (Tinland et al. (1992) Proc. Natl. Acad. Sci. USA 89:7442-6) downstream of the Cas codon region. Non-limiting examples of suitable NLS sequences herein include those disclosed in U.S. Patent Nos. 6660830 and 7309576, which are both incorporated by reference herein. A heterologous NLS amino acid sequence includes plant, viral and mammalian nuclear localization signals.
A catalytically active and/or inactive Cas endonuclease can be fused to a heterologous sequence (US patent application US20140068797 A1, published on March 6, 2014). Suitable fusion partners include, but are not limited to, a polypeptide that provides an activity that indirectly increases transcription by acting directly on the target DNA or on a polypeptide (e.g., a histone or other DNA-binding protein) associated with the target DNA. Additional suitable fusion partners include, but are not limited to, a polypeptide that provides for methyltransferase activity, demethylase activity, acetyltransferase activity, deacetylase activity, kinase activity, phosphatase activity, ubiquitin ligase activity, deubiquitinating activity, adenylation activity, deadenylation activity, SUMOylating activity, deSUMOylating activity, ribosylation activity, deribosylation activity, myristoylation activity, or demyristoylation activity. Further suitable fusion partners include, but are not limited to, a polypeptide that directly provides for increased transcription of the target nucleic acid (e.g., a transcription activator or a fragment thereof, a protein or fragment thereof that recruits a transcription activator, a small molecule/drug-responsive transcription regulator, etc.). A catalytically inactive Cas9 endonuclease can also be fused to a FokI nuclease to generate double-strand breaks (Guilinger et al. Nature Biotechnology, volume 32, number 6, June 2014).
Guide polynucleotide, guide RNA
As used herein, the term “guide polynucleotide” relates to a polynucleotide sequence that can form a complex with a Cas endonuclease, and enables the Cas endonuclease to recognize, bind to, and optionally nick or cleave a DNA target sequence (also referred to as target sequence). The guide polynucleotide can be a single molecule or a double molecule. The guide polynucleotide sequence can be a RNA sequence, a DNA sequence, or a combination thereof (a RNA-DNA combination sequence). Optionally, the guide polynucleotide can comprise at least one nucleotide, phosphodiester bond or linkage modification such as, but not limited to, Locked Nucleic Acid (LNA), 5-methyl dC, 2,6-Diaminopurine, 2’-Fluoro A, 2’-Fluoro U, 2'-O-Methyl RNA, phosphorothioate bond, linkage to a cholesterol molecule, linkage to a polyethylene glycol molecule, linkage to a spacer 18 (hexaethylene glycol chain) molecule, or 5’ to 3’ covalent linkage resulting in circularization. A guide polynucleotide that solely comprises ribonucleic acids is also referred to as a “guide RNA” or “gRNA”.
The guide polynucleotide can be a double molecule (also referred to as duplex guide polynucleotide) comprising a crNucleotide sequence and a tracrNucleotide sequence. The crNucleotide includes a first nucleotide sequence domain (referred to as Variable Targeting domain or VT domain) that can hybridize to a nucleotide sequence in a target DNA and a second nucleotide sequence (also referred to as a tracr mate sequence) that is part of a Cas endonuclease recognition (CER) domain. The tracr mate sequence can hybridize to a tracrNucleotide along a region of complementarity and together form the Cas endonuclease recognition domain or CER domain. The CER domain is capable of interacting with a Cas endonuclease polypeptide. The crNucleotide and the tracrNucleotide of the duplex guide polynucleotide can be RNA, DNA, and/or RNA-DNA-combination sequences (U.S. Patent Application US20150082478, published on March 19, 2015 and US20150059010, published on February 26, 2015, both of which are herein incorporated by reference). In some embodiments, the crNucleotide molecule of the duplex guide polynucleotide is referred to as “crDNA” (when composed of a contiguous stretch of DNA nucleotides) or “crRNA” (when composed of a contiguous stretch of RNA nucleotides), or “crDNA-RNA” (when composed of a combination of DNA and RNA nucleotides). The crNucleotide can comprise a fragment of the crRNA naturally occurring in Bacteria and Archaea. The size of the fragment of the crRNA naturally occurring in Bacteria and Archaea that can be present in a crNucleotide disclosed herein can range from, but is not limited to, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or more nucleotides. In some embodiments the tracrNucleotide is referred to as “tracrRNA” (when composed of a contiguous stretch of RNA nucleotides) or “tracrDNA” (when composed of a contiguous stretch of DNA nucleotides) or “tracrDNA-RNA” (when composed of a combination of DNA and RNA nucleotides). In certain embodiments, the RNA that guides the RNA/Cas9 endonuclease complex is a duplexed RNA comprising a duplex crRNA-tracrRNA.
The guide polynucleotide includes a dual RNA molecule comprising a chimeric non-naturally occurring crRNA (non-covalently) linked to at least one tracrRNA. A chimeric non-naturally occurring crRNA includes a crRNA that comprises regions that are not found together in nature (i.e., they are heterologous with each other). For example, a non-naturally occurring crRNA is a crRNA wherein the naturally occurring spacer sequence is exchanged for a heterologous Variable Targeting domain. A non-naturally occurring crRNA comprises a first nucleotide sequence domain (referred to as Variable Targeting domain or VT domain) that can hybridize to a nucleotide sequence in a target DNA, linked to a second nucleotide sequence (also referred to as a tracr mate sequence) such that the first and second sequence are not found linked together in nature.
The guide polynucleotide can also be a single molecule (also referred to as single guide polynucleotide) comprising a crNucleotide sequence linked to a tracrNucleotide sequence. The single guide polynucleotide comprises a first nucleotide sequence domain (referred to as Variable Targeting domain or VT domain) that can hybridize to a nucleotide sequence in a target DNA and a Cas endonuclease recognition domain (CER domain) that interacts with a Cas endonuclease polypeptide. By “domain” it is meant a contiguous stretch of nucleotides that can be RNA, DNA, and/or RNA-DNA-combination sequence. The VT domain and/or the CER domain of a single guide polynucleotide can comprise a RNA sequence, a DNA sequence, or a RNA-DNA-combination sequence. The single guide polynucleotide being comprised of sequences from the crNucleotide and the tracrNucleotide may be referred to as “single guide RNA” (when composed of a contiguous stretch of RNA nucleotides) or “single guide DNA” (when composed of a contiguous stretch of DNA nucleotides) or “single guide RNA-DNA” (when composed of a combination of RNA and DNA nucleotides). The single guide polynucleotide can form a complex with a Cas endonuclease, wherein said guide polynucleotide/Cas endonuclease complex (also referred to as a guide polynucleotide/Cas endonuclease system) can direct the Cas endonuclease to a genome target sequence, enabling the Cas endonuclease to recognize, bind to, and optionally nick or cleave (introduce a single or double-strand break) the target sequence.
The terms “variable targeting domain” and “VT domain” are used interchangeably herein and include a nucleotide sequence that can hybridize (is complementary) to one strand (nucleotide sequence) of a double strand DNA target sequence. The % complementation between the first nucleotide sequence domain (VT domain) and the target sequence can be at least 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100%. The variable targeting domain can be at least 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29 or 30 nucleotides in length.
The variable targeting domain can comprise a contiguous stretch of 12 to 30, 12 to 29, 12 to 28, 12 to 27, 12 to 26, 12 to 25, 12 to 24, 12 to 23, 12 to 22, 12 to 21, 12 to 20, 12 to 19, 12 to 18, 12 to 17, 12 to 16, 12 to 15, 12 to 14, 12 to 13, 13 to 30, 13 to 29, 13 to 28, 13 to 27, 13 to 26, 13 to 25, 13 to 24, 13 to 23, 13 to 22, 13 to 21, 13 to 20, 13 to 19, 13 to 18, 13 to 17, 13 to 16, 13 to 15, 13 to 14, 14 to 30, 14 to 29, 14 to 28, 14 to 27, 14 to 26, 14 to 25, 14 to 24, 14 to 23, 14 to 22, 14 to 21, 14 to 20, 14 to 19, 14 to 18, 14 to 17, 14 to 16, 14 to 15, 15 to 30, 15 to 29, 15 to 28, 15 to 27, 15 to 26, 15 to 25, 15 to 24, 15 to 23, 15 to 22, 15 to 21, 15 to 20, 15 to 19, 15 to 18, 15 to 17, 15 to 16, 16 to 30, 16 to 29, 16 to 28, 16 to 27, 16 to 26, 16 to 25, 16 to 24, 16 to 23, 16 to 22, 16 to 21, 16 to 20, 16 to 19, 16 to 18, 16 to 17, 17 to 30, 17 to 29, 17 to 28, 17 to 27, 17 to 26, 17 to 25, 17 to 24, 17 to 23, 17 to 22, 17 to 21, 17 to 20, 17 to 19, 17 to 18, 18 to 30, 18 to 29, 18 to 28, 18 to 27, 18 to 26, 18 to 25, 18 to 24, 18 to 23, 18 to 22, 18 to 21, 18 to 20, 18 to 19, 19 to 30, 19 to 29, 19 to 28, 19 to 27, 19 to 26, 19 to 25, 19 to 24, 19 to 23, 19 to 22, 19 to 21, 19 to 20, 20 to 30, 20 to 29, 20 to 28, 20 to 27, 20 to 26, 20 to 25, 20 to 24, 20 to 23, 20 to 22, 20 to 21, 21 to 30, 21 to 29, 21 to 28, 21 to 27, 21 to 26, 21 to 25, 21 to 24, 21 to 23, 21 to 22, 22 to 30, 22 to 29, 22 to 28, 22 to 27, 22 to 26, 22 to 25, 22 to 24, 22 to 23, 23 to 30, 23 to 29, 23 to 28, 23 to 27, 23 to 26, 23 to 25, 23 to 24, 24 to 30, 24 to 29, 24 to 28, 24 to 27, 24 to 26, 24 to 25, 25 to 30, 25 to 29, 25 to 28, 25 to 27, 25 to 26, 26 to 30, 26 to 29, 26 to 28, 26 to 27, 27 to 30, 27 to 29, 27 to 28, 28 to 30, 28 to 29, or 29 to 30 nucleotides.
The variable targeting domain can be composed of a DNA sequence, a RNA sequence, a modified DNA sequence, a modified RNA sequence, or any combination thereof. The VT domain can be complementary to target sequences derived from prokaryotic or eukaryotic DNA.
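The length and complementarity criteria for a VT domain described above can be illustrated with a short Python sketch. It is an illustration only, not a method taken from this disclosure: the function names, the gap-free antiparallel comparison, and the default thresholds (12 to 30 nucleotides, at least 50% complementarity, taken from the lowest values listed above) are assumptions chosen for the example.

```python
# Illustrative sketch only; names and defaults are assumptions, not part of the disclosure.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def percent_complementarity(vt_domain: str, target_strand: str) -> float:
    """Percent of VT-domain positions that base-pair with the target strand.

    Both sequences are written 5'->3' and compared antiparallel without gaps.
    An RNA VT domain may be passed with U; it is treated as T for pairing.
    """
    vt = vt_domain.upper().replace("U", "T")
    if len(vt) != len(target_strand):
        raise ValueError("sequences must have equal length for a gap-free comparison")
    perfect_partner = reverse_complement(target_strand)
    matches = sum(a == b for a, b in zip(vt, perfect_partner))
    return 100.0 * matches / len(vt)


def is_acceptable_vt_domain(vt_domain: str, target_strand: str,
                            min_percent: float = 50.0,
                            min_len: int = 12, max_len: int = 30) -> bool:
    """Check the assumed length window and minimum percent complementarity."""
    if not min_len <= len(vt_domain) <= max_len:
        return False
    return percent_complementarity(vt_domain, target_strand) >= min_percent
```

For a 20-nucleotide target strand, a VT domain equal to its reverse complement scores 100%, and a single mismatch lowers the score to 95%.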
The terms “Cas endonuclease recognition domain” and “CER domain” (of a guide polynucleotide) are used interchangeably herein and include a nucleotide sequence that interacts with a Cas endonuclease polypeptide. A CER domain comprises a tracrNucleotide mate sequence followed by a tracrNucleotide sequence. The CER domain can be composed of a DNA sequence, a RNA sequence, a modified DNA sequence, a modified RNA sequence (see for example US 2015-0059010 A1, published on February 26, 2015, incorporated in its entirety by reference herein), or any combination thereof.
The nucleotide sequence linking the crNucleotide and the tracrNucleotide of a single guide polynucleotide can comprise a RNA sequence, a DNA sequence, or a RNA-DNA combination sequence. In one embodiment, the nucleotide sequence linking the crNucleotide and the tracrNucleotide of a single guide polynucleotide (also referred to as “loop”) can be at least 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 or 100 nucleotides in length. The loop can be 3-4, 3-5, 3-6, 3-7, 3-8, 3-9, 3-10, 3-11, 3-12, 3-13, 3-14, 3-15, 3-20, 3-30, 3-40, 3-50, 3-60, 3-70, 3-80, 3-90, 3-100, 4-5, 4-6, 4-7, 4-8, 4-9, 4-10, 4-11, 4-12, 4-13, 4-14, 4-15, 4-20, 4-30, 4-40, 4-50, 4-60, 4-70, 4-80, 4-90, 4-100, 5-6, 5-7, 5-8, 5-9, 5-10, 5-11, 5-12, 5-13, 5-14, 5-15, 5-20, 5-30, 5-40, 5-50, 5-60, 5-70, 5-80, 5-90, 5-100, 6-7, 6-8, 6-9, 6-10, 6-11, 6-12, 6-13, 6-14, 6-15, 6-20, 6-30, 6-40, 6-50, 6-60, 6-70, 6-80, 6-90, 6-100, 7-8, 7-9, 7-10, 7-11, 7-12, 7-13, 7-14, 7-15, 7-20, 7-30, 7-40, 7-50, 7-60, 7-70, 7-80, 7-90, 7-100, 8-9, 8-10, 8-11, 8-12, 8-13, 8-14, 8-15, 8-20, 8-30, 8-40, 8-50, 8-60, 8-70, 8-80, 8-90, 8-100, 9-10, 9-11, 9-12, 9-13, 9-14, 9-15, 9-20, 9-30, 9-40, 9-50, 9-60, 9-70, 9-80, 9-90, 9-100, 10-20, 20-30, 30-40, 40-50, 50-60, 70-80, 80-90 or 90-100 nucleotides in length.
In another aspect, the nucleotide sequence linking the crNucleotide and the tracrNucleotide of a single guide polynucleotide can comprise a tetraloop sequence, such as, but not limited to, a GAAA tetraloop sequence.
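The layout of a single guide polynucleotide described above (VT domain, tracr mate sequence, loop, tracrNucleotide) can be sketched in Python as a simple concatenation. This is a schematic only: the function name is hypothetical, the sequences passed to it in any example are arbitrary placeholders with no biological meaning, and the 3 to 100 nucleotide loop window simply mirrors the lengths listed above.

```python
# Schematic sketch; function name and any example sequences are hypothetical placeholders.
def build_single_guide(vt_domain: str, tracr_mate: str, tracr: str,
                       loop: str = "GAAA") -> str:
    """Join crNucleotide (VT domain + tracr mate), loop, and tracrNucleotide, 5'->3'."""
    if not 3 <= len(loop) <= 100:
        raise ValueError("loop length outside the 3 to 100 nucleotide window")
    cr_nucleotide = vt_domain + tracr_mate  # crNucleotide portion
    return cr_nucleotide + loop + tracr     # the loop links it to the tracrNucleotide
```

With the default argument the linker is the GAAA tetraloop named above; passing a different `loop` string models the other linker lengths.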
The single guide polynucleotide includes a chimeric non-naturally occurring single guide RNA. The terms “single guide RNA” and “sgRNA” are used interchangeably herein and relate to a synthetic fusion of two RNA molecules, a crRNA (CRISPR RNA) comprising a variable targeting domain (linked to a tracr mate sequence that hybridizes to a tracrRNA), fused to a tracrRNA (trans-activating CRISPR RNA). A chimeric non-naturally occurring guide RNA comprises regions that are not found together in nature (i.e., they are heterologous with each other). For example, a chimeric non-naturally occurring guide RNA comprises a first nucleotide sequence domain (referred to as Variable Targeting domain or VT domain) that can hybridize to a nucleotide sequence in a target DNA, linked to a second nucleotide sequence that can recognize the Cas endonuclease, such that the first and second nucleotide sequence are not found linked together in nature.
The chimeric non-naturally occurring guide RNA can comprise a crRNA and/or a tracrRNA of the type II CRISPR/Cas system that can form a complex with a type II Cas endonuclease, wherein said guide RNA/Cas endonuclease complex can direct the Cas endonuclease to a DNA target sequence, enabling the Cas endonuclease to recognize, bind to, and optionally nick or cleave (introduce a single or double-strand break) the DNA target sequence.
The guide polynucleotide can be produced by any method known in the art, including chemically synthesizing guide polynucleotides (such as but not limited to Hendel et al. 2015, Nature Biotechnology 33, 985-989), in vitro generated guide polynucleotides, and/or self-splicing guide RNAs (such as but not limited to Xie et al. 2015, PNAS 112:3570-3575).
Methods of expressing RNA components such as guide RNA in prokaryotic cells for performing Cas9-mediated DNA targeting have been described (WO2016/099887 published on June 23, 2016 and WO2018/156705 published on August 30, 2018).
In some aspects, a subject nucleic acid (e.g., a guide polynucleotide, a nucleic acid comprising a nucleotide sequence encoding a guide polynucleotide; a nucleic acid encoding Cas protein; a crRNA or a nucleotide encoding a crRNA, a tracrRNA or a nucleotide encoding a tracrRNA, a nucleotide encoding a VT domain, a nucleotide encoding a CER domain, etc.) comprises a modification or sequence that provides for an additional desirable feature (e.g., modified or regulated stability; subcellular targeting; tracking, e.g., a fluorescent label; a binding site for a protein or protein complex; etc.). Nucleotide sequence modification of the guide polynucleotide, VT domain and/or CER domain can be selected from, but is not limited to, the group consisting of a 5' cap, a 3' polyadenylated tail, a riboswitch sequence, a stability control sequence, a sequence that forms a dsRNA duplex, a modification or sequence that targets the guide polynucleotide to a subcellular location, a modification or sequence that provides for tracking, a modification or sequence that provides a binding site for proteins, a Locked Nucleic Acid (LNA), a 5-methyl dC nucleotide, a 2,6-Diaminopurine nucleotide, a 2’-Fluoro A nucleotide, a 2’-Fluoro U nucleotide, a 2'-O-Methyl RNA nucleotide, a phosphorothioate bond, linkage to a cholesterol molecule, linkage to a polyethylene glycol molecule, linkage to a spacer 18 molecule, a 5’ to 3’ covalent linkage, or any combination thereof. These modifications can result in at least one additional beneficial feature, wherein the additional beneficial feature is selected from the group of a modified or regulated stability, a subcellular targeting, tracking, a fluorescent label, a binding site for a protein or protein complex, modified binding affinity to complementary target sequence, modified resistance to cellular degradation, and increased cellular permeability.
Guided Cas systems (RGENs)
The terms “RGEN”, “RNA-guided endonuclease”, “guide RNA/Cas endonuclease complex”, “guide RNA/Cas endonuclease system”, “guide RNA/Cas complex”, “guide RNA/Cas system”, “gRNA/Cas complex”, “gRNA/Cas system”, “RNP”, and “ribonucleoprotein” are used interchangeably herein and refer to at least one RNA component and at least one Cas endonuclease that are capable of forming a complex, wherein said guide RNA/Cas endonuclease complex can direct the Cas endonuclease to a DNA target sequence (also referred to as a target sequence), enabling the Cas endonuclease to recognize, bind to, and optionally nick or cleave (introduce a single or double-strand break) the DNA target sequence. An RGEN herein typically has specific DNA targeting activity, given its association with at least one RNA component.
Briefly, an RNA component of an RGEN contains sequence that is complementary to a DNA sequence in a target sequence. Based on this complementarity, an RGEN can specifically recognize and cleave a particular DNA target sequence. An RGEN herein can comprise Cas protein(s) and suitable RNA component(s) of any of the four known CRISPR systems (Horvath and Barrangou, Science 327:167-170) such as a type I, II, or III CRISPR system. An RGEN in preferred embodiments comprises a Cas9 endonuclease (CRISPR II system) and at least one RNA component (e.g., a crRNA and tracrRNA, or a gRNA).
Any guided endonuclease can be used in the methods disclosed herein. Such endonucleases include, but are not limited to, Cas9 and Cpf1 endonucleases. Many endonucleases have been described to date that can recognize specific PAM sequences (see for example US patent application 14/772711 filed March 12, 2014 and Zetsche B et al. 2015. Cell 163, 1013) and cleave the target DNA at a specific position. It is understood that based on the methods and embodiments described herein utilizing a guided Cas system, one can now tailor these methods such that they can utilize any guided endonuclease system.
The present disclosure further provides expression constructs for expressing in a microbial cell a guide RNA/Cas system that is capable of recognizing, binding to, and optionally nicking, unwinding, or cleaving all or part of a target sequence.
Expression cassettes and Recombinant DNA constructs
Polynucleotides disclosed herein, such as a polynucleotide of interest, a synthetic sequence of interest, a heterologous sequence of interest, a homologous sequence of interest, or a gene of interest, can be provided in an expression cassette (also referred to as DNA construct) for expression in an organism of interest.
The term “expression”, as used herein, refers to the production of a functional end-product (e.g., a crRNA, a tracrRNA, a mRNA, a guide RNA, sRNA, siRNA, antisense RNA, or a polypeptide (protein)) in either precursor or mature form. The term "expression" includes any step involved in the production of a polypeptide including, but not limited to, transcription, post-transcriptional modification, translation, post-translational modification, and secretion.
The expression cassette can include 5' and 3' regulatory sequences and/or tags and synthetic sequences operably linked to a polynucleotide as disclosed herein.
The expression cassettes disclosed herein may include, in the 5'-3' direction of transcription, a transcriptional and translational initiation region (i.e., a promoter), a 5’ untranslated region, polynucleotides encoding various protein tags and sequences, a polynucleotide of interest, and a transcriptional and translational termination region (i.e., termination region) functional in the microbial (host) cell. Expression cassettes are also provided with a plurality of restriction sites and/or recombination sites for integration of the polynucleotide to be under the transcriptional regulation of the regulatory regions described elsewhere herein. The regulatory regions (i.e., promoters, transcriptional regulatory regions, and translational termination regions) and/or the polynucleotide of interest may be native/analogous to the host cell or to each other. Other polynucleotide sequences encoding various protein sequences may be appended to either the 5’ or 3’ end of the polynucleotide of interest. Alternatively, the regulatory regions and/or the polynucleotide of interest may be heterologous to the host cell or to each other.
In certain embodiments the polynucleotides disclosed herein can be stacked with any combination of polynucleotide sequences of interest or expression cassettes as disclosed elsewhere herein or known in the art. The stacked polynucleotides may be operably linked to the same promoter as the initial polynucleotide, or may be operably linked to a separate promoter polynucleotide.
Expression cassettes may comprise a promoter operably linked to a polynucleotide of interest, along with a corresponding termination region. The termination region may be native to the transcriptional initiation region, may be native to the operably linked polynucleotide of interest or to the promoter sequences, may be native to the host organism, or may be derived from another source (i.e., foreign or heterologous). Convenient termination regions are available from phage sequences, e.g., the lambda phage t0 termination region, or strong terminators from prokaryotic ribosomal RNA operons or genes involved in the secretion of extracellular proteins (e.g., aprE from B. subtilis, aprL from B. licheniformis). Convenient termination regions are also available from the Ti-plasmid of A. tumefaciens, such as the octopine synthase and nopaline synthase termination regions. See also Guerineau et al. (1991) Mol. Gen. Genet. 262:141-144; Proudfoot (1991) Cell 64:671-674; Sanfacon et al. (1991) Genes Dev. 5:141-149; Mogen et al. (1990) Plant Cell 2:1261-1272; Munroe et al. (1990) Gene 91:151-158; Ballas et al. (1989) Nucleic Acids Res. 17:7891-7903; and Joshi et al. (1987) Nucleic Acids Res. 15:9627-9639.
Where appropriate, the polynucleotides of interest may be optimized for increased expression in the transformed or targeted organism. For example, the polynucleotides can be synthesized or altered to use organism-preferred codons for improved expression.
Additional sequence modifications are known to enhance gene expression in a cellular host. These include elimination of sequences encoding spurious polyadenylation signals, exon-intron splice site signals, transposon-like repeats, and other such well-characterized sequences that may be deleterious to gene expression. The G-C content of the sequence may be adjusted to levels average for a given cellular host, as calculated by reference to known genes expressed in the host cell. When possible, the sequence is modified to avoid predicted hairpin secondary mRNA structures.
The expression cassettes may additionally contain 5' leader sequences. Such leader sequences can act to enhance translation or the level of RNA stability. Translation leaders are known in the art and include: picornavirus leaders, for example, EMCV leader (Encephalomyocarditis 5' noncoding region) (Elroy-Stein et al. (1989) Proc. Natl. Acad. Sci. USA 86:6126-6130); potyvirus leaders, for example, TEV leader (Tobacco Etch Virus) (Gallie et al. (1995) Gene 165(2):233-238), MDMV leader (Maize Dwarf Mosaic Virus) (Johnson et al. (1986) Virology 154:9-20), and human immunoglobulin heavy-chain binding protein (BiP) (Macejak et al. (1991) Nature 353:90-94); untranslated leader from the coat protein mRNA of alfalfa mosaic virus (AMV RNA 4) (Jobling et al. (1987) Nature 325:622-625); tobacco mosaic virus leader (TMV) (Gallie et al. (1989) in Molecular Biology of RNA, ed. Cech (Liss, New York), pp. 237-256); and maize chlorotic mottle virus leader (MCMV) (Lommel et al. (1991) Virology 81:382-385). See also, Della-Cioppa et al. (1987) Plant Physiol. 84:965-968. Other methods known to enhance translation can also be utilized, for example, introns, and the like.
In preparing the expression cassette, the various DNA fragments may be modified so as to provide for the DNA sequences in the proper orientation and, as appropriate, in the proper reading frame. Toward this end, adapters or linkers may be employed to join the DNA fragments or other modifications may be involved to provide for convenient restriction sites, removal of superfluous DNA, removal of restriction sites, or the like. For this purpose, in vitro mutagenesis, primer repair, restriction, annealing, resubstitutions, e.g., transitions and transversions, may be involved.
In some embodiments, a nucleotide sequence encoding a guide RNA and/or a Cas protein is operably linked to a control element, e.g., a transcriptional control element, such as a promoter. The transcriptional control element may be functional in either a eukaryotic cell or a prokaryotic cell.
Non-limiting examples of suitable prokaryotic promoters (promoters functional in a prokaryotic cell) and promoter sequence regions for use in the expression of genes, open reading frames (ORFs) thereof and/or variant sequences thereof in prokaryotic cells are generally known to one of skill in the art.
Non-limiting examples of suitable eukaryotic promoters (promoters functional in a eukaryotic cell) are generally known to one of skill in the art.
As used herein, “recombinant” refers to an artificial combination of two otherwise separated segments of sequence, e.g., by chemical synthesis or by the modification of isolated segments of nucleic acids by genetic engineering techniques. The term “recombinant,” when used in reference to a biological component or composition (e.g., a cell, nucleic acid, polypeptide/enzyme, vector, etc.) indicates that the biological component or composition is in a state that is not found in nature. In other words, the biological component or composition has been modified by human intervention from its natural state. For example, a recombinant cell encompasses a cell that expresses one or more genes that are not found in its native (i.e., non-recombinant) cell, a cell that expresses one or more native genes in an amount that is different than its native cell, and/or a cell that expresses one or more native genes under different conditions than its native cell. Recombinant nucleic acids may differ from a native sequence by one or more nucleotides, be operably linked to heterologous sequences (e.g., a heterologous promoter, a sequence encoding a non-native or variant signal sequence, etc.), be devoid of intronic sequences, and/or be in an isolated form. Recombinant polypeptides/enzymes may differ from a native sequence by one or more amino acids, may be fused with heterologous sequences, may be truncated or have internal deletions of amino acids, may be expressed in a manner not found in a native cell (e.g., from a recombinant cell that over-expresses the polypeptide due to the presence in the cell of an expression vector encoding the polypeptide), and/or be in an isolated form. It is emphasized that in some embodiments, a recombinant polynucleotide or polypeptide/enzyme has a sequence that is identical to its wild-type counterpart but is in a non-native form (e.g., in an isolated or enriched form).
As used herein, "recombinant DNA " or “recombinant DNA construct” refers to a DNA sequence comprising at least one expression cassette comprising an artificial combination of nucleic acid fragments. The recombinant DNA construct can include 5' and 3' regulatory sequences operably linked to a polynucleotide of interest as disclosed herein. For example, a recombinant DNA construct may comprise regulatory sequences and coding sequences that are derived from different sources. Such a recombinant DNA construct may be used by itself or it may be used in conjunction with a vector, which is referred to herein as a circular recombinant DNA construct. The choice of vector is dependent upon the method that will be used to introduce the vector into the host cells as is well known to those skilled in the art.
For example, a plasmid vector can be used. The skilled artisan is well aware of the genetic elements that must be present on the vector in order to successfully transform, select and propagate host cells.
As used herein, a recombinant DNA construct can be a "linear recombinant DNA construct" referring to a recombinant DNA construct that is linear, and/or a "circular recombinant DNA construct" or “circular recombinant DNA” referring to a recombinant DNA construct that is circular. The term “circular recombinant DNA construct” includes a circular extra chromosomal element comprising autonomously replicating sequences, genome integrating sequences (such as but not limited to single or multi-copy gene expression cassettes), phage, or nucleotide sequences, derived from any source, or synthetic (i.e., not occurring in nature), in which a number of nucleotide sequences have been joined or recombined into a unique construction which is capable of introducing a polynucleotide of interest into a cell.
Standard recombinant DNA and molecular cloning techniques used herein are well known in the art and are described more fully in Sambrook et al., Molecular Cloning: A Laboratory Manual; Cold Spring Harbor Laboratory: Cold Spring Harbor, NY (1989).
Target sequences (Target sites)
The terms “target sequence”, “target site”, “target site sequence”, “target DNA”, “target locus”, “genome target site”, “genome target sequence”, “genomic target site”, “genomic target sequence”, “genomic target locus” and “protospacer” are used interchangeably herein and refer to a polynucleotide sequence such as, but not limited to, a nucleotide sequence on a chromosome, episome, a transgenic locus, or any other DNA molecule in the genome (including chromosomal, plasmid DNA, or DNA modification templates introduced into the cell) of a cell, at which a guide polynucleotide/Cas endonuclease complex can recognize, bind to, and optionally nick or cleave all or part of the target sequence.
The target sequence includes a polynucleotide sequence in the genome of a microbial cell at which a Cas endonuclease cleavage is desired to promote a genome modification, e.g., homologous recombination with a DNA modification template. The context in which this term is used, however, can slightly alter its meaning. For example, the target sequence for a Cas endonuclease is generally very specific and can often be defined to the exact nucleotide sequence/position, whereas in some cases the target sequence for a desired genome modification can be defined more broadly than merely the site at which DNA cleavage occurs, e.g., a genome locus or region where homologous recombination is desired. Thus, in certain cases, the genome modification that occurs via the activity of Cas/guide RNA DNA cleavage is described as occurring “at or near” the target sequence.
The target sequence can be an endogenous site in the genome of a cell, or alternatively, the target sequence can be heterologous to the cell and thereby not be naturally occurring in the genome of the cell, or the target sequence can be found in a heterologous genome location compared to where it occurs in nature. As used herein, the terms “endogenous target sequence” and “native target sequence” are used interchangeably to refer to a target sequence that is endogenous or native to the genome of a cell and is at the endogenous or native position of that target sequence in the genome of the cell. The terms “artificial target site” and “artificial target sequence” are used interchangeably herein and refer to a target sequence that has been introduced into the genome of a cell. Such an artificial target sequence can be identical in sequence to an endogenous or native target sequence in the genome of a cell but be located in a different position (i.e., a non-endogenous or non-native position) in the genome of a cell.
In one aspect the target sequence at which a guide polynucleotide/Cas endonuclease (RGEN) complex can recognize, bind to, and optionally nick or cleave all or part of the target sequence is referred to as a predetermined target sequence. A “predetermined target sequence” or “predetermined RGEN target sequence” described herein, refers to a target sequence that occurs only once in the genome of a microbial cell and has been identified (predetermined) to be the site in which a single selection marker is to be introduced through homologous recombination. This predetermined target sequence is different from any additional target sequence in the genome of the microbial cells in which a DNA modification is to be introduced.
As described herein, identifying a predetermined site at which a single marker is to be introduced (e.g., a predetermined target sequence), and which is different from any other target sequence where a DNA modification is desired, allows a selection marker to be introduced into a predetermined target sequence of a microbial cell while simultaneously modifying at least one target sequence that is different from said predetermined DNA sequence in the genome of the microbial cell.
The choice of predetermined target sequences was guided by balanced GC content, the absence of repetitive sequences, distance to repetitive sequences such as telomeres, ORF tail-to-tail reading orientations, effectiveness of simultaneously consistent and high gene of interest (GOI) expression with low cell-to-cell variability, and the availability of unique and active CRISPR sites.
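Two of the selection criteria named above, occurrence only once in the genome and balanced GC content, lend themselves to a simple computational screen. The Python sketch below is a hedged illustration: the function names and the 40% to 60% GC window are assumptions made for the example (the text does not give numeric thresholds), and the other criteria listed above, such as expression behavior or CRISPR site activity, are not modeled.

```python
# Illustrative screen; names and the GC window are assumptions, not values from the disclosure.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def _count_overlapping(haystack: str, needle: str) -> int:
    """Count occurrences of needle in haystack, allowing overlaps."""
    count, start = 0, haystack.find(needle)
    while start != -1:
        count += 1
        start = haystack.find(needle, start + 1)
    return count


def count_occurrences(genome: str, site: str) -> int:
    """Count a candidate site on both strands of a linear genome sequence."""
    genome, site = genome.upper(), site.upper()
    total = _count_overlapping(genome, site)
    rc = reverse_complement(site)
    if rc != site:  # avoid double-counting palindromic sites
        total += _count_overlapping(genome, rc)
    return total


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def is_candidate_predetermined_site(genome: str, site: str,
                                    gc_min: float = 0.4,
                                    gc_max: float = 0.6) -> bool:
    """True if the site occurs exactly once and its GC fraction is in the assumed window."""
    return (count_occurrences(genome, site) == 1
            and gc_min <= gc_fraction(site) <= gc_max)
```

The screen checks both strands because a guide polynucleotide can hybridize to either strand of the double-stranded target; a circular genome would additionally need the sequence junction checked, which this sketch omits.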
As used herein, a “unique target sequence” is a target sequence that is not found in the genome of a microbial cell that one wants to modify (such as, for example, target sequence [B] in Figure 1) and as such is different from the predetermined target sequence and from any additional target sequences described herein.
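The distinction between a target sequence that occurs only once in the genome and a unique target sequence that is absent from it can be illustrated with a minimal sketch. The function names, the toy genome, and the choice to search both strands by exact string matching are assumptions made for illustration; they are not taken from the disclosure.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def count_occurrences(genome: str, target: str) -> int:
    """Count overlapping exact occurrences of target on either strand."""
    genome, target = genome.upper(), target.upper()
    # A set avoids double counting when the target equals its reverse complement.
    queries = {target, reverse_complement(target)}
    total = 0
    for query in queries:
        start = genome.find(query)
        while start != -1:
            total += 1
            start = genome.find(query, start + 1)
    return total


def classify_target(genome: str, target: str) -> str:
    """Label a candidate target by its copy number in the genome."""
    hits = count_occurrences(genome, target)
    if hits == 0:
        return "absent"       # candidate unique flanking target such as [B]
    if hits == 1:
        return "single-copy"  # candidate predetermined or additional target
    return "multi-copy"


# Hypothetical toy genome, for illustration only.
toy_genome = "TTGACCATGCAAGGTCCGATTTACG"
print(classify_target(toy_genome, "ATGCAAGG"))
print(classify_target(toy_genome, "ACGTACGT"))
```

Exact matching is a simplification: a practical screen would also consider near matches and the PAM requirement of the RGEN being used.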
In one aspect the target sequence is a unique target sequence that is used for marker flanking in a DNA modification template, such as shown in Figure 1, where the unique target sequence [B] flanks a selection marker located on a DNA modification template.
In one aspect the target sequence is an additional target sequence that occurs only once in the genome of a microbial cell (and is different from the predetermined target sequence) and that is used for additional genome modifications (co-modification) using RGEN and DNA modification templates (see also Figures 1-8 for examples of marker integration and excision while simultaneously modifying at least one additional target sequence). In one aspect, the modification at the at least one additional target sequence is selected from the group consisting of an insertion of a polynucleotide of interest, a deletion of a polynucleotide of interest, a replacement of a polynucleotide of interest, and any one combination thereof.
An “altered target site”, “altered target sequence”, “modified target site”, “modified target sequence” are used interchangeably herein and refer to a target sequence as disclosed herein that comprises at least one alteration when compared to non-altered target sequence. Such “alterations” include, for example:
(i) replacement of at least one nucleotide, (ii) a deletion of at least one nucleotide, (iii) an integration of at least one nucleotide, or (iv) any combination of (i) - (iii).
The target sequence for a Cas endonuclease can be very specific and can often be defined to the exact nucleotide position, whereas in some cases the target sequence for a desired genome modification can be defined more broadly than merely the site at which DNA cleavage occurs, e.g., a genome locus or region that is to be deleted from the genome. Thus, in certain cases, the genome modification that occurs via the activity of Cas/guide RNA DNA cleavage is described as occurring “at or near” the target sequence.
Methods for “modifying a target sequence” and “altering a target sequence” are used interchangeably herein and refer to methods for producing an altered target sequence.
A variety of methods are available to identify those cells having an altered genome at or near a target sequence without using a screenable marker phenotype. Such methods can be viewed as directly analyzing a target sequence to detect any change in the target sequence, including but not limited to PCR methods, sequencing methods, nuclease digestion, Southern blots, and any combination thereof.
The length of the target DNA sequence (target sequence) can vary, and includes, for example, target sequences that are at least 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 or more nucleotides in length. It is further possible that the target sequence can be palindromic, that is, the sequence on one strand reads the same in the opposite direction on the complementary strand. The nick/cleavage site can be within the target sequence or the nick/cleavage site could be outside of the target sequence. In another variation, the cleavage could occur at nucleotide positions immediately opposite each other to produce a blunt end cut or, in other cases, the incisions could be staggered to produce single-stranded overhangs, also called “sticky ends”, which can be either 5' overhangs, or 3' overhangs. Active variants of genome target sequences can also be used. Such active variants can comprise at least 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more sequence identity to the given target sequence, wherein the active variants retain biological activity and hence are capable of being recognized and cleaved by a Cas endonuclease.
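Two of the properties described above, a palindromic target sequence and the sequence identity of an active variant, can be sketched as follows. The disclosure does not state how sequence identity is calculated; the ungapped, position-by-position comparison of equal-length sequences used here is an assumption, and the function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def is_palindromic(seq: str) -> bool:
    """True if the sequence reads the same on the complementary strand."""
    seq = seq.upper()
    return seq == reverse_complement(seq)


def percent_identity(reference: str, variant: str) -> float:
    """Ungapped identity between two equal-length sequences (an assumption)."""
    if not reference or len(reference) != len(variant):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(reference.upper(), variant.upper()))
    return 100.0 * matches / len(reference)


print(is_palindromic("GAATTC"))
print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))
```

An identity value alone does not show that a variant is active; as stated above, the variant must also remain recognizable and cleavable by the Cas endonuclease.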
Assays to measure the single or double-strand break of a target sequence by an endonuclease are known in the art and generally measure the overall activity and specificity of the agent on DNA substrates containing recognition sites.
The target sequence selected by a user of the disclosed methods can be located within a region of a gene of interest selected from the group consisting of an open reading frame, a promoter, a regulatory sequence, a terminator sequence, a regulatory element sequence, a splice site, a coding sequence, a polyubiquitination site, an intron site, and an intron enhancing motif. Examples of genes of interest include genes encoding acetyl esterases, aminopeptidases, amylases, arabinases, arabinofuranosidases, carboxypeptidases, catalases, cellulases, chitinases, cutinase, deoxyribonucleases, epimerases, esterases, α-galactosidases, β-galactosidases, α-glucanases, glucan lysases, endo-β-glucanases, glucoamylases, glucose oxidases, α-glucosidases, β-glucosidases, glucuronidases, hemicellulases, hexose oxidases, hydrolases, invertases, isomerases, laccases, lipases, lyases, mannosidases, oxidases, oxidoreductases, pectate lyases, pectin acetyl esterases, pectin depolymerases, pectin methyl esterases, pectinolytic enzymes, peroxidases, phenoloxidases, phytases, polygalacturonases, proteases, rhamno-galacturonases, ribonucleases, transferases, transport proteins, transglutaminases, xylanases, hexose oxidases, and combinations thereof. Target genes encoding regulatory proteins such as transcription factors, repressors, proteins that modify other proteins such as kinases, and proteins involved in post-translational modification (e.g., glycosylation) can be subjected to Cas-mediated engineering, as well as genes involved in cell signaling, morphology, growth rate, and protein secretion. No limitation in this regard is intended.
Protospacer Adjacent Motif (PAM)
A “protospacer adjacent motif” (PAM) herein refers to a short nucleotide sequence adjacent to a target sequence (protospacer) that is recognized (targeted) by a guide polynucleotide/Cas endonuclease (PGEN) system. The Cas endonuclease may not successfully recognize a target DNA sequence if the target DNA sequence is not followed by a PAM sequence. The sequence and length of a PAM herein can differ depending on the Cas protein or Cas protein complex used. The PAM sequence can be of any length but is typically 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20 nucleotides long.
A PAM herein is typically selected in view of the type of PGEN being employed. A PAM sequence herein may be one recognized by a PGEN comprising a Cas, such as the Cas9 variants described herein, derived from any of the species disclosed herein from which a Cas can be derived, for example. In certain embodiments, the PAM sequence may be one recognized by an RGEN comprising a Cas9 derived from S. pyogenes, S. thermophilus, S. agalactiae, N. meningitidis, T. denticola, or F. novicida. For example, a suitable Cas9 derived from S. pyogenes, including the Cas9 Y155 variants described herein, could be used to target genome sequences having a PAM sequence of NGG (N can be A, C, T, or G). As other examples, a suitable Cas9 could be derived from any of the following species when targeting DNA sequences having the following PAM sequences: S. thermophilus (NNAGAA), S. agalactiae (NGG), NNAGAAW [W is A or T], NGGNG), N. meningitidis (NNNNGATT), T. denticola (NAAAAC), or F. novicida (NG) (where N’s in all these particular PAM sequences are A, C, T, or G). Other examples of Cas9/PAMs useful herein include those disclosed in Shah et al. (RNA Biology 10:891-899) and Esvelt et al. (Nature Methods 10:1116-1121), which are incorporated herein by reference.
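The requirement that a target sequence be followed by a PAM can be illustrated by scanning a DNA string for candidate protospacers. This is a minimal sketch under stated assumptions: only the given (forward) strand is scanned, the protospacer length defaults to 20 nucleotides, and only the degenerate codes N and W that appear in the PAMs listed above are supported. The function names are hypothetical.

```python
import re

# Degenerate codes used by the PAM sequences listed above.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]", "W": "[AT]"}


def pam_to_regex(pam: str) -> str:
    """Translate a PAM such as NGG or NNAGAAW into a regular expression."""
    return "".join(IUPAC[code] for code in pam.upper())


def find_protospacers(dna: str, pam: str = "NGG", length: int = 20):
    """Return (start, protospacer, pam) tuples found on the given strand.

    A lookahead is used so that overlapping candidates are all reported.
    """
    pattern = re.compile(rf"(?=([ACGT]{{{length}}})({pam_to_regex(pam)}))")
    return [(m.start(), m.group(1), m.group(2))
            for m in pattern.finditer(dna.upper())]


# Hypothetical sequence: a 20-nt protospacer followed by the PAM AGG.
example = "ACGTACGTACGTACGTACGT" + "AGG" + "TT"
for hit in find_protospacers(example, pam="NGG"):
    print(hit)
```

A fuller search would also scan the reverse complement, since the protospacer can lie on either strand.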
DNA Modification Templates
The present disclosure includes methods and compositions for marker swapping in microbial cells. Specifically, this disclosure pertains to compositions and methods for replacing a first selection marker construct integrated at a predetermined target sequence of a microbial cell with a second selection marker construct, using uniquely designed DNA modification templates in combination with RGENs.
As used herein, the term “DNA modification template “ refers to a DNA sequence that comprises, at a minimum, a first region of homology (referred to as Upstream Homology Arm, UHA) and a second region of homology (referred to as Downstream Homology Arm, DHA), wherein said UHA is homologous to a genome region of a microbial cell referred to as the Upstream Genome Region (Upstream Genome Arm, UA), and the DHA is homologous to a genome region of a microbial cell referred to as the Downstream Genome Region (Downstream Genome Arm, DA) and wherein said DNA modification template in combination with an RNA-guided endonuclease (RGEN) can modify at least one genome target sequence in a microbial cell through homology directed repair (homologous recombination).
In some aspects, the DNA modification template further comprises a DNA sequence (referred to as donor DNA) located in between said UHA and DHA, wherein said DNA modification template in combination with an RNA-guided endonuclease (RGEN) can modify at least one additional genome target sequence in a microbial cell through homology directed repair (homologous recombination), wherein said modifications can be, but are not limited to, a DNA integration, a DNA deletion, a DNA replacement/substitution, or any one combination thereof.
In one aspect the “DNA modification template” comprises a first region of homology (referred to as Upstream Homology Arm, UHA) and a second region of homology (referred to as Downstream Homology Arm, DHA), wherein said UHA is homologous to a genome region referred to as the Upstream Genome Region (Upstream Genome Arm, UA), and the DHA is homologous to a genome region referred to as the Downstream Genome Region (Downstream Genome Arm, DA), wherein said UHA and DHA flank a DNA sequence (referred to as a donor DNA), wherein said donor DNA comprises a first selection marker ([Marker-1]) flanked by an upstream target sequence ([B]) and an identical downstream target sequence ([B]) that is different from a predetermined target sequence ([A]) present in the genome of a microbial cell (such as, but not limited to, Figure 1: [UHA-A]-[B]-[Marker-1]-[B]-[DHA-A]).
In one aspect the “DNA modification template” comprises a first region of homology (referred to as Upstream Homology Arm, UHA) and a second region of homology (referred to as Downstream Homology Arm, DHA), wherein said UHA is homologous to a genome region referred to as the Upstream Genome Region (Upstream Genome Arm, UA), and the DHA is homologous to a genome region referred to as the Downstream Genome Region (Downstream Genome Arm, DA), wherein said UHA and DHA flank a DNA sequence (referred to as a donor DNA), wherein said donor DNA comprises a first selection marker ([Marker-1]) flanked by a unique upstream target sequence ([B1]) and a different but unique downstream target sequence ([B2]) that are different from a predetermined target sequence ([A]) present in the genome of a microbial cell (such as, but not limited to, [UHA-A]-[Bα]-[Marker-1]-[Bβ]-[DHA-A]).
In one aspect the “DNA modification template” comprises a first region of homology (referred to as Upstream Homology Arm, UHA) and a second region of homology (referred to as Downstream Homology Arm, DHA), wherein said UHA is homologous to a genome region referred to as the Upstream Genome Region (Upstream Genome Arm, UA), and the DHA is homologous to a genome region referred to as the Downstream Genome Region (Downstream Genome Arm, DA), wherein said UHA and DHA flank a DNA sequence (referred to as a donor DNA), wherein said donor DNA comprises a second selection marker ([Marker-2]) flanked by an upstream target sequence ([C]) and an identical downstream target sequence ([C]) that is different from a predetermined target sequence ([A]) present in the genome of a microbial cell (such as, but not limited to, Figure 1: [UHA-A]-[C]-[Marker-2]-[C]-[DHA-A]).
In one aspect the “DNA modification template” comprises a first region of homology (referred to as Upstream Homology Arm, UHA) and a second region of homology (referred to as Downstream Homology Arm, DHA), wherein said UHA is homologous to a genome region referred to as the Upstream Genome Region (Upstream Genome Arm, UA), and the DHA is homologous to a genome region referred to as the Downstream Genome Region (Downstream Genome Arm, DA), wherein said UHA and DHA flank a DNA sequence (referred to as a donor DNA), wherein said donor DNA comprises a second selection marker ([Marker-2]) flanked by a unique upstream target sequence ([C1]) and a different but unique downstream target sequence ([C2]) that are different from a predetermined target sequence ([A]) present in the genome of a microbial cell (such as, but not limited to, [UHA-A]-[Cα]-[Marker-2]-[Cβ]-[DHA-A]).
As described herein, the first selection marker ([Marker-1]) can be replaced (swapped) by a second selection marker ([Marker-2]) using a DNA modification template in combination with an RGEN.
In one aspect the “DNA modification template” comprises a first region of homology (referred to as Upstream Homology Arm, UHA) and a second region of homology (referred to as Downstream Homology Arm, DHA), wherein said UHA is homologous to a genome region referred to as the Upstream Genome Region (Upstream Genome Arm, UA), and the DHA is homologous to a genome region referred to as the Downstream Genome Region (Downstream Genome Arm, DA), wherein said upstream and downstream homology regions flank a DNA sequence (DNA template), wherein said DNA modification template in combination with an RNA-guided endonuclease (RGEN) can result in homologous recombination (HDR) of said DNA template with a target region in the genome of a microbial cell, wherein said homologous recombination results in a genome modification selected from the group consisting of a DNA integration, a DNA deletion, a DNA replacement/substitution, or any one combination thereof.
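The three outcomes named above (integration, deletion, replacement) can be pictured with a string-level sketch in which whatever lies between the genomic matches of the UHA and DHA is exchanged for the donor DNA. This is only an abstraction of the end result: it does not model RGEN cleavage or the cellular repair process, and the function name and toy sequences are hypothetical.

```python
def apply_template(genome: str, uha: str, dha: str, donor: str = "") -> str:
    """Replace the genomic region between the UHA and DHA matches with donor.

    An empty donor models a deletion; a donor placed between adjacent arms
    models an integration; a donor exchanged for existing sequence models a
    replacement.
    """
    up = genome.find(uha)
    if up == -1:
        raise ValueError("upstream homology arm not found in genome")
    region_start = up + len(uha)
    down = genome.find(dha, region_start)
    if down == -1:
        raise ValueError("downstream homology arm not found after the UHA")
    return genome[:region_start] + donor + genome[down:]


# Hypothetical toy sequences, far shorter than real homology arms.
UHA, DHA = "ACGTTGCA", "TAGCTAGG"
locus = UHA + "CCCGGG" + DHA

print(apply_template(locus, UHA, DHA))                  # deletion
print(apply_template(locus, UHA, DHA, donor="AT"))      # replacement
print(apply_template(UHA + DHA, UHA, DHA, "GATTACA"))   # integration
```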
In one aspect, the “DNA modification template “ comprises a first region of homology (referred to as Upstream Homology Arm, UHA) and a second region of homology (referred to as Downstream Homology Arm, DHA), wherein said UHA is homologous to a genome region referred to as the Upstream Genome Region (Upstream Genome Arm, UA), and the DHA is homologous to a genome region referred to as the Downstream Genome Region (Downstream Genome Arm, DA) wherein the DNA modification template further comprises a DNA sequence (referred to as donor DNA) located in between said UHA and DHA, wherein said donor DNA comprises a DNA sequence to be inserted into said genome (such as but not limiting to Figure 1 [UHA-M]-[insert]-[DHA-M]).
In one aspect, the “DNA modification template” comprises a first region of homology (referred to as Upstream Homology Arm, UHA) and a second region of homology (referred to as Downstream Homology Arm, DHA), wherein said UHA is homologous to a genome region referred to as the Upstream Genome Region (Upstream Genome Arm, UA), and the DHA is homologous to a genome region referred to as the Downstream Genome Region (Downstream Genome Arm, DA), wherein the DNA modification template further comprises a DNA sequence (referred to as donor DNA) located in between said UHA and DHA, wherein said donor DNA comprises a first polynucleotide of interest (Insert) that upon integration into the genome will replace a second polynucleotide of interest in said genome (see also Figures 5-6).
In one aspect the “DNA modification template “ comprises a first region of homology (referred to as Upstream Homology Arm, UHA) and a second region of homology (referred to as Downstream Homology Arm, DHA), wherein said UHA is homologous to a genome region referred to as the Upstream Genome Region (Upstream Genome Arm, UA), and the DHA is homologous to a genome region referred to as the Downstream Genome Region (Downstream Genome Arm, DA) wherein said upstream and downstream homology region flank a DNA sequence to be deleted from said genome. The DNA sequence to be deleted from the microbial genome, can comprise a polynucleotide of interest by itself, or comprise a polynucleotide of interest flanked by at least one target sequence that can be recognized by at least one RGEN.
In one aspect, the nucleotide sequence of interest to be integrated into the microbial genome is selected from the group consisting of a polynucleotide of interest, a selection marker, a selection marker DNA flanked by target sequence DNA, a DNA sequence capable of self-excising, a gene of interest, a transcriptional regulatory sequence, a translational regulatory sequence, a promoter sequence, a terminator sequence, a transgenic nucleic acid sequence, an antisense sequence complementary to at least a portion of the messenger RNA, a heterologous sequence, or any one combination thereof.
In another aspect the “DNA modification template” comprises a DNA sequence flanked by a first region of homology (referred to as Upstream Homology Arm, UHA) and a second region of homology (referred to as Downstream Homology Arm, DHA), wherein said UHA is homologous to a genome region referred to as the Upstream Genome Region (Upstream Genome Arm, UA), and the DHA is homologous to a genome region referred to as the Downstream Genome Region (Downstream Genome Arm, DA), wherein the DNA sequence comprises at least one nucleotide modification when compared to a genome nucleotide sequence to be edited. A nucleotide modification can be at least one nucleotide substitution, addition or deletion. Use of such a DNA modification template with an RGEN-based HDR method described herein results in the editing of a genome DNA sequence.
In one aspect, the homology arms of the present disclosure (UHA, DHA), flanking a double stranded DNA sequence, include about between 1001 base pairs (bps) and 2000 bps; between 2000 bps and 3000 bps; between 2000 bps and 4000 bps; between 2000 bps and 5000 bps; between 2000 bps and 6000 bps, between 3000 bps and 4000 bps; between 3000 bps and 5000 bps; between 3000 bps and 6000 bps, between 4000 bps and 5000 bps; between 4000 bps and 6000 bps, between 5000 bps and up to 6000 bps.
In some embodiments, the 5' and 3' ends of a gene of interest are flanked by a homology arm wherein the homology arm comprises nucleic acid sequences immediately flanking the targeted genome locus of the microbial cell.
Selection markers for marker swapping by replacing a first selection marker construct integrated at a predetermined target sequence of a microbial cell with a second selection marker construct
The present disclosure includes methods and compositions for selection marker swapping in a microbial cell by replacing a first selection marker construct integrated at a predetermined target sequence of a microbial cell with a second selection marker construct using DNA modification templates in combination with RGENs.
Disclosed herein are replaceable selection marker constructs comprising a selection marker (shown as [Marker-1] or [Marker-2] in the Figures) flanked by a unique RGEN target sequence (for example, target sequence [B] or [C]; see Figures), wherein said construct is part of a DNA modification template comprising homologous recombination arms. Use of these DNA modification templates together with the specific RGENs that can recognize and cleave the RGEN target sequence allows for the replacement of a first selection marker construct ([B]-[Marker-1]-[B]) with a second selection marker construct ([C]-[Marker-2]-[C]) at a predetermined target sequence of a microbial cell. Furthermore, one can also construct selection marker constructs comprising a selection marker (shown as [Marker-1] or [Marker-2] in the Figures) flanked by unique but different RGEN target sequences (for example, target sequences [Bα] and [Bβ] flanking a selection marker gene, as in construct [Bα]-[Marker-1]-[Bβ], or [Cα] and [Cβ] flanking a selection marker gene, as in construct [Cα]-[Marker-2]-[Cβ]), wherein said construct is part of a DNA modification template comprising homologous recombination arms. Use of these DNA modification templates together with the specific RGENs that can recognize and cleave the RGEN target sequences allows for the replacement of a first selection marker construct ([Bα]-[Marker-1]-[Bβ]) with a second selection marker construct ([Cα]-[Marker-2]-[Cβ]) at a predetermined target sequence of a microbial cell.
Examples of such selection markers include, but are not limited to, pyr4 (Smith et al., Curr. Genet., 1991, 19(1):27-33), pyr2 (Jorgensen et al., 2014, Microbial Cell Factories, 13(1):33), hph (Mach et al., Curr. Genet., 1994, 25(6):567-570), amdS (Penttila et al., Gene, 1987, (2):155-164), and alS (WO2008039370A1; Ouedraogo et al., Appl. Microbiol. Biotechnol., 2015, 99(23):10083-95).
In one embodiment of the disclosure, the method comprises a method for replacing a first selection marker construct integrated at a predetermined target sequence of a microbial cell with a second selection marker construct, the method comprising: a) providing one or more microbial cells having a first selection marker construct ([B]-[Marker-1]-[B]) integrated at a predetermined target sequence ([A]), wherein said first selection marker construct comprises a first selection marker ([Marker-1]) flanked by a first unique target sequence ([B]); b) introducing into the microbial cells of (a) a first RGEN (RGEN-B) and a first DNA modification template, wherein said first DNA modification template comprises a second selection marker construct ([C]-[Marker-2]-[C]) comprising a second selection marker ([Marker-2]) flanked by a second unique target sequence ([C]), wherein said first RGEN in combination with said first DNA modification template enables the replacement of said first selection marker construct with said second selection marker construct via homologous recombination; and, c) identifying one or more microbial cells from (b) that have said second selection marker construct integrated at said predetermined target sequence.
In one embodiment of the disclosure, the method comprises a method for reestablishing a first selection marker construct integrated at a predetermined target sequence of a microbial cell, the method comprising: a) providing one or more microbial cells having a second selection marker construct ([C]-[Marker-2]-[C]) integrated at a predetermined target sequence ([A]), wherein said second selection marker construct comprises a second selection marker ([Marker-2]) flanked by a first unique target sequence ([C]); b) introducing into the microbial cells of (a) a first RGEN (RGEN-C) and a first DNA modification template, wherein said first DNA modification template comprises a first selection marker construct ([B]-[Marker-1]-[B]) comprising a first selection marker ([Marker-1]) flanked by a second unique target sequence ([B]), wherein said first RGEN in combination with said DNA modification template enables the replacement of said second selection marker construct with said first selection marker construct via homologous recombination, thereby reestablishing said first selection marker construct at said predetermined target sequence; and, c) identifying one or more microbial cells from (b) that have said first selection marker construct reestablished at said predetermined target sequence.
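The replacement and reestablishment methods above form a round trip at the predetermined target sequence [A], which can be sketched symbolically. The class and function names are hypothetical, and the two validity checks (the RGEN must match the flanks of the integrated construct, and the incoming construct must carry different flanks) are assumptions drawn from the requirement that [B] and [C] be different unique target sequences.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MarkerConstruct:
    flank: str   # label of the unique target sequence flanking the marker
    marker: str  # label of the selection marker

    def __str__(self) -> str:
        return f"[{self.flank}]-[{self.marker}]-[{self.flank}]"


def swap_marker(integrated: MarkerConstruct, rgen_target: str,
                incoming: MarkerConstruct) -> MarkerConstruct:
    """Return the construct present at [A] after a swap step."""
    if rgen_target != integrated.flank:
        raise ValueError("RGEN does not match the integrated construct's flanks")
    if incoming.flank == integrated.flank:
        # Assumption: identical flanks would expose the incoming construct
        # to the same RGEN, so the sketch rejects this case.
        raise ValueError("incoming construct must carry different flanks")
    return incoming


first = MarkerConstruct(flank="B", marker="Marker-1")
second = MarkerConstruct(flank="C", marker="Marker-2")

after_swap = swap_marker(first, rgen_target="B", incoming=second)
restored = swap_marker(after_swap, rgen_target="C", incoming=first)
print(after_swap)
print(restored)
```

Because each step ends with a construct that the other RGEN can target, the two steps can in principle be repeated in alternation.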
Selection markers for marker swapping by replacing a first selection marker construct integrated at a predetermined target sequence of a microbial cell with a second selection marker construct while simultaneously modifying at least one additional target sequence in a microbial genome
The present disclosure further includes methods and compositions in which the selection marker swapping system described herein is combined with simultaneously modifying at least one additional target sequence at a different genome target sequence.
More specifically, the methods and compositions employ homologous recombination-based selection marker swapping at a predetermined target sequence of a microbial cell while simultaneously modifying at least one target sequence that is different from said predetermined DNA sequence in the genome of a microbial cell using RNA-guided endonucleases (RGENs) mediated and DNA modification template based methods.
The term “marker swapping” refers to a process of integrating a (first) selection marker at a predetermined target sequence in the genome of a microbial cell, which is later replaced (swapped) by a second selection marker at the site where the first marker was integrated, using RGEN-mediated and DNA modification template based methods. The term “marker swapping” also refers to a process of replacing (swapping) a selection marker integrated at a predetermined target sequence in the genome of a microbial cell with a second selection marker at the site where the first marker was integrated.
In one embodiment of the disclosure, the method comprises a method for replacing a first selection marker construct integrated at a predetermined target sequence of a microbial cell with a second selection marker construct while simultaneously modifying at least one additional target sequence, the method comprising: a) providing one or more microbial cells having a first selection marker construct ([B]-[Marker-1]-[B]) integrated at a predetermined target sequence ([A]), wherein said first selection marker construct comprises a first selection marker ([Marker-1]) flanked by a first unique target sequence ([B]), wherein said cells have at least one additional target sequence ([M]); b) introducing into the microbial cells of (a) a first RGEN (RGEN-B) and a first DNA modification template, wherein said first DNA modification template comprises a second selection marker construct ([C]-[Marker-2]-[C]) comprising a second selection marker ([Marker-2]) flanked by a second unique target sequence ([C]), wherein said first RGEN in combination with said first DNA modification template enables the replacement of said first selection marker construct with said second selection marker construct via homologous recombination; c) simultaneously with step (b), introducing into the microbial cells of (a) a modification at said at least one additional target sequence; and, d) identifying one or more microbial cells from (c) that have said second selection marker construct replacing said first marker construct, and that have said modification at said at least one additional target sequence.
In one embodiment of the disclosure, the method comprises a method for reestablishing a first selection marker construct integrated at a predetermined target sequence of a microbial cell, the method comprising: a) providing one or more microbial cells having a second selection marker construct (referred to as [C]-[Marker-2]-[C]) integrated at a predetermined target sequence ([A]), wherein said second selection marker construct comprises a second selection marker ([Marker-2]) flanked by a first unique target sequence ([C]), wherein said cells have at least one additional target sequence ([M]); b) introducing into the microbial cells of (a) a first RGEN (RGEN-C) and a first DNA modification template, wherein said first DNA modification template comprises a first selection marker construct ([B]-[Marker-1]-[B]) comprising a first selection marker ([Marker-1]) flanked by a second unique target sequence ([B]), wherein said first RGEN in combination with said DNA modification template enables the replacement of said second selection marker construct with said first selection marker construct via homologous recombination, thereby reestablishing said first selection marker construct at said predetermined target sequence; c) simultaneously with step (b), introducing into the microbial cells of (a) a modification at said at least one additional target sequence; and, d) identifying one or more microbial cells from (c) that have said first selection marker construct reestablished at said predetermined target sequence and that have said modification at said at least one additional target sequence.
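When the two methods above are alternated, each round pairs a marker swap at [A] with a modification at an additional target sequence. The planner below is a hedged sketch of that bookkeeping only. The target labels M1, M2 and M3, the dictionary keys, and the idea that each round selects for the incoming marker are assumptions made for illustration rather than steps stated in the disclosure.

```python
# Flank label -> selection marker, following the [B]/[C] notation above.
MARKER_FOR_FLANK = {"B": "Marker-1", "C": "Marker-2"}
OTHER_FLANK = {"B": "C", "C": "B"}


def plan_rounds(initial_flank: str, additional_targets: list) -> list:
    """List, per round, the marker-swap reagents and the co-modification."""
    plan = []
    flank = initial_flank
    for number, target in enumerate(additional_targets, start=1):
        incoming = OTHER_FLANK[flank]
        marker = MARKER_FOR_FLANK[incoming]
        plan.append({
            "round": number,
            "marker_rgen": f"RGEN-{flank}",            # cuts the integrated flanks
            "incoming": f"[{incoming}]-[{marker}]-[{incoming}]",
            "select_for": marker,                      # assumption: select incoming marker
            "comod_rgen": f"RGEN-{target}",
            "comod_target": f"[{target}]",
        })
        flank = incoming  # the incoming construct is now the integrated one
    return plan


# Hypothetical additional target sequences, one per round.
for step in plan_rounds("B", ["M1", "M2", "M3"]):
    print(step)
```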
In one aspect, the modification at said at least one additional target sequence is selected from the group consisting of an insertion of a polynucleotide of interest, a deletion of a polynucleotide of interest, a replacement of a polynucleotide of interest, and any one combination thereof.
In one aspect, the microbial cells of (a) have at least one additional target sequence ([M]), and simultaneously introducing a modification comprises introducing at least a second RGEN (RGEN-M) and at least a second DNA modification template ([UHA-M]-[Insert]-[DHA-M]) comprising a polynucleotide of interest ([Insert]), wherein said second RGEN in combination with said second DNA modification template enables the integration of said polynucleotide of interest at said at least one additional target sequence ([M]).
In one aspect, the microbial cells of (a) have at least a first additional target sequence ([Mα]) and a second additional target sequence ([Mβ]) flanking a polynucleotide of interest to be deleted, and wherein said simultaneously introducing a modification comprises introducing at least a second RGEN (RGEN-Mα), a third RGEN (RGEN-Mβ) and at least a third DNA modification template ([UHA-D]-[DHA-D]) comprising an Upstream Homology Arm ([UHA-D]) directly linked to a Downstream Homology Arm ([DHA-D]), wherein said UHA-D and DHA-D are homologous to a genomic region of said microbial cell flanking said polynucleotide sequence of interest to be deleted, wherein said RGEN-Mα and RGEN-Mβ in combination with said third DNA modification template enable the deletion of said polynucleotide of interest.
In one aspect, the microbial cells of (a) have at least a first additional target sequence ([Mα]) and a second additional target sequence ([Mβ]) flanking a first polynucleotide of interest to be replaced, and wherein said simultaneously introducing a modification comprises introducing at least a third RNA-guided endonuclease (RGEN-Mα), a fourth RNA-guided endonuclease (RGEN-Mβ) and at least a third DNA modification template ([UHA-M]-[Insert]-[DHA-M]) comprising a second polynucleotide of interest, wherein said RGEN-Mα and RGEN-Mβ in combination with said third DNA modification template enable the replacement of said first polynucleotide sequence of interest with said second polynucleotide of interest.
Definitions
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the present compositions and methods apply.
An “allele” or “allelic variant” is one of several alternative forms of a gene occupying a given locus on a chromosome. When all the alleles present at a given locus on a chromosome are the same, that organism is homozygous at that locus. If the alleles present at a given locus on a chromosome differ, that organism is heterozygous at that locus. An allelic variant of a polypeptide is a polypeptide encoded by an allelic variant of a gene.
As used herein, “host cell” refers to a cell that has the capacity to act as a host or expression vehicle for a newly introduced DNA sequence. Thus, in certain embodiments of the disclosure, the host cells are microbial cells.
The term “cell” herein refers to any type of cell such as a prokaryotic or eukaryotic cell. A eukaryotic cell has a nucleus and other membrane-enclosed structures (organelles), whereas a prokaryotic cell lacks a nucleus.
A microbial cell herein can refer to a fungal cell (e.g., yeast cell), prokaryotic cell, protist cell (e.g., algal cell), euglenoid cell, stramenopile cell, or oomycete cell, for example. A prokaryotic cell herein can refer to a bacterial cell or archaeal cell, for example. Fungal cells (e.g., yeast cells), protist cells (e.g., algal cells), euglenoid cells, stramenopile cells, and oomycete cells represent examples of eukaryotic microbial cells. A eukaryotic microbial cell has a nucleus and other membrane-enclosed structures (organelles), whereas a prokaryotic cell lacks a nucleus.
Fungal cells that find use in the subject methods can be filamentous fungal cell species. “Fungal cell”, “fungi”, “fungal host cell”, and the like, as used herein includes the phyla Ascomycota, Basidiomycota, Chytridiomycota, and Zygomycota (as defined by Hawksworth et al., In, Ainsworth and Bisby's Dictionary of The Fungi, 8th edition, 1995, CAB International, University Press, Cambridge, UK) as well as the Oomycota (as cited in Hawksworth et al., supra) and all mitosporic fungi (Hawksworth et al., supra). In certain embodiments, the fungal host cell is a yeast
cell, where by “yeast” is meant ascosporogenous yeast (Endomycetales), basidiosporogenous yeast, and yeast belonging to the Fungi Imperfecti (Blastomycetes). As such, a yeast host cell includes a Candida, Hansenula, Kluyveromyces, Pichia, Saccharomyces, Schizosaccharomyces, or Yarrowia cell. Species of yeast include, but are not limited to, the following: Saccharomyces carlsbergensis, Saccharomyces cerevisiae, Saccharomyces diastaticus, Saccharomyces douglasii, Saccharomyces kluyveri, Saccharomyces norbensis, Saccharomyces oviformis, Kluyveromyces lactis, and Yarrowia lipolytica.
The term “filamentous fungal cell” includes all filamentous forms of the subdivision Eumycotina or Pezizomycotina. Suitable cells of filamentous fungal genera include, but are not limited to, cells of Acremonium, Aspergillus, Chrysosporium, Corynascus, Chaetomium, Fusarium, Gibberella, Humicola, Magnaporthe, Myceliophthora, Neurospora, Paecilomyces, Penicillium, Scytalidium, Talaromyces, Thermoascus, Thielavia, Tolypocladium, Hypocrea, and Trichoderma.
Suitable cells of filamentous fungal species include, but are not limited to, cells of Aspergillus awamori, Aspergillus fumigatus, Aspergillus foetidus, Aspergillus japonicus, Aspergillus nidulans, Aspergillus niger, Aspergillus oryzae, Chrysosporium lucknowense, Fusarium bactridioides, Fusarium cerealis, Fusarium crookwellense, Fusarium culmorum, Fusarium graminearum, Fusarium graminum, Fusarium heterosporum, Fusarium negundi, Fusarium oxysporum, Fusarium reticulatum, Fusarium roseum, Fusarium sambucinum, Fusarium sarcochroum, Fusarium sporotrichioides, Fusarium sulphureum, Fusarium torulosum, Fusarium trichothecioides, Fusarium venenatum, Humicola insolens, Humicola lanuginosa, Hypocrea jecorina, Myceliophthora thermophila, Neurospora crassa, Neurospora intermedia, Penicillium purpurogenum, Penicillium canescens, Penicillium solitum, Penicillium funiculosum, Phanerochaete chrysosporium, Talaromyces flavus, Thielavia terrestris, Trichoderma harzianum, Trichoderma koningii, Trichoderma longibrachiatum, Trichoderma reesei, and Trichoderma viride.
The term “yeast” herein refers to fungal species that predominantly exist in unicellular form. Yeast can alternatively be referred to as “yeast cells”. A yeast herein can be characterized as either a conventional yeast or non-conventional yeast, for example.
The term “conventional yeast” (“model yeast”) herein generally refers to Saccharomyces or Schizosaccharomyces yeast species. Conventional yeast in
certain embodiments are yeast that favor homologous recombination (HR) DNA repair processes over repair processes mediated by non-homologous end-joining (NHEJ).
The term “non-conventional yeast” herein refers to any yeast that is not a Saccharomyces or Schizosaccharomyces yeast species. Non-conventional yeast are described in Non-Conventional Yeasts in Genetics, Biochemistry and Biotechnology: Practical Protocols (K. Wolf, K.D. Breunig, G. Barth, Eds., Springer-Verlag, Berlin, Germany, 2003) and Spencer et al. (Appl. Microbiol. Biotechnol. 58:147-156), which are incorporated herein by reference. Non-conventional yeast in certain embodiments may additionally (or alternatively) be yeast that favor NHEJ DNA repair processes over repair processes mediated by HR. Definition of a non-conventional yeast along these lines (preference of NHEJ over HR) is further disclosed by Chen et al. (PLoS ONE 8:e57952), which is incorporated herein by reference. Preferred non-conventional yeast herein are those of the genus Yarrowia (e.g., Yarrowia lipolytica).
A "recombinant host cell" (also referred to as a "genetically modified host cell") is a host cell into which has been introduced a heterologous nucleic acid, e.g., a recombinant DNA construct, or which has been introduced and comprises a genome modification system such as the guide RNA/Cas endonuclease system described herein. For example, a subject microbial host cell includes a genetically modified microbial cell by virtue of introduction into a suitable microbial cell of an exogenous nucleic acid (e.g., a plasmid or circular recombinant DNA construct).
As defined herein, a “parental cell” or a “parental (host) cell” may be used interchangeably and refer to “unmodified” parental cells. For example, a “parental” cell refers to any cell or strain of microorganism in which the genome of the “parental” cell is altered (e.g., via one or more mutations/modifications introduced into the parental cell) to generate a modified “daughter” cell thereof.
As used herein, a “modified cell” or a “modified (host) cell” may be used interchangeably and refer to recombinant (host) cells that comprise at least one genetic modification which is not present in the “parental” host cell from which the modified cells are derived.
As used herein, a “genome region” or “genomic region” is a segment of a chromosome in the genome of a cell. In one aspect the genome region is present on either side of the target sequence or, alternatively, also comprises a portion of the
target sequence. The genome region can comprise at least 5-10, 5-15, 5-20, 5-25, 5-30, 5-35, 5-40, 5-45, 5-50, 5-55, 5-60, 5-65, 5-70, 5-75, 5-80, 5-85, 5-90, 5-95, 5-100, 5-200, 5-300, 5-400, 5-500, 5-600, 5-700, 5-800, 5-900, 5-1000, 5-1100, 5-1200, 5-1300, 5-1400, 5-1500, 5-1600, 5-1700, 5-1800, 5-1900, 5-2000, 5-2100, 5-2200, 5-2300, 5-2400, 5-2500, 5-2600, 5-2700, 5-2800, 5-2900, 5-3000, 5-3100 or more bases such that the genome region has sufficient homology to undergo homologous recombination with the corresponding region of homology.
The structural similarity between a given genome region and the corresponding region of homology found on the DNA modification template can be any degree of sequence identity that allows for homologous recombination to occur. For example, the amount of homology or sequence identity shared by the “region of homology” of the DNA modification template and the “genome region” of the organism genome can be at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity, such that the sequences undergo homologous recombination.
The region of homology on the DNA modification template can have homology to any sequence flanking the target sequence. While in some instances the regions of homology share significant sequence homology to the genome sequence immediately flanking the target sequence, it is recognized that the regions of homology can be designed to have sufficient homology to regions that may be further 5' or 3' to the target sequence. The regions of homology can also have homology with a fragment of the target sequence along with downstream genome regions.
In one embodiment, the first region of homology further comprises a first fragment of the target sequence and the second region of homology comprises a second fragment of the target sequence, wherein the first and second fragments are dissimilar.
In one aspect, the DNA modification template sequence comprises an upstream homology arm (HR1) and a downstream homology arm (HR2), wherein each homology arm is greater than 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, 2000, 2100, 2200, 2300, 2400, 2500, 2600, 2700, 2800, 2900, 3000, 3100, 3200, 3300, 3400, 3500, 3600, 3700, 3800, 3900, 4000, 5000 and up to
6000 nucleotides in length and comprises sequence homology to said target sequence on the genome of the microbial cell.
As used herein, “homologous recombination” includes the exchange of DNA fragments between two DNA molecules at the sites of homology. The frequency of homologous recombination is influenced by a number of factors. Different organisms vary with respect to the amount of homologous recombination and the relative proportion of homologous to non-homologous recombination. The length of the homology region (homology arm) needed to observe homologous recombination varies among organisms.
Homologous recombination has also been demonstrated in many organisms. For example, at least 150-200 bp of homology was required for homologous recombination in the parasitic protozoan Leishmania (Papadopoulou and Dumas, (1997) Nucleic Acids Res 25:4278-86) and 150-200 bp of homology is required for efficient recombination in the proteobacterium E. coli (Lovett et al. (2002) Genetics 160:851-859). In Bacillus cells, homology lengths of as little as 70 bp can be involved in homologous recombination but homology arm lengths of 25 bp cannot (Khasanov FK et al., Mol Gen Genetics (1992) 234:494-497).
Homology-directed repair (HDR) is a mechanism in cells to repair double-stranded and single-stranded DNA breaks. Homology-directed repair includes homologous recombination (HR) and single-strand annealing (SSA) (Lieber, 2010, Annu. Rev. Biochem. 79:181-211). The most common form of HDR is called homologous recombination (HR), which has the longest sequence homology requirements between the donor and acceptor DNA. Other forms of HDR include single-strand annealing (SSA) and breakage-induced replication, and these require shorter sequence homology relative to HR. Homology-directed repair at nicks (single-stranded breaks) can occur via a mechanism distinct from HDR at double-strand breaks (Davis and Maizels, PNAS 111(10):E924-E932).
By “homology” is meant DNA sequences that are similar. For example, a “region of homology to a genome region” that is found on the DNA modification template is a region of DNA that has a similar sequence to a given “genome region” in the cell or organism genome. A region of homology can be of any length that is sufficient to promote homologous recombination at the cleaved target sequence. For example, the region of homology can comprise at least 5-10, 5-15, 5-20, 5-25, 5-30, 5-35, 5-40, 5-45, 5-50, 5-55, 5-60, 5-65, 5-70, 5-75, 5-80, 5-85, 5-90, 5-95, 5-100, 5-200, 5-300, 5-400, 5-500, 5-600, 5-700, 5-800, 5-900, 5-1000, 5-1100, 5-1200, 5-1300, 5-1400, 5-1500, 5-1600, 5-1700, 5-1800, 5-1900, 5-2000, 5-2100, 5-2200, 5-2300, 5-2400, 5-2500, 5-2600, 5-2700, 5-2800, 5-2900, 5-3000, 5-3100 or more bases in length such that the region of homology has sufficient homology to undergo homologous recombination with the corresponding genome region. “Sufficient homology” indicates that two polynucleotide sequences have sufficient structural similarity to act as substrates for a homologous recombination reaction. The structural similarity includes overall length of each polynucleotide fragment, as well as the sequence similarity of the polynucleotides. Sequence similarity can be described by the percent sequence identity over the whole length of the sequences, and/or by conserved regions comprising localized similarities such as contiguous nucleotides having 100% sequence identity, and percent sequence identity over a portion of the length of the sequences.
The amount of homology or sequence identity shared by a target and a DNA modification template can vary and includes total lengths and/or regions having unit integral values in the ranges of about 1-20 bp, 20-50 bp, 50-100 bp, 75-150 bp, 100-250 bp, 150-300 bp, 200-400 bp, 250-500 bp, 300-600 bp, 350-750 bp, 400-800 bp, 450-900 bp, 500-1000 bp, 600-1250 bp, 700-1500 bp, 800-1750 bp, 900-2000 bp, 1-2.5 kb, 1.5-3 kb, 2-4 kb, 2.5-5 kb, 3-6 kb, 3.5-7 kb, 4-8 kb, 5-10 kb, or up to and including the total length of the target sequence. These ranges include every integer within the range, for example, the range of 1-20 bp includes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 and 20 bps. The amount of homology can also be described by percent sequence identity over the full aligned length of the two polynucleotides which includes percent sequence identity of about at least 50%, 55%, 60%, 65%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100%. Sufficient homology includes any combination of polynucleotide length, global percent sequence identity, and optionally conserved regions of contiguous nucleotides or local percent sequence identity; for example, sufficient homology can be described as a region of 75-150 bp having at least 80% sequence identity to a region of the target locus. Sufficient homology can also be described by the predicted ability of two polynucleotides to specifically hybridize under high stringency conditions, see, for example, Sambrook et al., (1989) Molecular Cloning: A Laboratory Manual, (Cold Spring Harbor Laboratory Press,
NY); Current Protocols in Molecular Biology, Ausubel et al., Eds (1994) Current Protocols, (Greene Publishing Associates, Inc. and John Wiley & Sons, Inc.); and, Tijssen (1993) Laboratory Techniques in Biochemistry and Molecular Biology-Hybridization with Nucleic Acid Probes, (Elsevier, New York).
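By way of non-limiting illustration, the “75-150 bp having at least 80% sequence identity” example of sufficient homology can be sketched in Python. The function names, the default thresholds and the test sequences are hypothetical, and the sketch assumes the two sequences are already aligned, ungapped and of equal length; it is not a method prescribed by this disclosure.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over the full length of two pre-aligned sequences."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    identical = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * identical / len(a)


def has_sufficient_homology(arm: str, genome_region: str,
                            min_len: int = 75, max_len: int = 150,
                            min_identity: float = 80.0) -> bool:
    """Combine a length window with a global identity threshold (assumed defaults)."""
    if not min_len <= len(arm) <= max_len:
        return False
    return percent_identity(arm, genome_region) >= min_identity


# Hypothetical 100 bp arm and a genome region differing at 10 positions.
arm = "ACGT" * 25
genome_region = "ACGA" * 10 + "ACGT" * 15
print(percent_identity(arm, genome_region))         # 90.0
print(has_sufficient_homology(arm, genome_region))  # True
```

Tightening `min_identity` to 95.0 would reject the same pair, which shows how length and identity criteria can be combined independently.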
The term “increased” as used herein may refer to a quantity or activity that is at least 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 100%, or at least about 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, 400, 410, 420, 430, 440, 450, 460, 470, 480, 490, or 500 fold more than the quantity or activity for which the increased quantity or activity is being compared. The terms “increased”, “greater than”, and “improved” are used interchangeably herein. The term “increased” can be used to characterize the transformation or gene engineering efficiency obtained by a multicomponent method described herein when compared to a control method described herein.
As used herein, the term “integration efficiency” is defined by dividing the number of transformed cells having the desired gene of interest integrated into their genome by the total number of transformed cells. This number can be multiplied by 100 to express it as a percentage.
Integration efficiency (%) = (number of transformed cells having gene of interest integrated in its genome /number of total transformed cells) * 100
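As a minimal sketch of the calculation above, the following Python function computes integration efficiency as a percentage. The function name and the colony counts are hypothetical and serve only to illustrate the arithmetic.

```python
def integration_efficiency(integrated: int, total_transformed: int) -> float:
    """Return integration efficiency as a percentage of transformed cells."""
    if total_transformed <= 0:
        raise ValueError("total_transformed must be positive")
    if not 0 <= integrated <= total_transformed:
        raise ValueError("integrated must lie between 0 and total_transformed")
    return integrated / total_transformed * 100


# Hypothetical screen: 18 of 24 transformants carry the integrated gene of interest.
print(integration_efficiency(18, 24))  # 75.0
```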
The term “conserved domain” or “motif” means a set of amino acids conserved at specific positions along an aligned sequence of evolutionarily related proteins. While amino acids at other positions can vary between homologous proteins, amino acids that are highly conserved at specific positions indicate amino acids that are essential to the structure, the stability, or the activity of a protein. Because they are identified by their high degree of conservation in aligned sequences of a family of protein homologues, they can be used as identifiers, or “signatures”, to determine if a protein with a newly determined sequence belongs to a previously identified protein family.
The terms “knock-in”, “gene knock-in”, “gene integration” and “genetic knock-in” are used interchangeably herein. A knock-in represents the replacement or integration of a DNA sequence at a specific DNA sequence in a cell by targeting with a Cas protein (for example by homologous recombination (HR), wherein a suitable donor DNA polynucleotide is also used). Examples of knock-ins are a specific integration of a heterologous amino acid coding sequence in a coding region of a gene, or a specific integration of a transcriptional regulatory element in a genetic locus.
As used herein, “nucleic acid” means a polynucleotide and includes a single or a double-stranded polymer of deoxyribonucleotide or ribonucleotide bases. Nucleic acids may also include fragments and modified nucleotides. Thus, the terms “polynucleotide”, “nucleic acid sequence”, “nucleotide sequence” and “nucleic acid fragment” are used interchangeably to denote a polymer of RNA and/or DNA and/or RNA-DNA that is single- or double-stranded, optionally containing synthetic, non-natural, or altered nucleotide bases. Nucleotides (usually found in their 5’-monophosphate form) are referred to by their single letter designation as follows: “A” for adenosine or deoxyadenosine (for RNA or DNA, respectively), “C” for cytosine or deoxycytosine, “G” for guanosine or deoxyguanosine, “U” for uridine, “T” for deoxythymidine, “R” for purines (A or G), “Y” for pyrimidines (C or T), “K” for G or T, “H” for A or C or T, “I” for inosine, and “N” for any nucleotide (e.g., N can be A, C, T, or G, if referring to a DNA sequence; N can be A, C, U, or G, if referring to an RNA sequence).
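The single letter designations above can be illustrated with a short Python sketch that tests whether a concrete DNA sequence is consistent with a pattern written using the ambiguity codes. The dictionary covers only the DNA codes listed in this paragraph (inosine is omitted and would raise a `KeyError`); the function name and example sequences are hypothetical.

```python
# DNA designations from the definition above, mapped to the bases they allow.
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",    # purines
    "Y": "CT",    # pyrimidines
    "K": "GT",
    "H": "ACT",
    "N": "ACGT",  # any nucleotide
}


def matches(pattern: str, sequence: str) -> bool:
    """True if each base of `sequence` is allowed by the code at that position."""
    if len(pattern) != len(sequence):
        return False
    return all(base in IUPAC_DNA[code]
               for code, base in zip(pattern.upper(), sequence.upper()))


print(matches("RYKN", "ACGT"))  # True
print(matches("RYKN", "CCGT"))  # False, because C is not a purine
```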
It is understood that the polynucleotides (or nucleic acid molecules) described herein include “genes”, “vectors” and “plasmids”.
The term “gene” refers to a polynucleotide that codes for a functional molecule such as, but not limited to, a particular sequence of amino acids, which comprise all, or part of a protein coding sequence, and may include regulatory (non-transcribed) sequences, such as promoter sequences, which determine for example the conditions under which the gene is expressed. The transcribed region of the gene may include untranslated regions (UTRs), including introns, 5'-untranslated regions (UTRs), and 3'-UTRs, as well as the coding sequence. “Native gene” refers to a gene as found in nature with its own regulatory sequences.
A “codon-modified gene” or “codon-preferred gene” or “codon-optimized gene” is a gene having its frequency of codon usage designed to mimic the frequency of preferred codon usage of the host cell. The nucleic acid changes made to codon-optimize a gene are “synonymous”, meaning that they do not alter the amino acid sequence of the encoded polypeptide of the parent gene. However, both native and variant genes can be codon-optimized for a particular host cell, and as such no limitation in this regard is intended. Methods are available in the art for synthesizing codon-preferred genes. See, for example, U.S. Patent Nos. 5,380,831, and 5,436,391, and Murray et al. (1989) Nucleic Acids Res. 17:477-498, herein incorporated by reference.
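The synonymous nature of codon optimization can be sketched as follows. The codon table is a deliberately partial excerpt of the standard genetic code, and the “preferred” codon chosen for each amino acid is a hypothetical assumption for an imagined host, not data from this disclosure; real methods typically use full host codon-usage tables.

```python
# Partial standard genetic code: only the codons used in this example.
CODON_TO_AA = {
    "ATG": "M",
    "AAA": "K", "AAG": "K",
    "CTG": "L", "CTT": "L", "TTA": "L",
    "TAA": "*", "TGA": "*",
}
# Hypothetical preferred codon per amino acid for an imagined host cell.
PREFERRED = {"M": "ATG", "K": "AAA", "L": "CTG", "*": "TAA"}


def codons(cds: str) -> list:
    """Split a coding sequence into triplets."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def translate(cds: str) -> str:
    return "".join(CODON_TO_AA[c] for c in codons(cds))


def codon_optimize(cds: str) -> str:
    """Replace every codon with the preferred synonymous codon."""
    return "".join(PREFERRED[CODON_TO_AA[c]] for c in codons(cds))


native = "ATGAAGTTACTTTGA"
optimized = codon_optimize(native)
print(optimized)                                   # ATGAAACTGCTGTAA
print(translate(native) == translate(optimized))   # True
```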
Additional sequence modifications are known to enhance gene expression in a host organism. These include, for example, elimination of: one or more sequences encoding spurious polyadenylation signals, one or more exon-intron splice site signals, one or more transposon-like repeats, and other such well-characterized sequences that may be deleterious to gene expression. The G-C content of the sequence may be adjusted to levels average for a given host organism, as calculated by reference to known genes expressed in the host cell. When possible, the sequence is modified to avoid one or more predicted hairpin secondary mRNA structures.
As used herein, the term “coding sequence” refers to a nucleotide sequence, which directly specifies the amino acid sequence of its (encoded) protein product. The boundaries of the coding sequence are generally determined by an open reading frame (hereinafter, “ORF”), which usually begins with an ATG start codon. The coding sequence typically includes DNA, cDNA, and recombinant nucleotide sequences.
As defined herein, the term “open reading frame” (hereinafter, “ORF”) means a nucleic acid or nucleic acid sequence (whether naturally occurring, non-naturally occurring, or synthetic) comprising an uninterrupted reading frame consisting of (i) an initiation codon, (ii) a series of two (2) or more codons representing amino acids, and (iii) a termination codon, the ORF being read (or translated) in the 5' to 3' direction.
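The three-part ORF definition above can be checked mechanically. The following Python sketch assumes DNA input, an ATG initiation codon by default and the three stop codons of the standard genetic code; the function name and example sequences are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def is_orf(seq: str, start_codons=("ATG",)) -> bool:
    """Check (i) initiation codon, (ii) two or more sense codons, (iii) termination codon."""
    seq = seq.upper()
    if len(seq) % 3 != 0:
        return False
    triplets = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    if len(triplets) < 4:  # start + at least two sense codons + stop
        return False
    if triplets[0] not in start_codons or triplets[-1] not in STOP_CODONS:
        return False
    # "Uninterrupted": no stop codon before the final position.
    return not any(t in STOP_CODONS for t in triplets[1:-1])


print(is_orf("ATGGCTAAATAA"))  # True
print(is_orf("ATGTAAGCTTAA"))  # False, internal stop codon
```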
The term “chromosomal integration” as used herein refers to a process where a polynucleotide of interest is integrated into a microbial chromosome. The homology arms of the DNA modification template will align with homologous regions of the microbial chromosome. Subsequently, the sequence between the homology arms is replaced by the polynucleotide of interest in a double crossover (i.e., homologous recombination).
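The outcome of such a double crossover can be modeled on strings. The sketch below is a simplification under stated assumptions: the chromosome is treated as a linear string, the upstream and downstream homology arms must match it exactly (biological recombination tolerates imperfect identity), and the first matching occurrence is used. All names and sequences are hypothetical.

```python
def double_crossover(genome: str, uha: str, insert: str, dha: str) -> str:
    """Replace the genome sequence between two homology arms with `insert`."""
    up = genome.find(uha)
    if up == -1:
        raise ValueError("upstream homology arm not found in genome")
    down = genome.find(dha, up + len(uha))
    if down == -1:
        raise ValueError("downstream homology arm not found after upstream arm")
    # Keep everything through the upstream arm, add the insert,
    # then resume at the start of the downstream arm.
    return genome[:up + len(uha)] + insert + genome[down:]


# Hypothetical locus: the GGGGGG segment is replaced by ACGT.
genome = "TTTT" + "AAAC" + "GGGGGG" + "CCCA" + "TTTT"
edited = double_crossover(genome, uha="AAAC", insert="ACGT", dha="CCCA")
print(edited)  # TTTTAAACACGTCCCATTTT
```

An empty `insert` models a deletion of the intervening sequence in the same way.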
“Regulatory sequences” refer to nucleotide sequences located upstream (5’ non-coding sequences), within, or downstream (3’ non-coding sequences) of a coding sequence, and which influence the transcription, RNA processing or stability, or translation of the associated coding sequence. Regulatory sequences include, but are not limited to, promoters, translation leader sequences, 5’ untranslated sequences, 3’ untranslated sequences, introns, polyadenylation target sequences, RNA processing sites, effector binding sites, and stem-loop structures.
The term “promoter” as used herein refers to a nucleic acid sequence capable of controlling the expression of a coding sequence or functional RNA. In general, a coding sequence is located 3' (downstream) to a promoter sequence. Promoters may be derived in their entirety from a native gene, or be composed of different elements derived from different promoters found in nature, or even comprise synthetic nucleic acid segments. It is understood by those skilled in the art that different promoters may direct the expression of a gene in different cell types, or at different stages of development, or in response to different environmental or physiological conditions. Promoters which cause a gene to be expressed in most cell types at most times are commonly referred to as “constitutive promoters”. It is further recognized that since in most cases the exact boundaries of regulatory sequences have not been completely defined, DNA fragments of different lengths may have identical promoter activity.
"Operably linked" is intended to mean a functional linkage between two or more elements. For example, an operable linkage between a polynucleotide of interest and a regulatory sequence (e.g., a promoter) is a functional link that allows for expression of the polynucleotide of interest (i.e., the polynucleotide of interest is under transcriptional control of the promoter). Operably linked elements may be contiguous or non-contiguous. Coding sequences (e.g., an ORF) can be operably linked to regulatory sequences in sense or antisense orientation. When used to refer to the joining of two protein coding regions, by operably linked is intended that the coding regions are in the same reading frame.
A nucleic acid is “operably linked” when it is placed into a functional relationship with another nucleic acid sequence. For example, DNA encoding a secretory leader (i.e., a signal peptide), is operably linked to DNA for a polypeptide if it is expressed as a pre-protein that participates in the secretion of the polypeptide; a promoter or enhancer is operably linked to a coding sequence if it affects the
transcription of the sequence; or a ribosome binding site is operably linked to a coding sequence if it is positioned so as to facilitate translation. Generally, “operably linked” means that the DNA sequences being linked are contiguous, and, in the case of a secretory leader, contiguous and in reading phase. However, enhancers do not have to be contiguous. Linking is accomplished by ligation at convenient restriction sites. If such sites do not exist, the synthetic oligonucleotide adaptors or linkers are used in accordance with conventional practice.
As used herein, “a functional promoter sequence controlling the expression of a gene of interest (or open reading frame thereof) linked to the gene of interest’s protein coding sequence” refers to a promoter sequence which controls the transcription and translation of the coding sequence in Bacillus. For example, in certain embodiments, the present disclosure is directed to a polynucleotide comprising a 5' promoter (or 5' promoter region, or tandem 5' promoters and the like), wherein the promoter region is operably linked to a nucleic acid sequence encoding a protein of interest. Thus, in certain embodiments, a functional promoter sequence controls the expression of a gene of interest encoding a protein of interest. In other embodiments, a functional promoter sequence controls the expression of a heterologous gene or an endogenous gene encoding a protein of interest in a microbial cell.
The promoter sequence consists of proximal and more distal upstream elements, the latter elements often referred to as enhancers. An “enhancer” is a DNA sequence that can stimulate promoter activity, and may be an innate element of the promoter or a heterologous element inserted to enhance the level or tissue-specificity of a promoter.
The recombinant DNAs (such as, but not limiting to, DNA modification templates) disclosed herein can be introduced into a microbial cell using any method known in the art.
As defined herein, the term “introducing”, as used in phrases such as “introducing into a microbial cell” at least one recombinant DNA, polynucleotide, or a gene thereof, or a vector thereof, includes methods known in the art for introducing polynucleotides into a cell, including, but not limited to, protoplast fusion, natural or artificial transformation (e.g., calcium chloride, electroporation, heat shock), transduction, transfection, conjugation and the like (e.g., see Ferrari et al., 1989).
"Introducing" is intended to mean presenting to the organism, such as a cell or organism, DNAs disclosed herein (such as but not limiting to a DNA modification template, a donor DNA, a recombinant DNA construct/expression construct), in such a manner that the component(s) gains access to the interior of a cell of the organism or to the cell itself. The methods and compositions do not depend on a particular method for introducing a sequence into an organism or cell, only that DNAs disclosed herein gains access to the interior of at least one cell of the organism. Introducing includes reference to the incorporation of a nucleic acid into a microbial cell where the nucleic acid may be incorporated (integrated) into the genome of the cell, and includes reference to the transient (direct) provision of a nucleic acid to the cell.
Methods for introducing polynucleotides, expression cassettes, recombinant DNA into cells or organisms are known in the art including, but not limited to, natural competence (as described in WO2017/075195, WO2002/14490 and WO2008/7989), microinjection (Crossway et al., (1986) Biotechniques 4:320-34 and U.S. Patent No. 6,300,543), meristem transformation (U.S. Patent No. 5,736,369), electroporation (Riggs et al., (1986) Proc. Natl. Acad. Sci. USA 83:5602-6), stable transformation methods, transient transformation methods, ballistic particle acceleration (particle bombardment) (U.S. Patent Nos. 4,945,050; 5,879,918; 5,886,244; 5,932,782), whiskers mediated transformation (Ainley et al. 2013, Plant Biotechnology Journal 11:1126-1134; Shaheen A. and M. Arshad 2011 Properties and Applications of Silicon Carbide (2011), 345-358 Editor(s): Gerhardt, Rosario. Publisher: InTech, Rijeka, Croatia. CODEN: 69PQBP; ISBN: 978-953-307-201-2), Agrobacterium-mediated transformation (U.S. Patent Nos. 5,563,055 and 5,981,840), direct gene transfer (Paszkowski et al., (1984) EMBO J 3:2717-22), viral-mediated introduction (U.S. Patent Nos. 5,889,191, 5,889,190, 5,866,785, 5,589,367 and 5,316,931), transfection, transduction, cell-penetrating peptides, mesoporous silica nanoparticle (MSN)-mediated direct protein delivery, topical applications, sexual crossing, sexual breeding, and any combination thereof. Stable transformation is intended to mean that the nucleotide construct introduced into an organism integrates into a genome of the organism and is capable of being inherited by the progeny thereof. Transient transformation is intended to mean that a polynucleotide is introduced (directly or indirectly) into the organism and does not integrate into a genome of the organism or
a polypeptide is introduced into an organism. Transient transformation indicates that the introduced composition is only temporarily expressed or present in the organism.
By “introduced transiently”, “transiently introduced”, “transient introduction”, “transiently express” and the like is meant that a biomolecule is introduced into a host cell (or a population of host cells) in a non-permanent manner. With respect to double stranded DNA, transient introduction includes situations in which the introduced DNA does not integrate into the chromosome of the host cell and thus is not transmitted to all daughter cells during growth as well as situations in which an introduced DNA molecule that may have integrated into the chromosome is removed at a desired time using any convenient method (e.g., employing a cre-lox system, by removing positive selective pressure for an episomal DNA construct, by promoting looping out of all or part of the integrated polynucleotide from the chromosome using a selection media, etc.). No limitation in this regard is intended.
A variety of methods are available for identifying those cells with integration into the genome at or near to the target sequence. Such methods can be viewed as directly analyzing a target sequence to detect any change in the target sequence, including but not limited to PCR methods, sequencing methods, nuclease digestion, Southern blots, and any combination thereof. See, for example, US Patent Application 12/147,834, herein incorporated by reference to the extent necessary for the methods described herein. The method also comprises recovering an organism from the cell comprising a polynucleotide of interest integrated into its genome.
The term “genome” or a microbial (host) cell “genome” includes not only chromosomal DNA found within the nucleus, but also organelle DNA found within subcellular components of the cell (extrachromosomal DNA).
As used herein, the terms “plasmid”, “vector” and “cassette” refer to extrachromosomal elements, often carrying genes which are typically not part of the central metabolism of the cell, and usually in the form of double-stranded DNA molecules. Such elements may be autonomously replicating sequences, genome integrating sequences, phage or nucleotide sequences, linear or circular, of a single-stranded or double-stranded DNA or RNA, derived from any source, in which a number of nucleotide sequences have been joined or recombined into a unique construction which is capable of introducing a promoter fragment and DNA sequence for a selected gene product along with appropriate 3' untranslated sequence into a cell.
The term “vector” includes any nucleic acid that can be replicated (propagated) in cells and can carry new genes or DNA segments into cells. Vectors include viruses, bacteriophage, pro-viruses, plasmids, phagemids, transposons, and artificial chromosomes such as BACs (bacterial artificial chromosomes), and the like, that are “episomes” (i.e., replicate autonomously or can integrate into a chromosome of a host organism).
The terms “expression cassette” and “expression vector” refer to a nucleic acid construct generated recombinantly or synthetically, with a series of specified nucleic acid elements that permit transcription of a particular nucleic acid in a cell. The recombinant expression cassette can be incorporated into a plasmid, chromosome, mitochondrial DNA, plastid DNA, virus, or nucleic acid fragment. Typically, the recombinant expression cassette portion of an expression vector includes, among other sequences, a nucleic acid sequence to be transcribed and a promoter. In some embodiments, DNA constructs also include a series of specified nucleic acid elements that permit transcription of a particular nucleic acid in a target cell. In certain embodiments, a DNA construct of the disclosure comprises a selective marker and an inactivating chromosomal or gene or DNA segment as defined herein. Many prokaryotic expression vectors are commercially available and known to one skilled in the art. Selection of appropriate expression vectors is within the knowledge of one skilled in the art.
As used herein, a “targeting vector” is a vector that includes polynucleotide sequences that are homologous to a region in the chromosome of a host cell into which the targeting vector is transformed and that can drive homologous recombination at that region. For example, targeting vectors find use in introducing mutations into the chromosome of a host cell through homologous recombination. In some embodiments, the targeting vector comprises other non-homologous sequences, e.g., added to the ends (i.e., stuffer sequences or flanking sequences). The ends can be closed such that the targeting vector forms a closed circle, such as, for example, integration into a vector. Selection and/or construction of appropriate vectors is well within the knowledge of those having skill in the art.
As used herein, the term “plasmid” refers to a circular double-stranded (ds) DNA construct used as a cloning vector, and which forms an extrachromosomal self-replicating genetic element in many bacteria and some eukaryotes. In some embodiments, plasmids become incorporated into the genome of the host cell.
Polynucleotides of interest are further described herein and include polynucleotides reflective of the commercial markets and interests of those involved in the production of enzymes (such as, but not limited to, through fermentation of bacteria, thereby producing the enzymes).
A polynucleotide of interest can code for one or more proteins of interest. It can have other biological functions. The polynucleotide of interest may or may not already be present in the genome of the host cell to be transformed, i.e., either a homologous or heterologous sequence.
Nucleotides of interest may comprise antisense sequences complementary to at least a portion of the messenger RNA (mRNA) for a targeted gene sequence of interest. Antisense nucleotides are constructed to hybridize with the corresponding mRNA. Modifications of the antisense sequences may be made as long as the sequences hybridize to and interfere with expression of the corresponding mRNA. In this manner, antisense constructions having 70%, 80%, or 85% sequence identity to the corresponding antisense sequences may be used. Furthermore, portions of the antisense nucleotides may be used to disrupt the expression of the target gene. Generally, sequences of at least 50 nucleotides, 100 nucleotides, 200 nucleotides, or greater may be used.
In addition, the polynucleotide of interest may also be used in the sense orientation to suppress the expression of endogenous genes in organisms. Methods for suppressing gene expression in organisms using polynucleotides in the sense orientation are known in the art. The methods generally involve transforming an organism with a DNA construct comprising a promoter that drives expression in an organism operably linked to at least a portion of a nucleotide sequence that corresponds to the transcript of the endogenous gene. Typically, such a nucleotide sequence has substantial sequence identity to the sequence of the transcript of the endogenous gene, generally greater than about 65% sequence identity, about 85% sequence identity, or greater than about 95% sequence identity. See, U.S. Patent Nos. 5,283,184 and 5,034,323; herein incorporated by reference.
A phenotypic marker is a screenable or a selection marker that includes visual markers and selection markers whether it is a positive or negative selection marker. Any phenotypic marker can be used. Specifically, a selection or screenable marker comprises a DNA segment that allows one to identify, or select for or against a molecule or a cell that contains it, often under particular conditions. These markers can encode an activity, such as, but not limited to, production of RNA, peptide, or protein, or can provide a binding site for RNA, peptides, proteins, inorganic and organic compounds or compositions and the like.
The terms “selection marker”, “selectable marker” and “selection marker-encoding nucleotide sequence” refer to a nucleotide sequence which is capable of expression in (host) cells and where expression of the selection marker confers to cells containing the expressed gene the ability to grow in the presence of a corresponding selective agent or lack of an essential nutrient. In one aspect the selective marker refers to a nucleic acid (e.g., a gene) capable of expression in a host cell which allows for ease of selection of those hosts containing the vector.
Examples of such selection markers include, but are not limited to, pyr4 (Smith et al., Curr Genet 1991, 19(1):27-33), pyr2 (Jorgensen et al., 2014, Microbial Cell Factories, 13(1):33), hph (Mach et al., Curr. Genet., 1994, 25(6):567-570), amdS (Penttila et al., Gene, 1987, (2):155-164), alS (WO2008039370A1; Ouedraogo et al., Appl. Microbiol. Biotechnol., 2015, 99(23):10083-95).
The term “selection marker” includes genes that provide an indication that a host cell has taken up an incoming DNA of interest or some other reaction has occurred. Typically, selection markers are genes that confer antimicrobial resistance or a metabolic advantage on the host cell to allow cells containing the exogenous DNA to be distinguished from cells that have not received any exogenous sequence during the transformation.
A “residing selection marker” is one that is located on the chromosome of the microorganism to be transformed. A residing selection marker encodes a gene that is different from the selection marker on the transforming DNA construct. Selective markers are well known to those of skill in the art. As indicated above, the marker can be an antimicrobial resistance marker (e.g., ampR, phleoR, specR, kanR, eryR, tetR, cmpR and neoR (see e.g., Guerot-Fleury, 1995; Palmeros et al., 2000; and Trieu-Cuot et al., 1983)). In some embodiments, the present invention provides a chloramphenicol resistance gene (e.g., the gene present on pC194, as well as the resistance gene present in the Bacillus licheniformis genome). This resistance gene is particularly useful in the present invention, as well as in embodiments involving chromosomal amplification of chromosomally integrated cassettes and integrative plasmids (See e.g., Albertini and Galizzi, 1985; Stahl and Ferrari, 1984). Other markers useful in accordance with the invention include, but are not limited to, auxotrophic markers, such as serine, lysine, tryptophan; and detection markers, such as β-galactosidase.
Polynucleotides of interest include genes that can be stacked or used in combination with other traits.
As used herein, the terms “polypeptide” and “protein” are used interchangeably, and refer to polymers of any length comprising amino acid residues linked by peptide bonds. The conventional one (1) letter or three (3) letter codes for amino acid residues are used herein. The polypeptide may be linear or branched, it may comprise modified amino acids, and it may be interrupted by non-amino acids. The term polypeptide also encompasses an amino acid polymer that has been modified naturally or by intervention; for example, disulfide bond formation, glycosylation, lipidation, acetylation, phosphorylation, or any other manipulation or modification, such as conjugation with a labeling component. Also included within the definition are, for example, polypeptides containing one or more analogs of an amino acid (including, for example, unnatural amino acids, etc.), as well as other modifications known in the art.
The term “protein of interest” or “POI” refers to a polypeptide of interest that is desired to be expressed in a modified Bacillus (daughter) cell. Thus, as used herein, a POI may be an enzyme, a substrate-binding protein, a surface-active protein, a structural protein, a receptor protein, an antibody and the like.
As used herein, a “gene of interest” or “GOI” refers to a nucleic acid sequence (e.g., a polynucleotide, a gene or an ORF) which encodes a POI. A “gene of interest” encoding a “protein of interest” may be a naturally occurring gene, a mutated gene or a synthetic gene.
In certain embodiments, a gene of interest of the instant disclosure encodes a commercially relevant industrial protein of interest, such as an enzyme (e.g., acetyl esterases, aminopeptidases, amylases, arabinases, arabinofuranosidases, carbonic anhydrases, carboxypeptidases, catalases, cellulases, chitinases, chymosins, cutinases, deoxyribonucleases, epimerases, esterases, α-galactosidases, β-galactosidases, α-glucanases, glucan lysases, endo-β-glucanases, glucoamylases, glucose oxidases, α-glucosidases, β-glucosidases, glucuronidases, glycosyl hydrolases, hemicellulases, hexose oxidases, hydrolases, invertases, isomerases, laccases, lipases, lyases, mannosidases, oxidases, oxidoreductases, pectate lyases, pectin acetyl esterases, pectin depolymerases, pectin methyl esterases, pectinolytic enzymes, perhydrolases, polyol oxidases, peroxidases, phenoloxidases, phytases, polygalacturonases, proteases, peptidases, rhamno-galacturonases, ribonucleases, transferases, transport proteins, transglutaminases, xylanases, hexose oxidases, and combinations thereof).
A “mutation” refers to any change or alteration in a nucleic acid sequence. Several types of mutations exist, including point mutations, deletion mutations, silent mutations, frame shift mutations, splicing mutations and the like. Mutations may be performed specifically (e.g., via site directed mutagenesis) or randomly (e.g., via chemical agents, passage through repair minus bacterial strains).
A “mutated gene” is a gene that has been altered through human intervention. Such a “mutated gene” has a sequence that differs from the sequence of the corresponding non-mutated gene by at least one nucleotide addition, deletion, or substitution. In certain embodiments of the disclosure, the mutated gene comprises an alteration that results from a guide polynucleotide/Cas protein system as disclosed herein. A mutated cell or organism is a cell or organism comprising a mutated gene.
As used herein, a “targeted mutation” is a mutation in a gene (referred to as the target gene), including a native gene, that was made by altering a target sequence within the target gene using any method known to one skilled in the art, including a method involving a guided Cas protein system. Where the Cas protein is a Cas endonuclease, a guide polynucleotide/Cas endonuclease induced targeted mutation can occur in a nucleotide sequence that is located within or outside a genome target sequence that is recognized and cleaved by the Cas endonuclease.
As used herein, in the context of a polypeptide or a sequence thereof, the term “substitution” means the replacement (i.e., substitution) of one amino acid with another amino acid.
As defined herein, an “endogenous gene” refers to a gene in its natural location in the genome of an organism.
As used herein, "heterologous" in reference to a polynucleotide or polypeptide sequence is a sequence that originates from a foreign species, or, if from the same species, is substantially modified from its native form in composition and/or genome locus by deliberate human intervention. For example, a promoter operably linked to a heterologous polynucleotide is from a species different from the species from which the polynucleotide was derived, or, if from the same/analogous species, one or both are substantially modified from their original form and/or genome locus, or the promoter is not the native promoter for the operably linked polynucleotide. As used herein, unless otherwise specified, a chimeric polynucleotide comprises a coding sequence operably linked to a transcription initiation region that is heterologous to the coding sequence.
As defined herein, a “heterologous” gene, a “non-endogenous” gene, or a “foreign” gene refer to a gene (or ORF) not normally found in the host organism, but that is introduced into the host organism by gene transfer. As used herein, the term “foreign” gene(s) comprise native genes (or ORFs) inserted into a non-native organism and/or chimeric genes inserted into a native or non-native organism.
As defined herein, a “heterologous” nucleic acid construct or a “heterologous” nucleic acid sequence has a portion of the sequence which is not native to the cell in which it is expressed.
As defined herein, a “heterologous control sequence”, refers to a gene expression control sequence (e.g., a promoter or enhancer) which does not function in nature to regulate (control) the expression of the gene of interest. Generally, heterologous nucleic acid sequences are not endogenous (native) to the cell, or a part of the genome in which they are present, and have been added to the cell, by infection, transfection, transformation, microinjection, electroporation, and the like. A “heterologous” nucleic acid construct may contain a control sequence/DNA coding (ORF) sequence combination that is the same as, or different, from a control sequence/DNA coding sequence combination found in the native host cell.
As used herein, the terms “signal sequence” and “signal peptide” refer to a sequence of amino acid residues that may participate in the secretion or direct transport of a mature protein or precursor form of a protein. The signal sequence is typically located N-terminal to the precursor or mature protein sequence. The signal sequence may be endogenous or exogenous. A signal sequence is normally absent from the mature protein. A signal sequence is typically cleaved from the protein by a signal peptidase after the protein is transported.
The term “derived” encompasses the terms “originated,” “obtained,” “obtainable,” and “created,” and generally indicates that one specified material or composition finds its origin in another specified material or composition, or has features that can be described with reference to the other specified material or composition.
As used herein, a “flanking sequence” refers to any sequence that is either upstream or downstream of the sequence being discussed (e.g., for genes A-B-C, gene B is flanked by the A and C gene sequences). In certain embodiments, the incoming sequence is flanked by a homology arm on each side. In some embodiments, a flanking sequence is present on only a single side (either 3' or 5'), while in other embodiments, it is on each side of the sequence being flanked. The sequence of each homology arm is homologous to a sequence in the host cell genome (such as the microbial chromosome).
As used herein, the term “stuffer sequence” refers to any extra DNA that flanks homology arms (typically vector sequences). However, the term encompasses any non-homologous DNA sequence. Not to be limited by any theory, a stuffer sequence provides a non-critical target for a cell to initiate DNA uptake.
“Sequence identity” or “identity” in the context of nucleic acid or polypeptide sequences refers to the nucleic acid bases or amino acid residues in two sequences that are the same when aligned for maximum correspondence over a specified comparison window.
The term “percentage of sequence identity” refers to the value determined by comparing two optimally aligned sequences over a comparison window, wherein the portion of the polynucleotide or polypeptide sequence in the comparison window may comprise additions or deletions (i.e., gaps) as compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two sequences. The percentage is calculated by determining the number of positions at which the identical nucleic acid base or amino acid residue occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the window of comparison and multiplying the results by 100 to yield the percentage of sequence identity. Useful examples of percent sequence identities include, but are not limited to, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90% or 95%, or any integer percentage from 50% to 100%. These identities can be determined using any of the programs described herein.
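By way of illustration only, the calculation described above can be sketched in a few lines of Python. The sketch assumes that the two sequences have already been optimally aligned by some other program, that gaps are written as "-", and that the comparison window is the full length of the alignment; the function name `percent_identity` and the example sequences are hypothetical and are not part of the disclosure.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences over the whole alignment.

    Assumption: the comparison window is the full alignment, so gap positions
    count toward the total number of positions but never count as matches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("comparison window is empty")

    # Number of positions at which the identical base or residue occurs in both.
    matched = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    # Divide by the total number of positions in the window, multiply by 100.
    return 100.0 * matched / len(aligned_a)


# Hypothetical 10-position alignment with one gap and one mismatch.
print(percent_identity("ACGT-ACGTA", "ACGTTACCTA"))  # 80.0
```

In this example 8 of the 10 positions in the window match, giving 80% sequence identity. Programs that exclude gap positions from the denominator would report a different value for the same alignment.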
Sequence alignments and percent identity or similarity calculations may be determined using a variety of comparison methods designed to detect homologous sequences including, but not limited to, the MegAlign™ program of the LASERGENE bioinformatics computing suite (DNASTAR Inc., Madison, WI). Within the context of this application it will be understood that where sequence analysis software is used for analysis, the results of the analysis will be based on the “default values” of the program referenced, unless otherwise specified. As used herein “default values” will mean any set of values or parameters that originally load with the software when first initialized.
The “Clustal V method of alignment” corresponds to the alignment method labeled Clustal V (described by Higgins and Sharp, (1989) CABIOS 5:151-153; Higgins et al., (1992) Comput Appl Biosci 8:189-191) and found in the MegAlign™ program of the LASERGENE bioinformatics computing suite (DNASTAR Inc., Madison, WI). For multiple alignments, the default values correspond to GAP PENALTY=10 and GAP LENGTH PENALTY=10. Default parameters for pairwise alignments and calculation of percent identity of protein sequences using the Clustal method are KTUPLE=1, GAP PENALTY=3, WINDOW=5 and DIAGONALS SAVED=5. For nucleic acids these parameters are KTUPLE=2, GAP PENALTY=5, WINDOW=4 and DIAGONALS SAVED=4. After alignment of the sequences using the Clustal V program, it is possible to obtain a “percent identity” by viewing the “sequence distances” table in the same program.
The “Clustal W method of alignment” corresponds to the alignment method labeled Clustal W (described by Higgins and Sharp, (1989) CABIOS 5:151-153; Higgins et al., (1992) Comput Appl Biosci 8:189-191) and found in the MegAlign™ v6.1 program of the LASERGENE bioinformatics computing suite (DNASTAR Inc., Madison, WI). Default parameters for multiple alignment are GAP PENALTY=10, GAP LENGTH PENALTY=0.2, Delay Divergent Seqs (%)=30, DNA Transition Weight=0.5, Protein Weight Matrix=Gonnet Series, and DNA Weight Matrix=IUB. After alignment of the sequences using the Clustal W program, it is possible to obtain a “percent identity” by viewing the “sequence distances” table in the same program.
Unless otherwise stated, sequence identity/similarity values provided herein refer to the value obtained using GAP Version 10 (GCG, Accelrys, San Diego, CA) using the following parameters: % identity and % similarity for a nucleotide sequence using a gap creation penalty weight of 50 and a gap length extension penalty weight of 3, and the nwsgapdna.cmp scoring matrix; % identity and % similarity for an amino acid sequence using a GAP creation penalty weight of 8 and a gap length extension penalty of 2, and the BLOSUM62 scoring matrix (Henikoff and Henikoff, (1989) Proc. Natl. Acad. Sci. USA 89:10915). GAP uses the algorithm of Needleman and Wunsch, (1970) J Mol Biol 48:443-53, to find an alignment of two complete sequences that maximizes the number of matches and minimizes the number of gaps. GAP considers all possible alignments and gap positions and creates the alignment with the largest number of matched bases and the fewest gaps, using a gap creation penalty and a gap extension penalty in units of matched bases.
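For orientation, a minimal Python sketch of Needleman-Wunsch global alignment follows. It is not the GAP program: it assumes a single linear gap penalty and simple match/mismatch scores rather than GAP's separate gap creation and gap extension penalties and scoring matrices, and the function name `needleman_wunsch` and its default scores are hypothetical choices made only for illustration.

```python
def needleman_wunsch(a: str, b: str, match: int = 1,
                     mismatch: int = -1, gap: int = -2):
    """Globally align two complete sequences; return (aligned_a, aligned_b, score).

    Simplified sketch: a linear gap penalty is assumed, not affine penalties.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i: int, j: int) -> int:
        """Substitution score for pairing a[i-1] with b[j-1]."""
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # pair the two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


aligned_a, aligned_b, total = needleman_wunsch("ACGT", "AGT")
print(aligned_a, aligned_b, total)
```

Because every cell of the scoring table is filled in, the procedure implicitly considers all possible alignments and gap positions of the two complete sequences, which is the property attributed to GAP above.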
“BLAST” is a searching algorithm provided by the National Center for Biotechnology Information (NCBI) used to find regions of similarity between biological sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches to identify sequences having sufficient similarity to a query sequence such that the similarity would not be predicted to have occurred randomly. BLAST reports the identified sequences and their local alignment to the query sequence.
It is well understood by one skilled in the art that many levels of sequence identity are useful in identifying polypeptides from other species or modified naturally or synthetically wherein such polypeptides have the same or similar function or activity. Useful examples of percent identities include, but are not limited to, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90% or 95%, or any integer percentage from 50% to 100%. Indeed, any integer amino acid identity from 50% to 100% may be useful in describing the present disclosure, such as 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99%.
“Translation leader sequence” refers to a polynucleotide sequence located between the promoter sequence of a gene and the coding sequence. The translation leader sequence is present in the mRNA upstream of the translation start sequence. The translation leader sequence may affect processing of the primary transcript to mRNA, mRNA stability or translation efficiency. Examples of translation leader sequences have been described (e.g., Turner and Foster, (1995) Mol Biotechnol 3:225-236).
“3’ non-coding sequences”, “transcription terminator” or “termination sequences” refer to DNA sequences located downstream of a coding sequence and include polyadenylation recognition sequences and other sequences encoding regulatory signals capable of affecting mRNA processing or gene expression. The polyadenylation signal is usually characterized by affecting the addition of polyadenylic acid tracts to the 3’ end of the mRNA precursor. The use of different 3’ non-coding sequences is exemplified by Ingelbrecht et al., (1989) Plant Cell 1:671-680.
As used herein, “RNA transcript” refers to the product resulting from RNA polymerase-catalyzed transcription of a DNA sequence. When the RNA transcript is a perfect complementary copy of the DNA sequence, it is referred to as the primary transcript or pre-mRNA. An RNA transcript is referred to as the mature RNA or mRNA when it is an RNA sequence derived from post-transcriptional processing of the primary transcript pre-mRNA. “Messenger RNA” or “mRNA” refers to the RNA that is without introns and that can be translated into protein by the cell. “cDNA” refers to a DNA that is complementary to, and synthesized from, an mRNA template using the enzyme reverse transcriptase. The cDNA can be single-stranded or converted into double-stranded form using the Klenow fragment of DNA polymerase I. “Sense” RNA refers to RNA transcript that includes the mRNA and can be translated into protein within a cell or in vitro. “Antisense RNA” refers to an RNA transcript that is complementary to all or part of a target primary transcript or mRNA, and that blocks the expression of a target gene (see, e.g., U.S. Patent No. 5,107,065). The complementarity of an antisense RNA may be with any part of the specific gene transcript, i.e., at the 5’ non-coding sequence, 3’ non-coding sequence, introns, or the coding sequence. “Functional RNA” refers to antisense RNA, ribozyme RNA, or other RNA that may not be translated but yet has an effect on cellular processes. The terms “complement” and “reverse complement” are used interchangeably herein with respect to mRNA transcripts, and are meant to define the antisense RNA of the message.
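As a simple illustration of the relationship between a message and its antisense RNA, the reverse complement of an RNA sequence can be computed as sketched below. The sketch assumes a plain string of the four unmodified ribonucleotides A, C, G and U; the function name `antisense_rna` and the example sequence are hypothetical.

```python
# Watson-Crick pairing for RNA: A pairs with U, C pairs with G.
_RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def antisense_rna(mrna: str) -> str:
    """Return the reverse complement of an mRNA sequence, written 5' to 3'."""
    mrna = mrna.upper()
    unexpected = set(mrna) - set("ACGU")
    if unexpected:
        raise ValueError(f"unexpected characters in RNA: {sorted(unexpected)}")
    # Complement each base, then reverse so the result also reads 5' to 3'.
    return mrna.translate(_RNA_COMPLEMENT)[::-1]


print(antisense_rna("AUGGCC"))  # GGCCAU
```

An antisense RNA complementary to only part of a transcript corresponds to applying the same operation to the relevant portion of the sequence.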
“Mature” protein refers to a post-translationally processed polypeptide (i.e., one from which any pre- or propeptides present in the primary translation product have been removed). “Precursor” protein refers to the primary product of translation of mRNA (i.e., with pre- and propeptides still present). Pre- and propeptides may be but are not limited to intracellular localization signals.
Proteins may be altered in various ways including amino acid substitutions, deletions, truncations, and integrations. Methods for such modifications are generally known. For example, amino acid sequence variants of the protein(s) can be prepared by mutations in the DNA. Methods for mutagenesis and nucleotide sequence alterations include, for example, Kunkel, (1985) Proc. Natl. Acad. Sci. USA 82:488-92; Kunkel et al., (1987) Meth Enzymol 154:367-82; U.S. Patent No. 4,873,192; Walker and Gaastra, eds. (1983) Techniques in Molecular Biology (MacMillan Publishing Company, New York) and the references cited therein. Guidance regarding amino acid substitutions not likely to affect biological activity of the protein is found, for example, in the model of Dayhoff et al., (1978) Atlas of Protein Sequence and Structure (Natl Biomed Res Found, Washington, D.C.). Conservative substitutions, such as exchanging one amino acid with another having similar properties, may be preferable. Conservative deletions, integrations, and amino acid substitutions are not expected to produce radical changes in the characteristics of the protein, and the effect of any substitution, deletion, integration, or combination thereof can be evaluated by routine screening assays. Assays for double-strand-break-inducing activity are known and generally measure the overall activity and specificity of the agent on DNA substrates containing target sequences.
Standard DNA isolation, purification, molecular cloning, vector construction, and verification/characterization methods are well established, see, for example Sambrook et al., (1989) Molecular Cloning: A Laboratory Manual, (Cold Spring Harbor Laboratory Press, NY). Vectors and constructs include circular plasmids, and linear polynucleotides, comprising a polynucleotide of interest and optionally other components including linkers, adapters, regulatory or analysis. In some examples a recognition site and/or target sequence can be contained within an intron, coding sequence, 5' UTRs, 3' UTRs, and/or regulatory regions.
The meaning of abbreviations is as follows: “sec” means second(s), “min” means minute(s), “h” means hour(s), “d” means day(s), “µL” means microliter(s), “mL” means milliliter(s), “L” means liter(s), “µM” means micromolar, “mM” means millimolar, “M” means molar, “mmol” means millimole(s), “µmole” means micromole(s), “g” means gram(s), “µg” means microgram(s), “ng” means nanogram(s), “U” means unit(s), “bp” means base pair(s) and “kb” means kilobase(s).
Non-limiting examples of compositions and methods disclosed herein are as follows:
1. A method for replacing a first selection marker construct integrated at a predetermined target sequence of a microbial cell with a second selection marker construct, the method comprising: a) providing one or more microbial cells having a first selection marker construct ([B]-[Marker-1]-[B]) integrated at a predetermined target sequence ([A]), wherein said first selection marker construct comprises a first selection marker ([Marker-1]) flanked by a first unique target sequence ([B]); b) introducing into the microbial cells of (a) a first RGEN (RGEN-B) and a first DNA modification template, wherein said first DNA modification template comprises a second selection marker construct ([C]-[Marker-2]-[C]) comprising a second selection marker ([Marker-2]) flanked by a second unique target sequence ([C]), wherein said first RGEN in combination with said first DNA modification template enables the replacement of said first selection marker construct with said second selection marker construct via homologous recombination; and, c) identifying one or more microbial cells from (b) that has said second selection marker construct integrated at said predetermined target sequence.
2. A method for replacing a first selection marker construct integrated at a predetermined target sequence of a microbial cell with a second selection marker construct while simultaneously modifying at least one additional target sequence, the method comprising: a) providing one or more microbial cells having a first selection marker construct ([B]-[Marker-1]-[B]) integrated at a predetermined target sequence ([A]), wherein said first selection marker construct comprises a first selection marker ([Marker-1]) flanked by a first unique target sequence ([B]), wherein said cells have at least one additional target sequence ([M]); b) introducing into the microbial cells of (a) a first RGEN (RGEN-B) and a first DNA modification template, wherein said first DNA modification template comprises a second selection marker construct ([C]-[Marker-2]-[C]) comprising a second selection marker ([Marker-2]) flanked by a second unique target sequence ([C]), wherein said first RGEN in combination with said first DNA modification template enables the replacement of said first selection marker construct with said second selection marker construct via homologous recombination; c) simultaneously with step (b), introducing into the microbial cells of (a) a modification at said at least one additional target sequence; and, d) identifying one or more microbial cells from (c) that has said second selection marker construct replacing said first marker construct, and that has said modification at said at least one additional target sequence.
3. A method for reestablishing a first selection marker construct integrated at a predetermined target sequence of a microbial cell, the method comprising: a) providing one or more microbial cells having a second selection marker construct ([C]-[Marker-2]-[C]) integrated at a predetermined target sequence ([A]), wherein said second selection marker construct comprises a second selection marker ([Marker-2]) flanked by a first unique target sequence ([C]); b) introducing into the microbial cells of (a) a first RGEN (RGEN-C) and a first DNA modification template, wherein said first DNA modification template comprises a first selection marker construct ([B]-[Marker-1]-[B]) comprising a first selection marker ([Marker-1]) flanked by a second unique target sequence ([B]), wherein said first RGEN in combination with said DNA modification template enables the replacement of said second selection marker construct with said first selection marker construct via homologous recombination, thereby reestablishing said first selection marker construct at said predetermined target sequence; and, c) identifying one or more microbial cells from (b) that has said first selection marker construct reestablished at said predetermined target sequence.
4. A method for reestablishing a first selection marker construct integrated at a predetermined target sequence of a microbial cell, the method comprising: a) providing one or more microbial cells having a second selection marker construct (referred to as [C]-[Marker-2]-[C]) integrated at a predetermined target sequence ([A]), wherein said second selection marker construct comprises a second selection marker ([Marker-2]) flanked by a first unique target sequence ([C]), wherein said cells have at least one additional target sequence ([M]); b) introducing into the microbial cells of (a) a first RGEN (RGEN-C) and a first DNA modification template, wherein said first DNA modification template comprises a first selection marker construct ([B]-[Marker-1]-[B]) comprising a first selection marker ([Marker-1]) flanked by a second unique target sequence ([B]), wherein said first RGEN in combination with said DNA modification template enables the replacement of said second selection marker construct with said first selection marker construct via homologous recombination, thereby reestablishing said first selection marker construct at said predetermined target sequence; c) simultaneously with step (b), introducing into the microbial cells of (a) a modification at said at least one additional target sequence; and, d) identifying one or more microbial cells from (c) that has said first selection marker construct reestablished at said predetermined target sequence and that has said modification at said at least one additional target sequence.
5. The method of embodiment 2 or embodiment 4, wherein said modification at said at least one additional target sequence is selected from the group consisting of an insertion of a polynucleotide of interest, a deletion of a polynucleotide of interest, a replacement of a polynucleotide of interest, and any one combination thereof.
6. The method of embodiment 2 or embodiment 4, wherein said simultaneously introducing a modification comprises introducing at least a second RGEN (RGEN-M) and at least a second DNA modification template ([UHA-M]-[Insert]-[DHA-M]) comprising a polynucleotide of interest ([Insert]), wherein said second RGEN in combination with said second DNA modification template enables the integration of said polynucleotide of interest at said at least one additional target sequence ([M]).
7. The method of embodiment 2 or embodiment 4, wherein the microbial cells of (a) have at least a first additional target sequence ([Mα]) and a second additional target sequence ([Mβ]) flanking a polynucleotide of interest to be deleted, and wherein said simultaneously introducing a modification comprises introducing at least a second RGEN (RGEN-Mα), a third RGEN (RGEN-Mβ) and at least a third DNA modification template ([UHA-D]-[DHA-D]) comprising an Upstream Homology Arm ([UHA-D]) directly linked to a Downstream Homology Arm ([DHA-D]), wherein said UHA-D and DHA-D are homologous to a genomic region of said microbial cell flanking said polynucleotide sequence of interest to be deleted, wherein said third RGEN-Mα and fourth RGEN-Mβ in combination with said third DNA modification template enables the deletion of said polynucleotide of interest.
8. The method of embodiment 2 or embodiment 4, wherein the microbial cells of (a) have at least a first additional target sequence (Mα) and a second additional target sequence (Mβ) flanking a first polynucleotide of interest to be replaced, and wherein said simultaneously introducing a modification comprises introducing at least a third RNA-guided endonuclease (RGEN-Mα), a fourth RNA-guided endonuclease (RGEN-Mβ) and at least a third DNA modification template ([UHA-M]-[Insert]-[DHA-M]) comprising a second polynucleotide of interest, wherein said RGEN-Mα and RGEN-Mβ in combination with said third DNA modification template enables the replacement of said first polynucleotide sequence of interest with said second polynucleotide of interest.
EXAMPLES
The present disclosure is further defined in the following Examples. It should be understood that these Examples, while indicating certain preferred aspects of the disclosure, are given by way of illustration only. From the above discussion and these Examples, one skilled in the art can ascertain the essential characteristics of this disclosure, and without departing from the spirit and scope thereof, can make various changes and modifications of the disclosure to adapt it to various uses and conditions.
EXAMPLE 1
Inserting a stop codon into Trichoderma reesei ade2 while replacing a removable pyr2 selection marker construct with a removable hph selection marker construct at a predetermined genome sequence
This example discloses the use of two selection markers in a marker swapping system while simultaneously editing a gene of interest (GOI). In a first genome modification assay (Figure 1, 1st step), the first marker is integrated at a predetermined genome sequence. In a subsequent genome modification assay (Figure 1, 2nd step), the previously integrated marker is excised and replaced by a second marker while a polynucleotide sequence is simultaneously inserted into a GOI at another genome sequence (multiplex genome engineering). The method described herein allows for iterative rounds of multiplex genome engineering using the same marker swapping system.
RGENs and DNA modification templates were designed to replace a previously integrated removable construct comprising the marker pyr2 (Jorgensen et al., Microbial Cell Factories 2014) with a removable construct comprising the marker hph (Mach et al., Curr Genet 1994) at a predetermined target sequence within the genome of Trichoderma reesei QM6a. In parallel, ade2 is edited; this gene codes for a phosphoribosyl aminoimidazole carboxylase necessary for the synthesis of purines (Jorgensen et al., Microbial Cell Factories 2014). The modification of ade2 was designed to insert a stop codon, giving rise to a Δade2 phenotype showing red-colored colonies when supplemented with adenine (due to the accumulation and polymerization of the purine precursor 5-aminoimidazole ribonucleotide).
Figure 1 illustrates RGENs, DNA modification templates and the genome composition before and after editing, in this case RGEN-A targeting a predetermined target sequence [A], allowing for homologous recombination with the DNA modification template [UHA-A]-[B]-[Marker-1]-[B]-[DHA-A] (SEQ ID NO: 1), harboring the pyr2 cassette [Marker-1] flanked by the unique target sequence [B] and framed by the upstream and downstream homology arms [UHA-A] and [DHA-A] (1st step).
In a subsequent genome modification assay (2nd step), the pyr2 cassette [Marker-1] is excised again by targeting the two previously introduced flanking target sequences [B] with RGEN-B, allowing for homologous recombination with the DNA modification template [UHA-A]-[C]-[Marker-2]-[C]-[DHA-A] (SEQ ID NO: 2), harboring the hph cassette [Marker-2] flanked by the unique target sequences [C] and framed by the upstream and downstream homology arms [UHA-A] and [DHA-A]; simultaneously, the additional target sequence [M] within the coding sequence of ade2 (SEQ ID NO: 3) is targeted by RGEN-M, resulting in the in-frame insertion of a stop codon via homologous recombination with the DNA modification template [UHA-M]-[Insert]-[DHA-M] (SEQ ID NO: 4), harboring the polynucleotide sequence of interest to be inserted into the genome [Insert] framed by the upstream and downstream homology arms [UHA-M] and [DHA-M].
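The alternation between the two removable constructs can be pictured as a small state model. The following is only an illustrative sketch in Python, not part of the disclosed method: the `Locus` class and the `swap` helper are hypothetical names, and the only rule modeled is the one stated above, namely that the RGEN used in a round must target the sequence currently exposed at the locus ([A] for the unmodified site, [B] for the pyr2 construct, [C] for the hph construct).

```python
from dataclasses import dataclass
from typing import Optional

# Flanking target sequence and marker gene of each removable construct,
# following the notation of Figure 1.
CONSTRUCTS = {
    "Marker-1": {"flank": "B", "gene": "pyr2"},
    "Marker-2": {"flank": "C", "gene": "hph"},
}


@dataclass(frozen=True)
class Locus:
    """Occupancy of the predetermined target sequence [A] (hypothetical model)."""
    marker: Optional[str] = None  # None: site [A] is still unmodified

    def cut_site(self) -> str:
        """Target sequence an RGEN must recognise to open this locus."""
        return "A" if self.marker is None else CONSTRUCTS[self.marker]["flank"]


def swap(locus: Locus, rgen: str, incoming: str) -> Locus:
    """Hypothetical helper: model one round of marker integration or swapping."""
    if incoming not in CONSTRUCTS:
        raise ValueError(f"unknown construct {incoming}")
    if rgen != f"RGEN-{locus.cut_site()}":
        raise ValueError(f"{rgen} does not target [{locus.cut_site()}]")
    if incoming == locus.marker:
        raise ValueError("incoming construct must differ from the resident one")
    return Locus(marker=incoming)


# 1st step, 2nd step, and one further iterative round.
rounds = [("RGEN-A", "Marker-1"), ("RGEN-B", "Marker-2"), ("RGEN-C", "Marker-1")]
locus = Locus()
history = []
for rgen, incoming in rounds:
    locus = swap(locus, rgen, incoming)
    history.append(CONSTRUCTS[locus.marker]["gene"])
print(history)
```

Because each round leaves the locus flanked by the target sequence of the other RGEN, the same two templates and two RGENs can in principle be reused indefinitely, which is the iterative property the example describes.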
Table 1 illustrates targeting DNA sequences of RGENs, including their respective PAM sequences, in this case target sequence [A] (SEQ ID NO: 5) for RGEN-A, cutting T. reesei QM6a chromosome 3 after bp position 3,895,292 (Li et al., Biotechnology for Biofuels 2017), framed by the upstream arm [UA-A] (QM6a chromosome 2, bp position 3,894,161 - 3,895,222) and the downstream arm [DA-A] (QM6a chromosome 2, bp position 3,895,298 - 3,896,303); target sequence [B] (SEQ ID NO: 6) for RGEN-B, cutting the pyr2 flanking sequences within DNA modification template [UHA-A]-[B]-[Marker-1]-[B]-[DHA-A]; target sequence [C] (SEQ ID NO: 7) for RGEN-C, cutting the hph flanking sequences within DNA modification template [UHA-A]-[C]-[Marker-2]-[C]-[DHA-A]; target sequence [M] (SEQ ID NO: 8) for RGEN-M, cutting T. reesei QM6a chromosome 3 after bp position 5,160,514 (within the coding sequence of ade2), framed by the upstream arm [UA-M] (QM6a chromosome 3, bp position 5,161,515 - 5,160,515) and the downstream arm [DA-M] (QM6a chromosome 2, bp position 5,160,514 - 5,159,506) (inverse reading orientations). All RGENs consist of in vitro assemblies of commercially available S. pyogenes Cas9 (NEB: EnGen® Spy Cas9 NLS) together with synthetic sgRNA (Biolegio: Synthego).
Table 2 illustrates oligonucleotides used to assemble subcloning vectors and to PCR-amplify DNA modification templates. All vectors are based on pUC18 (Yanisch-Perron et al., Gene 1985), and vector construction was carried out via seamless assembly (Thermo Fisher Scientific: GeneArt™ Seamless Cloning and Assembly Kit) and subcloning in E. coli (Thermo Fisher Scientific: One Shot™ TOP10 Chemically Competent E. coli). The DNA modification template [UHA-A]-[B]-[Marker-1]-[B]-[DHA-A] (SEQ ID NO: 1) was PCR-amplified using the oligonucleotide pair RAS210/RAS213 (SEQ ID NO: 9/10) from a vector constructed by assembling the PCR products RAS210/RAS415X (SEQ ID NO: 9/11), RAS301X/RAS414 (SEQ ID NO: 12/13) and RAS303/RAS213 (SEQ ID NO: 14/10) amplified from QM6a genomic DNA together with RAS234/RAS233 (SEQ ID NO: 15/16) amplified from pUC18. The DNA modification template [UHA-A]-[C]-[Marker-2]-[C]-[DHA-A] (SEQ ID NO: 2) was PCR-amplified using the oligonucleotide pair RAS210/RAS213 (SEQ ID NO: 9/10), from a vector constructed by assembling the PCR products RAS210/RAS417X (SEQ ID NO: 9/17) and RAS303/RAS213 (SEQ ID NO: 14/10) amplified from QM6a genomic DNA together with RAS304X/RAS416 (SEQ ID NO: 18/19) amplified from a synthetic construct together with RAS234/RAS233 (SEQ ID NO: 15/16) amplified from pUC18. The DNA modification template [UHA-M]-[Insert]-[DHA-M] (SEQ ID NO: 4) was PCR-amplified using the oligonucleotide pair RAS532/RAS535 (SEQ ID NO: 20/21), from a vector constructed by assembling the PCR products RAS531/RAS533 (SEQ ID NO: 22/23) and RAS534/RAS536 (SEQ ID NO: 24/25) amplified from QM6a genomic DNA together with RAS537/RAS538 (SEQ ID NO: 26/27) amplified from pUC18.
T. reesei QM6a genome editing was carried out via protoplast transformation according to standard protocol (Penttila et al., Gene 1987). In a volume of 150 µL, approximately 5 × 10^6 protoplasts, 10 pmol per RGEN, and 0.2, 0.5 or 1 pmol per DNA modification template were used.
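For orientation, the stated amounts can be converted into concentrations and molar ratios. The short Python sketch below only performs unit arithmetic on the figures given above (150 µL, 10 pmol per RGEN, 0.2 to 1 pmol per template, about 5 × 10^6 protoplasts); the function names are hypothetical, and the per-protoplast figure is a nominal average that says nothing about actual uptake.

```python
REACTION_VOLUME_UL = 150.0          # transformation volume from the protocol
RGEN_PMOL = 10.0                    # amount per RGEN
TEMPLATE_PMOL = (0.2, 0.5, 1.0)     # amounts tested per DNA modification template
PROTOPLASTS = 5e6                   # approximate protoplast count
AVOGADRO = 6.02214076e23            # molecules per mole


def nanomolar(amount_pmol: float, volume_ul: float) -> float:
    """Concentration in nM; 1 pmol/µL equals 1 µM, i.e. 1000 nM."""
    return amount_pmol / volume_ul * 1000.0


def molecules_per_protoplast(amount_pmol: float, protoplasts: float) -> float:
    """Nominal number of molecules available per protoplast (not uptake)."""
    return amount_pmol * 1e-12 * AVOGADRO / protoplasts


print(f"RGEN: {nanomolar(RGEN_PMOL, REACTION_VOLUME_UL):.1f} nM")
for template_pmol in TEMPLATE_PMOL:
    conc = nanomolar(template_pmol, REACTION_VOLUME_UL)
    ratio = RGEN_PMOL / template_pmol
    print(f"template {template_pmol} pmol: {conc:.2f} nM, RGEN:template = {ratio:.0f}:1")
```

Under these figures each RGEN is in a 10- to 50-fold molar excess over each template.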
For the 1st step, protoplasts from a pyr2-auxotrophic background were used, together with RGEN-A and with the DNA modification template [UHA-A]-[B]-[Marker-1]-[B]-[DHA-A]. Assays were plated within Vogel’s agar with 1 M sorbitol (Vogel, Microbiol Genet Bull 1956). Pyr2 prototrophy was used for selection, and marker integration was confirmed by colony-PCR analysis with the oligonucleotide pair RAS209/RAS214 (SEQ ID NO: 28/29).
For the 2nd step, protoplasts originating from successfully edited cells of the 1st step were used, together with RGEN-B and RGEN-M, and with the DNA modification templates [UHA-A]-[C]-[Marker-2]-[C]-[DHA-A] and [UHA-M]-[Insert]-[DHA-M]. Assays were plated within Vogel’s agar with 1 M sorbitol supplemented with 1 mM adenine, 5 mM uridine and 100 µg/mL Hygromycin B. Hygromycin B resistance was used for selection. Marker swapping was confirmed by colony-PCR analysis with the oligonucleotide pair RAS209/RAS214 (SEQ ID NO: 28/29). Editing of ade2 in emerging red colonies was analyzed by colony-PCR using the oligonucleotide pair RAS531/RAS536 (SEQ ID NO: 22/25); successful homologous recombination with the DNA modification template [UHA-M]-[Insert]-[DHA-M] was designed to result in distinguishable EagI restriction patterns of colony-PCR products (86 bp + 941 bp + 1036 bp), compared with patterns from wild-type ade2 (86 bp + 1968 bp) or editing events by non-homologous end joining (NHEJ).
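The diagnostic logic of the EagI digest can be written down as a small lookup. The sketch below is an illustration only: the fragment sizes are those stated above, while the `classify` function, its labels and the 20 bp tolerance are assumptions made for the example. The arithmetic shows that the two patterns differ by 9 bp in total amplicon length and that the 1968 bp fragment is split in the edited allele, which is consistent with an additional EagI site being brought in by the template.

```python
# Expected EagI fragment sizes (bp) of the RAS531/RAS536 colony-PCR product.
EXPECTED_PATTERNS = {
    "homologous recombination": (86, 941, 1036),
    "no additional EagI site (wild type or NHEJ)": (86, 1968),
}


def classify(fragments_bp, tolerance_bp=20):
    """Match observed fragment sizes to an expected pattern.

    tolerance_bp is an assumed allowance for gel sizing error, not a
    value taken from the source.
    """
    observed = sorted(fragments_bp)
    for label, expected in EXPECTED_PATTERNS.items():
        if len(observed) == len(expected) and all(
            abs(o - e) <= tolerance_bp for o, e in zip(observed, sorted(expected))
        ):
            return label
    return "unassigned"


edited_total = sum(EXPECTED_PATTERNS["homologous recombination"])
unedited_total = sum(EXPECTED_PATTERNS["no additional EagI site (wild type or NHEJ)"])
print(edited_total, unedited_total, edited_total - unedited_total)
print(classify([1030, 945, 90]))
```

A small NHEJ indel would not be expected to create the extra site, so on a gel it would fall into the second class; sequencing would be needed to separate it from wild type.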
Table 3. Multiplex genome engineering: stop codon insertion into ade2 while swapping pyr2 with hph.
Table 3 illustrates results of editing ade2 while swapping pyr2 with hph (2nd step), using different amounts of DNA modification templates. The number of red colonies indicative of Δade2, the number of white colonies indicative of no ade2 editing, and the fraction of 12 selected red colonies with colony-PCR product EagI restriction patterns indicative of homologous recombination are shown. In summary, when editing ade2, red colonies were observed for all tested amounts of the DNA modification template [UHA-M]-[Insert]-[DHA-M] (0.2, 0.5 and 1.0 pmol), and RAS531/RAS536 colony-PCR product EagI restriction patterns indicated a high frequency of homologous recombination. When no DNA modification template [UHA-M]-[Insert]-[DHA-M] was provided, red colonies still emerged, likely because of inaccurate ade2 repair via NHEJ after RGEN-M cutting. Without providing the DNA modification template [UHA-A]-[C]-[Marker-2]-[C]-[DHA-A], no colonies emerged.
EXAMPLE 2
Inserting a stop codon into Trichoderma reesei ade2 while replacing a removable hph selection marker construct with a removable pyr2 selection marker construct at a predetermined genome sequence
This example discloses the use of two selection markers in a marker swapping system while simultaneously editing a gene of interest (GOI). In a first genome modification assay (Figure 2, 1st step), the first marker is integrated at a predetermined genome sequence. In a subsequent genome modification assay (Figure 2, 2nd step), the previously integrated marker is excised and replaced by a second marker while a polynucleotide sequence is simultaneously inserted into a GOI at another genome sequence (multiplex genome engineering). The method described herein allows for iterative rounds of multiplex genome engineering using the same marker swapping system.
RGENs and DNA modification templates were designed to replace a previously integrated removable construct comprising the marker hph (Mach et al., Curr Genet 1994) with a removable construct comprising the marker pyr2 (Jorgensen et al., Microbial Cell Factories 2014) at a predetermined target sequence within the genome of Trichoderma reesei QM6a. In parallel, ade2 is edited; this gene codes for a phosphoribosyl aminoimidazole carboxylase necessary for the synthesis of purines (Jorgensen et al., Microbial Cell Factories 2014). The modification of ade2 was designed to insert a stop codon, giving rise to a Δade2 phenotype showing red-colored colonies when supplemented with adenine (due to the accumulation and polymerization of the purine precursor 5-aminoimidazole ribonucleotide).
Figure 2 illustrates RGENs, DNA modification templates and the genome composition before and after editing, in this case RGEN-A targeting a predetermined target sequence [A], allowing for homologous recombination with the DNA modification template [UHA-A]-[C]-[Marker-2]-[C]-[DHA-A] (SEQ ID NO: 2), harboring the hph cassette [Marker-2] flanked by the unique target sequence [C] and framed by the upstream and downstream homology arms [UHA-A] and [DHA-A] (1st step).
In a subsequent genome modification assay (2nd step), the hph cassette [Marker-2] is excised again by targeting the two previously introduced flanking target sequences [C] with RGEN-C, allowing for homologous recombination with the DNA modification template [UHA-A]-[B]-[Marker-1]-[B]-[DHA-A] (SEQ ID NO: 1), harboring the pyr2 cassette [Marker-1] flanked by the unique target sequence [B] and framed by the upstream and downstream homology arms [UHA-A] and [DHA-A]; simultaneously, the additional target sequence [M] within the coding sequence of ade2 (SEQ ID NO: 3) is targeted by RGEN-M, resulting in the in-frame insertion of a stop codon via homologous recombination with the DNA modification template [UHA-M]-[Insert]-[DHA-M] (SEQ ID NO: 4), harboring the polynucleotide sequence of interest to be inserted into the genome [Insert] framed by the upstream and downstream homology arms [UHA-M] and [DHA-M].
Table 1 (see Example 1) illustrates targeting DNA sequences of RGENs, including their respective PAM sequences, in this case target sequence [A] (SEQ ID NO: 5) for RGEN-A, cutting T. reesei QM6a chromosome 3 after bp position 3,895,292 (Li et al., Biotechnology for Biofuels 2017), framed by the upstream arm [UA-A] (QM6a chromosome 2, bp position 3,894,161 - 3,895,222) and the downstream arm [DA-A] (QM6a chromosome 2, bp position 3,895,298 - 3,896,303); target sequence [B] (SEQ ID NO: 6) for RGEN-B, cutting the pyr2 flanking sequences within DNA modification template [UHA-A]-[B]-[Marker-1]-[B]-[DHA-A]; target sequence [C] (SEQ ID NO: 7) for RGEN-C, cutting the hph flanking sequences within DNA modification template [UHA-A]-[C]-[Marker-2]-[C]-[DHA-A]; target sequence [M] (SEQ ID NO: 8) for RGEN-M, cutting T. reesei QM6a chromosome 3 after bp position 5,160,514 (within the coding sequence of ade2), framed by the upstream arm [UA-M] (QM6a chromosome 3, bp position 5,161,515 - 5,160,515) and the downstream arm [DA-M] (QM6a chromosome 2, bp position 5,160,514 - 5,159,506) (inverse reading orientations). All RGENs consist of in vitro assemblies of commercially available S. pyogenes Cas9 (NEB: EnGen® Spy Cas9 NLS) together with synthetic sgRNA (Biolegio: Synthego).
Table 2 (see Example 1) illustrates oligonucleotides used to assemble subcloning vectors and to PCR-amplify DNA modification templates. All vectors are based on pUC18 (Yanisch-Perron et al., Gene 1985), and vector construction was carried out via seamless assembly (Thermo Fisher Scientific: GeneArt™ Seamless Cloning and Assembly Kit) and subcloning in E. coli (Thermo Fisher Scientific: One Shot™ TOP10 Chemically Competent E. coli). The DNA modification template [UHA-A]-[B]-[Marker-1]-[B]-[DHA-A] (SEQ ID NO: 1) was PCR-amplified using the oligonucleotide pair RAS210/RAS213 (SEQ ID NO: 9/10) from a vector constructed by assembling the PCR products RAS210/RAS415X (SEQ ID NO: 9/11), RAS301X/RAS414 (SEQ ID NO: 12/13) and RAS303/RAS213 (SEQ ID NO: 14/10) amplified from QM6a genomic DNA together with RAS234/RAS233 (SEQ ID NO: 15/16) amplified from pUC18. The DNA modification template [UHA-A]-[C]-[Marker-2]-[C]-[DHA-A] (SEQ ID NO: 2) was PCR-amplified using the oligonucleotide pair RAS210/RAS213 (SEQ ID NO: 9/10), from a vector constructed by assembling the PCR products RAS210/RAS417X (SEQ ID NO: 9/17) and RAS303/RAS213 (SEQ ID NO: 14/10) amplified from QM6a genomic DNA together with RAS304X/RAS416 (SEQ ID NO: 18/19) amplified from a synthetic construct together with RAS234/RAS233 (SEQ ID NO: 15/16) amplified from pUC18. The DNA modification template [UHA-M]-[Insert]-[DHA-M] (SEQ ID NO: 4) was PCR-amplified using the oligonucleotide pair RAS532/RAS535 (SEQ ID NO: 20/21), from a vector constructed by assembling the PCR products RAS531/RAS533 (SEQ ID NO: 22/23) and RAS534/RAS536 (SEQ ID NO: 24/25) amplified from QM6a genomic DNA together with RAS537/RAS538 (SEQ ID NO: 26/27) amplified from pUC18.
T. reesei QM6a genome editing was carried out via protoplast transformation according to standard protocol (Penttila et al., Gene 1987). In a volume of 150 µL, approximately 5 × 10^6 protoplasts, 10 pmol per RGEN, and 0.2, 0.5 or 1 pmol per DNA modification template were used.
For the 1st step, protoplasts from a pyr2-auxotrophic background were used, together with RGEN-A and with the DNA modification template [UHA-A]-[C]-[Marker-2]-[C]-[DHA-A]. Assays were plated within Vogel’s agar with 1 M sorbitol (Vogel, Microbiol Genet Bull 1956) supplemented with 5 mM uridine and 100 µg/mL Hygromycin B. Hygromycin B resistance was used for selection, and marker integration was confirmed by colony-PCR analysis with the oligonucleotide pair RAS209/RAS214 (SEQ ID NO: 28/29).
For the 2nd step, protoplasts originating from successfully edited cells of the 1st step were used, together with RGEN-C and RGEN-M, and with the DNA modification templates [UHA-A]-[B]-[Marker-1]-[B]-[DHA-A] and [UHA-M]-[Insert]-[DHA-M]. Assays were plated within Vogel’s agar with 1 M sorbitol supplemented with 1 mM adenine. Pyr2 prototrophy was used for selection. Marker swapping was confirmed by colony-PCR analysis with the oligonucleotide pair RAS209/RAS214 (SEQ ID NO: 28/29). Editing of ade2 in emerging red colonies was analyzed by colony-PCR using the oligonucleotide pair RAS531/RAS536 (SEQ ID NO: 22/25); successful homologous recombination with the DNA modification template [UHA-M]-[Insert]-[DHA-M] was designed to result in distinguishable EagI restriction patterns of colony-PCR products (86 bp + 941 bp + 1036 bp), compared with patterns from wild-type ade2 (86 bp + 1968 bp) or editing events by non-homologous end joining (NHEJ).
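Across both examples, the plating conditions follow from which marker is being introduced and whether ade2 is edited in the same round. The Python sketch below merely restates the four plating conditions reported in Examples 1 and 2 as a rule; the function name `plating_conditions` and its return structure are hypothetical, and the rule is an inference from those four cases rather than a general protocol.

```python
BASE_MEDIUM = "Vogel's agar with 1 M sorbitol"


def plating_conditions(incoming_marker: str, editing_ade2: bool) -> dict:
    """Return supplements and selection principle for one transformation round.

    incoming_marker: "pyr2" or "hph", the marker on the incoming construct.
    editing_ade2: True when ade2 is edited in the same round.
    """
    supplements = []
    if editing_ade2:
        # Adenine is supplied in the rounds where ade2 is edited.
        supplements.append("1 mM adenine")
    if incoming_marker == "hph":
        # Cells lack a functional pyr2 after the swap, so uridine is supplied.
        supplements += ["5 mM uridine", "100 µg/mL hygromycin B"]
        selection = "hygromycin B resistance"
    elif incoming_marker == "pyr2":
        # No uridine: only pyr2-prototrophic cells grow.
        selection = "pyr2 prototrophy"
    else:
        raise ValueError(f"unknown marker {incoming_marker}")
    return {"medium": BASE_MEDIUM, "supplements": supplements, "selection": selection}


# Example 1: 1st step (pyr2 in), 2nd step (hph in, ade2 edited).
print(plating_conditions("pyr2", editing_ade2=False))
print(plating_conditions("hph", editing_ade2=True))
# Example 2: 1st step (hph in), 2nd step (pyr2 in, ade2 edited).
print(plating_conditions("hph", editing_ade2=False))
print(plating_conditions("pyr2", editing_ade2=True))
```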
Table 4. Multiplex genome engineering: stop codon insertion into ade2 while swapping hph with pyr2.
Table 4 illustrates results of editing ade2 while swapping hph with pyr2 (2nd step), using different amounts of DNA modification templates. The number of red colonies indicative of Δade2, the number of white colonies indicative of no ade2 editing, and the fraction of 12 selected red colonies with colony-PCR product EagI restriction patterns indicative of homologous recombination are shown. In summary, when editing ade2, red colonies were observed for all tested amounts of the DNA modification template [UHA-M]-[Insert]-[DHA-M] (0.2, 0.5 and 1.0 pmol), and RAS531/RAS536 colony-PCR product EagI restriction patterns indicated a high frequency of homologous recombination. When no DNA modification template [UHA-M]-[Insert]-[DHA-M] was provided, red colonies still emerged, likely because of inaccurate ade2 repair via NHEJ after RGEN-M cutting. Without providing the DNA modification template [UHA-A]-[B]-[Marker-1]-[B]-[DHA-A], no colonies emerged.
Claims
1. A method for replacing a first selection marker construct integrated at a predetermined target sequence of a microbial cell with a second selection marker construct, the method comprising: a) providing one or more microbial cells having a first selection marker construct ([B]-[Marker-1]-[B]) integrated at a predetermined target sequence ([A]), wherein said first selection marker construct comprises a first selection marker ([Marker-1]) flanked by a first unique target sequence ([B]); b) introducing into the microbial cells of (a) a first RGEN (RGEN-B) and a first DNA modification template, wherein said first DNA modification template comprises a second selection marker construct ([C]-[Marker-2]-[C]) comprising a second selection marker ([Marker-2]) flanked by a second unique target sequence ([C]), wherein said first RGEN in combination with said first DNA modification template enables the replacement of said first selection marker construct with said second selection marker construct via homologous recombination; and, c) identifying one or more microbial cells from (b) that has said second selection marker construct integrated at said predetermined target sequence.
2. A method for replacing a first selection marker construct integrated at a predetermined target sequence of a microbial cell with a second selection marker construct while simultaneously modifying at least one additional target sequence, the method comprising: a) providing one or more microbial cells having a first selection marker construct ([B]-[Marker-1]-[B]) integrated at a predetermined target sequence ([A]), wherein said first selection marker construct comprises a first selection marker ([Marker-1]) flanked by a first unique target sequence ([B]), wherein said cells have at least one additional target sequence ([M]); b) introducing into the microbial cells of (a) a first RGEN (RGEN-B) and a first DNA modification template, wherein said first DNA modification template comprises a second selection marker construct ([C]-[Marker-2]-[C]) comprising a second selection marker ([Marker-2]) flanked by a second unique target sequence ([C]), wherein said first RGEN in combination with said first DNA modification template enables the replacement of said first selection marker construct with said second selection marker construct via homologous recombination; c) simultaneously with step (b), introducing into the microbial cells of (a) a modification at said at least one additional target sequence; and, d) identifying one or more microbial cells from (c) that has said second selection marker construct replacing said first marker construct, and that has said modification at said at least one additional target sequence.
3. A method for reestablishing a first selection marker construct integrated at a predetermined target sequence of a microbial cell, the method comprising: a) providing one or more microbial cells having a second selection marker construct ([C]-[Marker-2]-[C]) integrated at a predetermined target sequence ([A]), wherein said second selection marker construct comprises a second selection marker ([Marker-2]) flanked by a first unique target sequence ([C]); b) introducing into the microbial cells of (a) a first RGEN (RGEN-C) and a first DNA modification template, wherein said first DNA modification template comprises a first selection marker construct ([B]-[Marker-1]-[B]) comprising a first selection marker ([Marker-1]) flanked by a second unique target sequence ([B]), wherein said first RGEN in combination with said DNA modification template enables the replacement of said second selection marker construct with said first selection marker construct via homologous recombination, thereby reestablishing said first selection marker construct at said predetermined target sequence; and, c) identifying one or more microbial cells from (b) that has said first selection marker construct reestablished at said predetermined target sequence.
4. A method for reestablishing a first selection marker construct integrated at a predetermined target sequence of a microbial cell, the method comprising: a) providing one or more microbial cells having a second selection marker construct (referred to as [C]-[Marker-2]-[C]) integrated at a predetermined target sequence ([A]), wherein said second selection marker construct comprises a second selection marker ([Marker-2]) flanked by a first unique target sequence ([C]), wherein said cells have at least one additional target sequence ([M]); b) introducing into the microbial cells of (a) a first RGEN (RGEN-C) and a first DNA modification template, wherein said first DNA modification template comprises a first selection marker construct ([B]-[Marker-1]-[B]) comprising a first selection marker ([Marker-1]) flanked by a second unique target sequence ([B]), wherein said first RGEN in combination with said DNA modification template enables the replacement of said second selection marker construct with said first selection marker construct via homologous recombination, thereby reestablishing said first selection marker construct at said predetermined target sequence; c) simultaneously with step (b), introducing into the microbial cells of (a) a modification at said at least one additional target sequence; and, d) identifying one or more microbial cells from (c) that has said first selection marker construct reestablished at said predetermined target sequence and that has said modification at said at least one additional target sequence.
5. The method of claim 2 or claim 4, wherein said modification at said at least one additional target sequence is selected from the group consisting of an insertion of a polynucleotide of interest, a deletion of a polynucleotide of interest, a replacement of a polynucleotide of interest, and any one combination thereof.
6. The method of claim 2 or claim 4, wherein said simultaneously introducing a modification comprises introducing at least a second RGEN (RGEN-M) and at least a second DNA modification template ([UHA-M]-[Insert]-[DHA-M]) comprising a polynucleotide of interest ([Insert]), wherein said second RGEN in combination with said second DNA modification template enables the integration of said polynucleotide of interest at said at least one additional target sequence ([M]).
7. The method of claim 2 or claim 4, wherein the microbial cells of (a) have at least a first additional target sequence ([Mα]) and a second additional target sequence ([Mβ]) flanking a polynucleotide of interest to be deleted, and wherein said simultaneously introducing a modification comprises introducing at least a second RGEN (RGEN-Mα), a third RGEN (RGEN-Mβ) and at least a third DNA modification template ([UHA-D]-[DHA-D]) comprising an Upstream Homology Arm ([UHA-D]) directly linked to a Downstream Homology Arm ([DHA-D]), wherein said UHA-D and DHA-D are homologous to a genomic region of said microbial cell flanking said polynucleotide sequence of interest to be deleted, wherein said third RGEN-Mα and fourth RGEN-Mβ in combination with said third DNA modification template enables the deletion of said polynucleotide of interest.
8. The method of claim 2 or claim 4, wherein the microbial cells of (a) have at least a first additional target sequence (Mα) and a second additional target sequence (Mβ) flanking a first polynucleotide of interest to be replaced, and wherein said simultaneously introducing a modification comprises introducing at least a third RNA-guided endonuclease (RGEN-Mα), a fourth RNA-guided endonuclease (RGEN-Mβ) and at least a third DNA modification template ([UHA-M]-[Insert]-[DHA-M]) comprising a second polynucleotide of interest, wherein said RGEN-Mα and RGEN-Mβ in combination with said third DNA modification template enables the replacement of said first polynucleotide sequence of interest with said second polynucleotide of interest.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US202263385663P | 2022-12-01 | 2022-12-01 | |
US63/385,663 | 2022-12-01 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2024118882A1 (en) | 2024-06-06 |
Family
ID=89474539
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2023/081763 WO2024118882A1 (en) | 2022-12-01 | 2023-11-30 | Iterative multiplex genome engineering in microbial cells using a selection marker swapping system |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2024118882A1 (en) |
Citations (30)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4873192A (en) | 1987-02-17 | 1989-10-10 | The United States Of America As Represented By The Department Of Health And Human Services | Process for site specific mutagenesis without phenotypic selection |
US4945050A (en) | 1984-11-13 | 1990-07-31 | Cornell Research Foundation, Inc. | Method for transporting substances into living cells and tissues and apparatus therefor |
US5034323A (en) | 1989-03-30 | 1991-07-23 | Dna Plant Technology Corporation | Genetic engineering of novel plant phenotypes |
US5107065A (en) | 1986-03-28 | 1992-04-21 | Calgene, Inc. | Anti-sense regulation of gene expression in plant cells |
US5283184A (en) | 1989-03-30 | 1994-02-01 | Dna Plant Technology Corporation | Genetic engineering of novel plant phenotypes |
US5316931A (en) | 1988-02-26 | 1994-05-31 | Biosource Genetics Corp. | Plant viral vectors having heterologous subgenomic promoters for systemic expression of foreign genes |
US5380831A (en) | 1986-04-04 | 1995-01-10 | Mycogen Plant Science, Inc. | Synthetic insecticidal crystal protein gene |
- 2023-11-30 WO PCT/US2023/081763 patent/WO2024118882A1/en unknown
Patent Citations (37)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4945050A (en) | 1984-11-13 | 1990-07-31 | Cornell Research Foundation, Inc. | Method for transporting substances into living cells and tissues and apparatus therefor |
US5107065A (en) | 1986-03-28 | 1992-04-21 | Calgene, Inc. | Anti-sense regulation of gene expression in plant cells |
US5380831A (en) | 1986-04-04 | 1995-01-10 | Mycogen Plant Science, Inc. | Synthetic insecticidal crystal protein gene |
US4873192A (en) | 1987-02-17 | 1989-10-10 | The United States Of America As Represented By The Department Of Health And Human Services | Process for site specific mutagenesis without phenotypic selection |
US5866785A (en) | 1988-02-26 | 1999-02-02 | Biosource Technologies, Inc. | Recombinant plant viral nucleic acids |
US5316931A (en) | 1988-02-26 | 1994-05-31 | Biosource Genetics Corp. | Plant viral vectors having heterologous subgenomic promoters for systemic expression of foreign genes |
US5889190A (en) | 1988-02-26 | 1999-03-30 | Biosource Technologies, Inc. | Recombinant plant viral nucleic acids |
US5589367A (en) | 1988-02-26 | 1996-12-31 | Biosource Technologies, Inc. | Recombinant plant viral nucleic acids |
US5886244A (en) | 1988-06-10 | 1999-03-23 | Pioneer Hi-Bred International, Inc. | Stable transformation of plant cells |
US5034323A (en) | 1989-03-30 | 1991-07-23 | Dna Plant Technology Corporation | Genetic engineering of novel plant phenotypes |
US5283184A (en) | 1989-03-30 | 1994-02-01 | Dna Plant Technology Corporation | Genetic engineering of novel plant phenotypes |
US5879918A (en) | 1989-05-12 | 1999-03-09 | Pioneer Hi-Bred International, Inc. | Pretreatment of microprojectiles prior to using in a particle gun |
US5932782A (en) | 1990-11-14 | 1999-08-03 | Pioneer Hi-Bred International, Inc. | Plant transformation method using agrobacterium species adhered to microprojectiles |
US5436391A (en) | 1991-11-29 | 1995-07-25 | Mitsubishi Corporation | Synthetic insecticidal gene, plants of the genus oryza transformed with the gene, and production thereof |
US5563055A (en) | 1992-07-27 | 1996-10-08 | Pioneer Hi-Bred International, Inc. | Method of Agrobacterium-mediated transformation of cultured soybean cells |
US5889191A (en) | 1992-12-30 | 1999-03-30 | Biosource Technologies, Inc. | Viral amplification of recombinant messenger RNA in transgenic plants |
US5736369A (en) | 1994-07-29 | 1998-04-07 | Pioneer Hi-Bred International, Inc. | Method for producing transgenic cereal plants |
US6660830B1 (en) | 1996-03-26 | 2003-12-09 | Razvan T Radulescu | Peptides with antiproliferative properties |
US6300543B1 (en) | 1996-07-08 | 2001-10-09 | Pioneer Hi-Bred International, Inc. | Transformation of zygote, egg or sperm cells and recovery of transformed plants from isolated embryo sacs |
US5981840A (en) | 1997-01-24 | 1999-11-09 | Pioneer Hi-Bred International, Inc. | Methods for agrobacterium-mediated transformation |
WO2002014490A3 (en) | 2000-08-11 | 2003-02-06 | Genencor Int | Bacillus transformation, transformants and mutant libraries |
US7309576B2 (en) | 2002-04-12 | 2007-12-18 | O'dowd Brian F | Method of identifying transmembrane protein-interacting compounds |
WO2007025097A2 (en) | 2005-08-26 | 2007-03-01 | Danisco A/S | Use |
US20100093617A1 (en) | 2005-08-26 | 2010-04-15 | Rodolphe Barrangou | Use |
WO2008007989A1 (en) | 2006-07-11 | 2008-01-17 | Grabania, Bogdan | Head for directing objects, especially for displaying screens |
WO2008039370A1 (en) | 2006-09-22 | 2008-04-03 | Danisco Us, Inc., Genencor Division | Acetolactate synthase (als) selectable marker from trichoderma reesei |
US20140068797A1 (en) | 2012-05-25 | 2014-03-06 | University Of Vienna | Methods and compositions for rna-directed target dna modification and for rna-directed modulation of transcription |
WO2013176772A1 (en) | 2012-05-25 | 2013-11-28 | The Regents Of The University Of California | Methods and compositions for rna-directed target dna modification and for rna-directed modulation of transcription |
US8697359B1 (en) | 2012-12-12 | 2014-04-15 | The Broad Institute, Inc. | CRISPR-Cas systems and methods for altering expression of gene products |
WO2014102241A1 (en) * | 2012-12-28 | 2014-07-03 | Ab Enzymes Gmbh | Genes/genetic elements associated with mating impairment in trichoderma reesei qm6a and its derivatives and process for their identification |
US20150059010A1 (en) | 2013-08-22 | 2015-02-26 | Pioneer Hi-Bred International Inc | Genome modification using guide polynucleotide/cas endonuclease systems and methods of use |
US20150082478A1 (en) | 2013-08-22 | 2015-03-19 | E I Du Pont De Nemours And Company | Plant genome modification using guide rna/cas endonuclease systems and methods of use |
WO2016099887A1 (en) | 2014-12-17 | 2016-06-23 | E. I. Du Pont De Nemours And Company | Compositions and methods for efficient gene editing in e. coli using guide rna/cas endonuclease systems in combination with circular polynucleotide modification templates |
US20170369866A1 (en) * | 2014-12-17 | 2017-12-28 | E I Du Pont De Nemours And Company | Compositions and methods for efficient gene editing in e. coli using guide rna/cas endonuclease systems in combination with circular polynucleotide modification templates |
WO2016110453A1 (en) * | 2015-01-06 | 2016-07-14 | Dsm Ip Assets B.V. | A crispr-cas system for a filamentous fungal host cell |
WO2017075195A1 (en) | 2015-10-30 | 2017-05-04 | Danisco Us Inc | Enhanced protein expression and methods thereof |
WO2018156705A1 (en) | 2017-02-24 | 2018-08-30 | Danisco Us Inc. | Compositions and methods for increased protein production in bacillus licheniformis |
Non-Patent Citations (74)
Title |
---|
"Current Protocols", 1994, GREENE PUBLISHING ASSOCIATES, INC. AND JOHN WILEY & SONS, INC., article "Current Protocols in Molecular Biology" |
"GenBank", Database accession no. YP_008868573 |
"Non-Conventional Yeasts in Genetics, Biochemistry and Biotechnology: Practical Protocols", 2003, SPRINGER-VERLAG |
"Techniques in Molecular Biology", 1983, MACMILLAN PUBLISHING COMPANY |
AINLEY ET AL., PLANT BIOTECHNOLOGY JOURNAL, vol. 11, 2013, pages 1126 - 1134 |
ALANI ET AL., GENETICS, vol. 116, 1987, pages 541 - 545 |
BALLAS ET AL., NUCLEIC ACIDS RES, vol. 17, 1989, pages 7891 - 7903 |
BOTSTEIN ET AL., GENE, vol. 8, 1979, pages 17 - 24 |
BROUNS, S.J.J. ET AL., SCIENCE, vol. 327, pages 167 - 170 |
CHEN ET AL., PLOS ONE, vol. 8, pages e57952 |
CHYLINSKI ET AL., RNA BIOLOGY, vol. 10, 2013, pages 726 - 737 |
CHYLINSKI ET AL., RNA BIOLOGY, vol. 10, pages 891 - 899 |
CROSSWAY ET AL., BIOTECHNIQUES, vol. 4, 1986, pages 320 - 34 |
DAVIS AND MAIZELS, PNAS (0027-8424), vol. 111, no. 10, pages E924 - E932 |
DAYHOFF ET AL.: "Atlas of Protein Sequence and Structure", 1978, NATL BIOMED RES FOUND |
DELLA-CIOPPA ET AL., PLANT PHYSIOL., vol. 84, 1987, pages 965 - 968 |
ELROY-STEIN ET AL., PROC. NATL. ACAD. SCI. USA, vol. 89, 1989, pages 10915 - 6130 |
ESVELT ET AL., NATURE METHODS, vol. 10, pages 1116 - 1121 |
GALLIE ET AL., GENE, vol. 165, no. 2, 1995, pages 233 - 238 |
GUERINEAU ET AL., MOL. GEN. GENET., vol. 262, 1991, pages 141 - 144 |
GUILINGER ET AL., NATURE BIOTECHNOLOGY, vol. 32, no. 6, June 2014 (2014-06-01) |
HAFT ET AL.: "Computational Biology", PLOS COMPUT BIOL, vol. 1, no. 6, 2005, pages e60 |
HARTL AND SEIBOTH, CURR. GENET., vol. 48, 2005, pages 204 - 211 |
HENDEL ET AL., NATURE BIOTECHNOLOGY, vol. 33, 2015, pages 985 - 989 |
HIGGINS ET AL., COMPUT APPL BIOSCI, vol. 8, 1992, pages 189 - 191 |
HIGGINS AND SHARP, CABIOS, vol. 5, 1989, pages 151 - 153 |
HORVATH AND BARRANGOU, SCIENCE, vol. 327, 2010, pages 167 - 170 |
HSU ET AL., CELL, vol. 157, 2013, pages 1262 - 1278 |
INGELBRECHT ET AL., PLANT CELL, vol. 1, 1989, pages 671 - 680 |
ISHIBASHI AIRI ET AL: "A simple method using CRISPR-Cas9 to knock-out genes in murine cancerous cell lines", SCIENTIFIC REPORTS, vol. 10, no. 1, 18 December 2020 (2020-12-18), US, XP093133679, ISSN: 2045-2322, Retrieved from the Internet <URL:https://www.nature.com/articles/s41598-020-79303-0> DOI: 10.1038/s41598-020-79303-0 * |
JOBLING ET AL., NATURE, vol. 325, 1987, pages 622 - 625 |
JOHNSON ET AL., VIROLOGY, vol. 154, 1986, pages 9 - 20 |
JORGENSEN ET AL., MICROBIAL CELL FACTORIES, vol. 13, no. 1, 2014, pages 33 |
JOSHI ET AL., NUCLEIC ACIDS RES, vol. 15, 1987, pages 9627 - 9639 |
KHASANOV FK ET AL., MOL GEN GENETICS, vol. 234, 1992, pages 494 - 497 |
KUNKEL ET AL., METH ENZYMOL, vol. 154, 1987, pages 367 - 82 |
KUNKEL, PROC. NATL. ACAD. SCI. USA, vol. 82, 1985, pages 488 - 92 |
KUZNETSOV GLEB ET AL: "Optimizing complex phenotypes through model-guided multiplex genome engineering", BIORXIV, 15 December 2016 (2016-12-15), XP093134138, Retrieved from the Internet <URL:https://www.biorxiv.org/content/10.1101/086595v3> [retrieved on 20240222], DOI: 10.1101/086595 * |
LI ET AL., BIOTECHNOLOGY FOR BIOFUELS, 2017 |
LI ET AL., MICROB. CELL FACT., vol. 16, 2017, pages 168 |
LIEBER, ANNU. REV. BIOCHEM., vol. 79, 2010, pages 181 - 211 |
LOMMEL, VIROLOGY, vol. 81, 1991, pages 382 - 385 |
LOVETT ET AL., GENETICS, vol. 160, 2002, pages 851 - 859 |
MACEJAK ET AL., NATURE, vol. 353, 1991, pages 90 - 94 |
MACH ET AL., CURR. GENET., vol. 25, no. 6, 1994, pages 567 - 570 |
MAKAROVA ET AL., NATURE REVIEWS MICROBIOLOGY, vol. 13, 2015, pages 1 - 15 |
MOGEN ET AL., PLANT CELL, vol. 2, 1990, pages 1261 - 1272 |
MUNROE ET AL., GENE, vol. 91, 1990, pages 151 - 158 |
MURRAY ET AL., NUCLEIC ACIDS RES., vol. 17, 1989, pages 477 - 498 |
NEEDLEMAN AND WUNSCH, J MOL BIOL, vol. 48, 1970, pages 443 - 53 |
OUEDRAOGO ET AL., APPL. MICROBIOL. BIOTECHNOL., vol. 99, no. 23, 2015, pages 10083 - 95 |
PAPADOPOULOU AND DUMAS, NUCLEIC ACIDS RES, vol. 25, 1997, pages 4278 - 86 |
PASZKOWSKI ET AL., EMBO J, vol. 3, 1984, pages 2717 - 22 |
PENTTILA ET AL., GENE, no. 2, 1987, pages 155 - 164 |
PROUDFOOT, CELL, vol. 64, 1991, pages 671 - 674 |
RIGGS ET AL., PROC. NATL. ACAD. SCI. USA, vol. 83, 1986, pages 5602 - 6 |
SANFACON ET AL., GENES DEV, vol. 5, 1991, pages 141 - 149 |
SCHUSTER AND KAHMANN, FUNGAL GENET BIOL, vol. 130, 2019, pages 43 - 53 |
SHAHEEN A.M. ARSHAD: "Properties and Applications of Silicon Carbide", 2011, INTECH, pages: 345 - 358 |
SHMAKOV ET AL., MOLECULAR CELL, vol. 60, 2015, pages 1 - 13 |
SMITH ET AL., CURR GENET, vol. 19, no. 1, 1991, pages 27 - 33 |
SONG ET AL., APPL MICROBIOL BIOTECHNOL, vol. 103, 2019, pages 6919 - 6932 |
SPENCER ET AL., APPL. MICROBIOL. BIOTECHNOL., vol. 58, pages 147 - 156 |
TIJSSEN: "Laboratory Techniques in Biochemistry and Molecular Biology-Hybridization with Nucleic Acid Probes", 1993, ELSEVIER |
TINLAND, PROC. NATL. ACAD. SCI. USA, vol. 89, 1992, pages 7442 - 6 |
TURNER AND FOSTER, MOL BIOTECHNOL, vol. 3, 1995, pages 225 - 236 |
URSACHE ROBERTAS ET AL: "Combined fluorescent seed selection and multiplex CRISPR/Cas9 assembly for fast generation of multiple Arabidopsis mutants", PLANT METHODS, vol. 17, no. 1, 1 December 2021 (2021-12-01), pages 111, XP093008430, Retrieved from the Internet <URL:https://plantmethods.biomedcentral.com/counter/pdf/10.1186/s13007-021-00811-9.pdf> DOI: 10.1186/s13007-021-00811-9 * |
VOGEL, MICROBIOL GENET BULL, 1956 |
XIE ET AL., PNAS, vol. 112, 2015, pages 3570 - 3575 |
YANISCH-PERRON ET AL., GENE, 1985 |
YUZBASHEV TIGRAN V. ET AL: "A DNA assembly toolkit to unlock the CRISPR/Cas9 potential for metabolic engineering", COMMUNICATIONS BIOLOGY, vol. 6, no. 1, 18 August 2023 (2023-08-18), XP093133683, ISSN: 2399-3642, Retrieved from the Internet <URL:https://www.nature.com/articles/s42003-023-05202-5> DOI: 10.1038/s42003-023-05202-5 * |
ZETSCHE B ET AL., CELL, vol. 163, 2015, pages 1013 - 13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP3390631B1 (en) | | Methods and compositions for t-rna based guide rna expression |
EP3387134B1 (en) | | Methods and compositions for enhanced nuclease-mediated genome modification and reduced off-target site effects |
AU2015364629B2 (en) | | Fungal genome modification systems and methods of use |
CN107223157B (en) | | Compositions and methods for helper-strain-mediated modification of fungal genomes |
CN107849562B (en) | | Genome editing system and method of use |
US20220162621A1 (en) | | Methods For Polynucleotide Integration Into The Genome Of Bacillus Using Dual Circular Recombinant DNA Constructs And Compositions Thereof |
EP3180425B1 (en) | | Genetic targeting in non-conventional yeast using an rna-guided endonuclease |
US20220177923A1 (en) | | Methods for integrating a donor DNA sequence into the genome of bacillus using linear recombinant DNA constructs and compositions thereof |
CN103228789B (en) | | For excising the nucleic acid of target nucleic acid, composition and method |
EP3684927B1 (en) | | Methods for genomic integration for kluyveromyces host cells |
WO2024118882A1 (en) | | Iterative multiplex genome engineering in microbial cells using a selection marker swapping system |
WO2024118881A1 (en) | | Iterative muliplex genome engineering in microbial cells using a bidirectional selection marker system |
WO2024118876A1 (en) | | Iterative multiplex genome engineering in microbial cells using a recombinant self-excisable selection marker system |
US20220389459A1 (en) | | Selection marker free methods for modifying the genome of bacillus and compositions thereof |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 23836686; Country of ref document: EP; Kind code of ref document: A1 |