CN110607320A - A plant genome directed base editing backbone vector and its application - Google Patents
- Publication number
- CN110607320A (application number CN201811403794.0A)
- Authority
- CN
- China
- Prior art keywords
- lys
- pmcda1
- leu
- ncas9
- editing
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 239000013598 vector Substances 0.000 title claims description 54
- 238000013518 transcription Methods 0.000 claims abstract description 58
- 230000035897 transcription Effects 0.000 claims abstract description 58
- OPTASPLRGRRNAP-UHFFFAOYSA-N cytosine Chemical compound NC=1C=CNC(=O)N=1 OPTASPLRGRRNAP-UHFFFAOYSA-N 0.000 claims abstract description 48
- 229940104302 cytosine Drugs 0.000 claims abstract description 25
- 230000014509 gene expression Effects 0.000 claims abstract description 19
- 108020001507 fusion proteins Proteins 0.000 claims abstract description 13
- 102000037865 fusion proteins Human genes 0.000 claims abstract description 13
- 108020005004 Guide RNA Proteins 0.000 claims abstract description 7
- 241000196324 Embryophyta Species 0.000 claims description 67
- 235000007164 Oryza sativa Nutrition 0.000 claims description 39
- 235000009566 rice Nutrition 0.000 claims description 39
- 108091026890 Coding region Proteins 0.000 claims description 34
- 239000002773 nucleotide Substances 0.000 claims description 32
- 125000003729 nucleotide group Chemical group 0.000 claims description 32
- 238000010367 cloning Methods 0.000 claims description 31
- 239000013604 expression vector Substances 0.000 claims description 31
- 108020004414 DNA Proteins 0.000 claims description 30
- 238000003259 recombinant expression Methods 0.000 claims description 29
- 102000000311 Cytosine Deaminase Human genes 0.000 claims description 19
- 108010080611 Cytosine Deaminase Proteins 0.000 claims description 19
- 150000001413 amino acids Chemical class 0.000 claims description 15
- 102220605874 Cytosolic arginine sensor for mTORC1 subunit 2_D10A_mutation Human genes 0.000 claims description 14
- 101710163270 Nuclease Proteins 0.000 claims description 14
- 108091034117 Oligonucleotide Proteins 0.000 claims description 13
- 108010076504 Protein Sorting Signals Proteins 0.000 claims description 11
- 239000012634 fragment Substances 0.000 claims description 10
- 108700026244 Open Reading Frames Proteins 0.000 claims description 9
- 238000006243 chemical reaction Methods 0.000 claims description 9
- 230000000295 complement effect Effects 0.000 claims description 9
- 241000193996 Streptococcus pyogenes Species 0.000 claims description 8
- 102000012410 DNA Ligases Human genes 0.000 claims description 6
- 108010061982 DNA Ligases Proteins 0.000 claims description 6
- 108091028043 Nucleic acid sequence Proteins 0.000 claims description 5
- 108010076491 BsaI endonuclease Proteins 0.000 claims description 4
- 240000008042 Zea mays Species 0.000 claims description 4
- 235000002017 Zea mays subsp mays Nutrition 0.000 claims description 4
- 102000004190 Enzymes Human genes 0.000 claims description 3
- 108090000790 Enzymes Proteins 0.000 claims description 3
- 238000000137 annealing Methods 0.000 claims description 2
- 125000003275 alpha amino acid group Chemical group 0.000 claims 2
- 240000007594 Oryza sativa Species 0.000 claims 1
- 235000005824 Zea mays ssp. parviglumis Nutrition 0.000 claims 1
- 235000005822 corn Nutrition 0.000 claims 1
- 230000001351 cycling effect Effects 0.000 claims 1
- 238000002360 preparation method Methods 0.000 claims 1
- 230000002194 synthesizing effect Effects 0.000 claims 1
- RWQNBRDOKXIBIV-UHFFFAOYSA-N thymine Chemical compound CC1=CNC(=O)NC1=O RWQNBRDOKXIBIV-UHFFFAOYSA-N 0.000 abstract description 24
- 229940113082 thymine Drugs 0.000 abstract description 12
- 238000010353 genetic engineering Methods 0.000 abstract description 2
- 241000209094 Oryza Species 0.000 description 38
- 108090000623 proteins and genes Proteins 0.000 description 18
- 230000009466 transformation Effects 0.000 description 18
- 238000000034 method Methods 0.000 description 13
- 210000001938 protoplast Anatomy 0.000 description 13
- 238000004458 analytical method Methods 0.000 description 12
- 238000010362 genome editing Methods 0.000 description 12
- 210000004027 cell Anatomy 0.000 description 9
- 238000010276 construction Methods 0.000 description 9
- KOSRFJWDECSPRO-UHFFFAOYSA-N alpha-L-glutamyl-L-glutamic acid Natural products OC(=O)CCC(N)C(=O)NC(CCC(O)=O)C(O)=O KOSRFJWDECSPRO-UHFFFAOYSA-N 0.000 description 8
- 108091033409 CRISPR Proteins 0.000 description 7
- 108010062796 arginyllysine Proteins 0.000 description 7
- 108010050848 glycylleucine Proteins 0.000 description 7
- 238000012165 high-throughput sequencing Methods 0.000 description 7
- 108010054155 lysyllysine Proteins 0.000 description 7
- 241000589158 Agrobacterium Species 0.000 description 6
- 238000013461 design Methods 0.000 description 6
- 238000005516 engineering process Methods 0.000 description 6
- 108010055341 glutamyl-glutamic acid Proteins 0.000 description 6
- 230000001404 mediated effect Effects 0.000 description 6
- 238000012986 modification Methods 0.000 description 6
- 230000004048 modification Effects 0.000 description 6
- 230000008439 repair process Effects 0.000 description 6
- 238000010356 CRISPR-Cas9 genome editing Methods 0.000 description 5
- 238000012408 PCR amplification Methods 0.000 description 5
- 108010092854 aspartyllysine Proteins 0.000 description 5
- 238000001976 enzyme digestion Methods 0.000 description 5
- 230000004927 fusion Effects 0.000 description 5
- 108091032973 (ribonucleotides)n+m Proteins 0.000 description 4
- 206010020649 Hyperkeratosis Diseases 0.000 description 4
- DRRXXZBXDMLGFC-IHRRRGAJSA-N Lys-Val-Leu Chemical compound CC(C)C[C@@H](C(O)=O)NC(=O)[C@H](C(C)C)NC(=O)[C@@H](N)CCCCN DRRXXZBXDMLGFC-IHRRRGAJSA-N 0.000 description 4
- YBAFDPFAUTYYRW-UHFFFAOYSA-N N-L-alpha-glutamyl-L-leucine Natural products CC(C)CC(C(O)=O)NC(=O)C(N)CCC(O)=O YBAFDPFAUTYYRW-UHFFFAOYSA-N 0.000 description 4
- 108020004566 Transfer RNA Proteins 0.000 description 4
- 238000001514 detection method Methods 0.000 description 4
- 230000011559 double-strand break repair via nonhomologous end joining Effects 0.000 description 4
- 229930027917 kanamycin Natural products 0.000 description 4
- SBUJHOSQTJFQJX-NOAMYHISSA-N kanamycin Chemical compound O[C@@H]1[C@@H](O)[C@H](O)[C@@H](CN)O[C@@H]1O[C@H]1[C@H](O)[C@@H](O[C@@H]2[C@@H]([C@@H](N)[C@H](O)[C@@H](CO)O2)O)[C@H](N)C[C@@H]1N SBUJHOSQTJFQJX-NOAMYHISSA-N 0.000 description 4
- 229960000318 kanamycin Drugs 0.000 description 4
- 229930182823 kanamycin A Natural products 0.000 description 4
- 108010034529 leucyl-lysine Proteins 0.000 description 4
- 108010057821 leucylproline Proteins 0.000 description 4
- 230000008569 process Effects 0.000 description 4
- 230000001052 transient effect Effects 0.000 description 4
- 238000010354 CRISPR gene editing Methods 0.000 description 3
- 108020004705 Codon Proteins 0.000 description 3
- FADYJNXDPBKVCA-UHFFFAOYSA-N L-Phenylalanyl-L-lysin Natural products NCCCCC(C(O)=O)NC(=O)C(N)CC1=CC=CC=C1 FADYJNXDPBKVCA-UHFFFAOYSA-N 0.000 description 3
- 235000016383 Zea mays subsp huehuetenangensis Nutrition 0.000 description 3
- 108010038633 aspartylglutamate Proteins 0.000 description 3
- 108010068265 aspartyltyrosine Proteins 0.000 description 3
- 230000037429 base substitution Effects 0.000 description 3
- 230000015572 biosynthetic process Effects 0.000 description 3
- 210000004899 c-terminal region Anatomy 0.000 description 3
- 108010089804 glycyl-threonine Proteins 0.000 description 3
- 108010025306 histidylleucine Proteins 0.000 description 3
- 235000009973 maize Nutrition 0.000 description 3
- 230000035772 mutation Effects 0.000 description 3
- 230000037361 pathway Effects 0.000 description 3
- 108010051242 phenylalanylserine Proteins 0.000 description 3
- 238000011160 research Methods 0.000 description 3
- 108091008146 restriction endonucleases Proteins 0.000 description 3
- 238000012216 screening Methods 0.000 description 3
- 238000003786 synthesis reaction Methods 0.000 description 3
- 108010073969 valyllysine Proteins 0.000 description 3
- ROLXPVQSRCPVGK-XDTLVQLUSA-N Ala-Glu-Tyr Chemical compound N[C@@H](C)C(=O)N[C@@H](CCC(=O)O)C(=O)N[C@@H](CC1=CC=C(C=C1)O)C(=O)O ROLXPVQSRCPVGK-XDTLVQLUSA-N 0.000 description 2
- VNFWDYWTSHFRRG-SRVKXCTJSA-N Arg-Gln-Leu Chemical compound [H]N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H](CC(C)C)C(O)=O VNFWDYWTSHFRRG-SRVKXCTJSA-N 0.000 description 2
- PAPSMOYMQDWIOR-AVGNSLFASA-N Arg-Lys-Val Chemical compound [H]N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](C(C)C)C(O)=O PAPSMOYMQDWIOR-AVGNSLFASA-N 0.000 description 2
- IZSMEUDYADKZTJ-KJEVXHAQSA-N Arg-Tyr-Thr Chemical compound [H]N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H]([C@@H](C)O)C(O)=O IZSMEUDYADKZTJ-KJEVXHAQSA-N 0.000 description 2
- BDMIFVIWCNLDCT-CIUDSAMLSA-N Asn-Arg-Glu Chemical compound [H]N[C@@H](CC(N)=O)C(=O)N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CCC(O)=O)C(O)=O BDMIFVIWCNLDCT-CIUDSAMLSA-N 0.000 description 2
- UJGRZQYSNYTCAX-SRVKXCTJSA-N Asp-Leu-Leu Chemical compound CC(C)C[C@@H](C(O)=O)NC(=O)[C@H](CC(C)C)NC(=O)[C@@H](N)CC(O)=O UJGRZQYSNYTCAX-SRVKXCTJSA-N 0.000 description 2
- BJDHEININLSZOT-KKUMJFAQSA-N Asp-Tyr-Lys Chemical compound [H]N[C@@H](CC(O)=O)C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H](CCCCN)C(O)=O BJDHEININLSZOT-KKUMJFAQSA-N 0.000 description 2
- 101800001415 Bri23 peptide Proteins 0.000 description 2
- 101800000655 C-terminal peptide Proteins 0.000 description 2
- 102400000107 C-terminal peptide Human genes 0.000 description 2
- 102000053602 DNA Human genes 0.000 description 2
- 230000033616 DNA repair Effects 0.000 description 2
- 230000007018 DNA scission Effects 0.000 description 2
- YQYJSBFKSSDGFO-UHFFFAOYSA-N Epihygromycin Natural products OC1C(O)C(C(=O)C)OC1OC(C(=C1)O)=CC=C1C=C(C)C(=O)NC1C(O)C(O)C2OCOC2C1O YQYJSBFKSSDGFO-UHFFFAOYSA-N 0.000 description 2
- 241000588724 Escherichia coli Species 0.000 description 2
- PNENQZWRFMUZOM-DCAQKATOSA-N Gln-Glu-Leu Chemical compound [H]N[C@@H](CCC(N)=O)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CC(C)C)C(O)=O PNENQZWRFMUZOM-DCAQKATOSA-N 0.000 description 2
- WTMZXOPHTIVFCP-QEWYBTABSA-N Glu-Ile-Phe Chemical compound OC(=O)CC[C@H](N)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@H](C(O)=O)CC1=CC=CC=C1 WTMZXOPHTIVFCP-QEWYBTABSA-N 0.000 description 2
- MWMJCGBSIORNCD-AVGNSLFASA-N Glu-Leu-Leu Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CC(C)C)C(O)=O MWMJCGBSIORNCD-AVGNSLFASA-N 0.000 description 2
- SWRVAQHFBRZVNX-GUBZILKMSA-N Glu-Lys-Asn Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CC(N)=O)C(O)=O SWRVAQHFBRZVNX-GUBZILKMSA-N 0.000 description 2
- SYAYROHMAIHWFB-KBIXCLLPSA-N Glu-Ser-Ile Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H](CO)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O SYAYROHMAIHWFB-KBIXCLLPSA-N 0.000 description 2
- LCNXZQROPKFGQK-WHFBIAKZSA-N Gly-Asp-Ser Chemical compound NCC(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H](CO)C(O)=O LCNXZQROPKFGQK-WHFBIAKZSA-N 0.000 description 2
- JBCLFWXMTIKCCB-UHFFFAOYSA-N H-Gly-Phe-OH Natural products NCC(=O)NC(C(O)=O)CC1=CC=CC=C1 JBCLFWXMTIKCCB-UHFFFAOYSA-N 0.000 description 2
- BCISUQVFDGYZBO-QSFUFRPTSA-N Ile-Val-Asp Chemical compound CC[C@H](C)[C@H](N)C(=O)N[C@@H](C(C)C)C(=O)N[C@H](C(O)=O)CC(O)=O BCISUQVFDGYZBO-QSFUFRPTSA-N 0.000 description 2
- 241000880493 Leptailurus serval Species 0.000 description 2
- WUFYAPWIHCUMLL-CIUDSAMLSA-N Leu-Asn-Ala Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H](C)C(O)=O WUFYAPWIHCUMLL-CIUDSAMLSA-N 0.000 description 2
- DBSLVQBXKVKDKJ-BJDJZHNGSA-N Leu-Ile-Ala Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](C)C(O)=O DBSLVQBXKVKDKJ-BJDJZHNGSA-N 0.000 description 2
- IZPVWNSAVUQBGP-CIUDSAMLSA-N Leu-Ser-Asp Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H](CO)C(=O)N[C@@H](CC(O)=O)C(O)=O IZPVWNSAVUQBGP-CIUDSAMLSA-N 0.000 description 2
- YIRIDPUGZKHMHT-ACRUOGEOSA-N Leu-Tyr-Tyr Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(O)=O YIRIDPUGZKHMHT-ACRUOGEOSA-N 0.000 description 2
- HQVDJTYKCMIWJP-YUMQZZPRSA-N Lys-Asn-Gly Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H](CC(N)=O)C(=O)NCC(O)=O HQVDJTYKCMIWJP-YUMQZZPRSA-N 0.000 description 2
- FACUGMGEFUEBTI-SRVKXCTJSA-N Lys-Asn-Leu Chemical compound CC(C)C[C@@H](C(O)=O)NC(=O)[C@H](CC(N)=O)NC(=O)[C@@H](N)CCCCN FACUGMGEFUEBTI-SRVKXCTJSA-N 0.000 description 2
- ZXFRGTAIIZHNHG-AJNGGQMLSA-N Lys-Ile-Leu Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CC(C)C)C(=O)O)NC(=O)[C@H](CCCCN)N ZXFRGTAIIZHNHG-AJNGGQMLSA-N 0.000 description 2
- WVJNGSFKBKOKRV-AJNGGQMLSA-N Lys-Leu-Ile Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O WVJNGSFKBKOKRV-AJNGGQMLSA-N 0.000 description 2
- LJADEBULDNKJNK-IHRRRGAJSA-N Lys-Leu-Val Chemical compound CC(C)C[C@H](NC(=O)[C@@H](N)CCCCN)C(=O)N[C@@H](C(C)C)C(O)=O LJADEBULDNKJNK-IHRRRGAJSA-N 0.000 description 2
- RRIHXWPHQSXHAQ-XUXIUFHCSA-N Met-Ile-Lys Chemical compound CSCC[C@H](N)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](CCCCN)C(O)=O RRIHXWPHQSXHAQ-XUXIUFHCSA-N 0.000 description 2
- 108010047562 NGR peptide Proteins 0.000 description 2
- WEMYTDDMDBLPMI-DKIMLUQUSA-N Phe-Ile-Lys Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CCCCN)C(=O)O)NC(=O)[C@H](CC1=CC=CC=C1)N WEMYTDDMDBLPMI-DKIMLUQUSA-N 0.000 description 2
- DSGYZICNAMEJOC-AVGNSLFASA-N Ser-Glu-Phe Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CC1=CC=CC=C1)C(O)=O DSGYZICNAMEJOC-AVGNSLFASA-N 0.000 description 2
- MIJWOJAXARLEHA-WDSKDSINSA-N Ser-Gly-Glu Chemical compound OC[C@H](N)C(=O)NCC(=O)N[C@H](C(O)=O)CCC(O)=O MIJWOJAXARLEHA-WDSKDSINSA-N 0.000 description 2
- MECLEFZMPPOEAC-VOAKCMCISA-N Thr-Leu-Lys Chemical compound C[C@H]([C@@H](C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCCCN)C(=O)O)N)O MECLEFZMPPOEAC-VOAKCMCISA-N 0.000 description 2
- HSBZWINKRYZCSQ-KKUMJFAQSA-N Tyr-Lys-Asp Chemical compound [H]N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CC(O)=O)C(O)=O HSBZWINKRYZCSQ-KKUMJFAQSA-N 0.000 description 2
- WNZSAUMKZQXHNC-UKJIMTQDSA-N Val-Ile-Gln Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CCC(=O)N)C(=O)O)NC(=O)[C@H](C(C)C)N WNZSAUMKZQXHNC-UKJIMTQDSA-N 0.000 description 2
- 108010008685 alanyl-glutamyl-aspartic acid Proteins 0.000 description 2
- 108010069205 aspartyl-phenylalanine Proteins 0.000 description 2
- FSXRLASFHBWESK-UHFFFAOYSA-N dipeptide phenylalanyl-tyrosine Natural products C=1C=C(O)C=CC=1CC(C(O)=O)NC(=O)C(N)CC1=CC=CC=C1 FSXRLASFHBWESK-UHFFFAOYSA-N 0.000 description 2
- 230000005782 double-strand break Effects 0.000 description 2
- 238000002474 experimental method Methods 0.000 description 2
- 108010063718 gamma-glutamylaspartic acid Proteins 0.000 description 2
- 108010048994 glycyl-tyrosyl-alanine Proteins 0.000 description 2
- 108010018006 histidylserine Proteins 0.000 description 2
- 230000006801 homologous recombination Effects 0.000 description 2
- 238000002744 homologous recombination Methods 0.000 description 2
- 238000003780 insertion Methods 0.000 description 2
- 230000037431 insertion Effects 0.000 description 2
- 108010003700 lysyl aspartic acid Proteins 0.000 description 2
- 108010017391 lysylvaline Proteins 0.000 description 2
- 230000006780 non-homologous end joining Effects 0.000 description 2
- 238000005457 optimization Methods 0.000 description 2
- 239000013612 plasmid Substances 0.000 description 2
- 239000000047 product Substances 0.000 description 2
- 102000004169 proteins and genes Human genes 0.000 description 2
- JQXXHWHPUNPDRT-WLSIYKJHSA-N rifampicin Chemical compound O([C@](C1=O)(C)O/C=C/[C@@H]([C@H]([C@@H](OC(C)=O)[C@H](C)[C@H](O)[C@H](C)[C@@H](O)[C@@H](C)\C=C\C=C(C)/C(=O)NC=2C(O)=C3C([O-])=C4C)C)OC)C4=C1C3=C(O)C=2\C=N\N1CC[NH+](C)CC1 JQXXHWHPUNPDRT-WLSIYKJHSA-N 0.000 description 2
- 229960001225 rifampicin Drugs 0.000 description 2
- 238000007480 sanger sequencing Methods 0.000 description 2
- 238000006467 substitution reaction Methods 0.000 description 2
- 230000008685 targeting Effects 0.000 description 2
- 238000012360 testing method Methods 0.000 description 2
- 108010061238 threonyl-glycine Proteins 0.000 description 2
- 238000011426 transformation method Methods 0.000 description 2
- 108010051110 tyrosyl-lysine Proteins 0.000 description 2
- RRBGTUQJDFBWNN-MUGJNUQGSA-N (2s)-6-amino-2-[[(2s)-6-amino-2-[[(2s)-6-amino-2-[[(2s)-2,6-diaminohexanoyl]amino]hexanoyl]amino]hexanoyl]amino]hexanoic acid Chemical compound NCCCC[C@H](N)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CCCCN)C(O)=O RRBGTUQJDFBWNN-MUGJNUQGSA-N 0.000 description 1
- OZRFYUJEXYKQDV-UHFFFAOYSA-N 2-[[2-[[2-[(2-amino-3-carboxypropanoyl)amino]-3-carboxypropanoyl]amino]-3-carboxypropanoyl]amino]butanedioic acid Chemical compound OC(=O)CC(N)C(=O)NC(CC(O)=O)C(=O)NC(CC(O)=O)C(=O)NC(CC(O)=O)C(O)=O OZRFYUJEXYKQDV-UHFFFAOYSA-N 0.000 description 1
- QMOQBVOBWVNSNO-UHFFFAOYSA-N 2-[[2-[[2-[(2-azaniumylacetyl)amino]acetyl]amino]acetyl]amino]acetate Chemical compound NCC(=O)NCC(=O)NCC(=O)NCC(O)=O QMOQBVOBWVNSNO-UHFFFAOYSA-N 0.000 description 1
- ZBMRKNMTMPPMMK-UHFFFAOYSA-N 2-amino-4-[hydroxy(methyl)phosphoryl]butanoic acid;azane Chemical compound [NH4+].CP(O)(=O)CCC(N)C([O-])=O ZBMRKNMTMPPMMK-UHFFFAOYSA-N 0.000 description 1
- FJVAQLJNTSUQPY-CIUDSAMLSA-N Ala-Ala-Lys Chemical compound C[C@H](N)C(=O)N[C@@H](C)C(=O)N[C@H](C(O)=O)CCCCN FJVAQLJNTSUQPY-CIUDSAMLSA-N 0.000 description 1
- DVWVZSJAYIJZFI-FXQIFTODSA-N Ala-Arg-Asn Chemical compound [H]N[C@@H](C)C(=O)N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CC(N)=O)C(O)=O DVWVZSJAYIJZFI-FXQIFTODSA-N 0.000 description 1
- LWUWMHIOBPTZBA-DCAQKATOSA-N Ala-Arg-Lys Chemical compound NC(=N)NCCC[C@H](NC(=O)[C@@H](N)C)C(=O)N[C@@H](CCCCN)C(O)=O LWUWMHIOBPTZBA-DCAQKATOSA-N 0.000 description 1
- XEXJJJRVTFGWIC-FXQIFTODSA-N Ala-Asn-Arg Chemical compound C[C@@H](C(=O)N[C@@H](CC(=O)N)C(=O)N[C@@H](CCCN=C(N)N)C(=O)O)N XEXJJJRVTFGWIC-FXQIFTODSA-N 0.000 description 1
- CVGNCMIULZNYES-WHFBIAKZSA-N Ala-Asn-Gly Chemical compound [H]N[C@@H](C)C(=O)N[C@@H](CC(N)=O)C(=O)NCC(O)=O CVGNCMIULZNYES-WHFBIAKZSA-N 0.000 description 1
- YSMPVONNIWLJML-FXQIFTODSA-N Ala-Asp-Pro Chemical compound C[C@H](N)C(=O)N[C@@H](CC(O)=O)C(=O)N1CCC[C@H]1C(O)=O YSMPVONNIWLJML-FXQIFTODSA-N 0.000 description 1
- FVSOUJZKYWEFOB-KBIXCLLPSA-N Ala-Gln-Ile Chemical compound CC[C@H](C)[C@@H](C(O)=O)NC(=O)[C@H](CCC(N)=O)NC(=O)[C@H](C)N FVSOUJZKYWEFOB-KBIXCLLPSA-N 0.000 description 1
- MVBWLRJESQOQTM-ACZMJKKPSA-N Ala-Gln-Ser Chemical compound [H]N[C@@H](C)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H](CO)C(O)=O MVBWLRJESQOQTM-ACZMJKKPSA-N 0.000 description 1
- YIGLXQRFQVWFEY-NRPADANISA-N Ala-Gln-Val Chemical compound [H]N[C@@H](C)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H](C(C)C)C(O)=O YIGLXQRFQVWFEY-NRPADANISA-N 0.000 description 1
- NWVVKQZOVSTDBQ-CIUDSAMLSA-N Ala-Glu-Arg Chemical compound [H]N[C@@H](C)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CCCNC(N)=N)C(O)=O NWVVKQZOVSTDBQ-CIUDSAMLSA-N 0.000 description 1
- NJPMYXWVWQWCSR-ACZMJKKPSA-N Ala-Glu-Asn Chemical compound C[C@H](N)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CC(N)=O)C(O)=O NJPMYXWVWQWCSR-ACZMJKKPSA-N 0.000 description 1
- BEMGNWZECGIJOI-WDSKDSINSA-N Ala-Gly-Glu Chemical compound [H]N[C@@H](C)C(=O)NCC(=O)N[C@@H](CCC(O)=O)C(O)=O BEMGNWZECGIJOI-WDSKDSINSA-N 0.000 description 1
- DVJSJDDYCYSMFR-ZKWXMUAHSA-N Ala-Ile-Gly Chemical compound [H]N[C@@H](C)C(=O)N[C@@H]([C@@H](C)CC)C(=O)NCC(O)=O DVJSJDDYCYSMFR-ZKWXMUAHSA-N 0.000 description 1
- VHVVPYOJIIQCKS-QEJZJMRPSA-N Ala-Leu-Phe Chemical compound C[C@H](N)C(=O)N[C@@H](CC(C)C)C(=O)N[C@H](C(O)=O)CC1=CC=CC=C1 VHVVPYOJIIQCKS-QEJZJMRPSA-N 0.000 description 1
- OYJCVIGKMXUVKB-GARJFASQSA-N Ala-Leu-Pro Chemical compound C[C@@H](C(=O)N[C@@H](CC(C)C)C(=O)N1CCC[C@@H]1C(=O)O)N OYJCVIGKMXUVKB-GARJFASQSA-N 0.000 description 1
- KQESEZXHYOUIIM-CQDKDKBSSA-N Ala-Lys-Tyr Chemical compound [H]N[C@@H](C)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(O)=O KQESEZXHYOUIIM-CQDKDKBSSA-N 0.000 description 1
- MDNAVFBZPROEHO-UHFFFAOYSA-N Ala-Lys-Val Natural products CC(C)C(C(O)=O)NC(=O)C(NC(=O)C(C)N)CCCCN MDNAVFBZPROEHO-UHFFFAOYSA-N 0.000 description 1
- ZBLQIYPCUWZSRZ-QEJZJMRPSA-N Ala-Phe-Leu Chemical compound CC(C)C[C@@H](C(O)=O)NC(=O)[C@@H](NC(=O)[C@H](C)N)CC1=CC=CC=C1 ZBLQIYPCUWZSRZ-QEJZJMRPSA-N 0.000 description 1
- WEZNQZHACPSMEF-QEJZJMRPSA-N Ala-Phe-Lys Chemical compound NCCCC[C@@H](C(O)=O)NC(=O)[C@@H](NC(=O)[C@@H](N)C)CC1=CC=CC=C1 WEZNQZHACPSMEF-QEJZJMRPSA-N 0.000 description 1
- NHWYNIZWLJYZAG-XVYDVKMFSA-N Ala-Ser-His Chemical compound C[C@@H](C(=O)N[C@@H](CO)C(=O)N[C@@H](CC1=CN=CN1)C(=O)O)N NHWYNIZWLJYZAG-XVYDVKMFSA-N 0.000 description 1
- HOVPGJUNRLMIOZ-CIUDSAMLSA-N Ala-Ser-Leu Chemical compound CC(C)C[C@@H](C(O)=O)NC(=O)[C@H](CO)NC(=O)[C@H](C)N HOVPGJUNRLMIOZ-CIUDSAMLSA-N 0.000 description 1
- VRTOMXFZHGWHIJ-KZVJFYERSA-N Ala-Thr-Arg Chemical compound [H]N[C@@H](C)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CCCNC(N)=N)C(O)=O VRTOMXFZHGWHIJ-KZVJFYERSA-N 0.000 description 1
- IOFVWPYSRSCWHI-JXUBOQSCSA-N Ala-Thr-Leu Chemical compound CC(C)C[C@@H](C(O)=O)NC(=O)[C@H]([C@@H](C)O)NC(=O)[C@H](C)N IOFVWPYSRSCWHI-JXUBOQSCSA-N 0.000 description 1
- 241000219194 Arabidopsis Species 0.000 description 1
- HULHGJZIZXCPLD-FXQIFTODSA-N Arg-Ala-Cys Chemical compound C[C@@H](C(=O)N[C@@H](CS)C(=O)O)NC(=O)[C@H](CCCN=C(N)N)N HULHGJZIZXCPLD-FXQIFTODSA-N 0.000 description 1
- VKKYFICVTYKFIO-CIUDSAMLSA-N Arg-Ala-Glu Chemical compound OC(=O)CC[C@@H](C(O)=O)NC(=O)[C@H](C)NC(=O)[C@@H](N)CCCN=C(N)N VKKYFICVTYKFIO-CIUDSAMLSA-N 0.000 description 1
- MUXONAMCEUBVGA-DCAQKATOSA-N Arg-Arg-Gln Chemical compound NC(N)=NCCC[C@H](N)C(=O)N[C@@H](CCCN=C(N)N)C(=O)N[C@@H](CCC(N)=O)C(O)=O MUXONAMCEUBVGA-DCAQKATOSA-N 0.000 description 1
- HJVGMOYJDDXLMI-AVGNSLFASA-N Arg-Arg-Lys Chemical compound NCCCC[C@@H](C(O)=O)NC(=O)[C@H](CCCNC(N)=N)NC(=O)[C@@H](N)CCCNC(N)=N HJVGMOYJDDXLMI-AVGNSLFASA-N 0.000 description 1
- IIABBYGHLYWVOS-FXQIFTODSA-N Arg-Asn-Ser Chemical compound [H]N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H](CO)C(O)=O IIABBYGHLYWVOS-FXQIFTODSA-N 0.000 description 1
- PQWTZSNVWSOFFK-FXQIFTODSA-N Arg-Asp-Asn Chemical compound C(C[C@@H](C(=O)N[C@@H](CC(=O)O)C(=O)N[C@@H](CC(=O)N)C(=O)O)N)CN=C(N)N PQWTZSNVWSOFFK-FXQIFTODSA-N 0.000 description 1
- RRGPUNYIPJXJBU-GUBZILKMSA-N Arg-Asp-Met Chemical compound [H]N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H](CCSC)C(O)=O RRGPUNYIPJXJBU-GUBZILKMSA-N 0.000 description 1
- JCAISGGAOQXEHJ-ZPFDUUQYSA-N Arg-Gln-Ile Chemical compound CC[C@H](C)[C@@H](C(=O)O)NC(=O)[C@H](CCC(=O)N)NC(=O)[C@H](CCCN=C(N)N)N JCAISGGAOQXEHJ-ZPFDUUQYSA-N 0.000 description 1
- QAODJPUKWNNNRP-DCAQKATOSA-N Arg-Glu-Arg Chemical compound NC(N)=NCCC[C@H](N)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CCCN=C(N)N)C(O)=O QAODJPUKWNNNRP-DCAQKATOSA-N 0.000 description 1
- RKRSYHCNPFGMTA-CIUDSAMLSA-N Arg-Glu-Asn Chemical compound [H]N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CC(N)=O)C(O)=O RKRSYHCNPFGMTA-CIUDSAMLSA-N 0.000 description 1
- MZRBYBIQTIKERR-GUBZILKMSA-N Arg-Glu-Gln Chemical compound [H]N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CCC(N)=O)C(O)=O MZRBYBIQTIKERR-GUBZILKMSA-N 0.000 description 1
- GOWZVQXTHUCNSQ-NHCYSSNCSA-N Arg-Glu-Val Chemical compound [H]N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](C(C)C)C(O)=O GOWZVQXTHUCNSQ-NHCYSSNCSA-N 0.000 description 1
- PNIGSVZJNVUVJA-BQBZGAKWSA-N Arg-Gly-Asn Chemical compound NC(N)=NCCC[C@H](N)C(=O)NCC(=O)N[C@@H](CC(N)=O)C(O)=O PNIGSVZJNVUVJA-BQBZGAKWSA-N 0.000 description 1
- AUFHLLPVPSMEOG-YUMQZZPRSA-N Arg-Gly-Glu Chemical compound NC(N)=NCCC[C@H](N)C(=O)NCC(=O)N[C@@H](CCC(O)=O)C(O)=O AUFHLLPVPSMEOG-YUMQZZPRSA-N 0.000 description 1
- SYAUZLVLXCDRSH-IUCAKERBSA-N Arg-Gly-Met Chemical compound CSCC[C@@H](C(=O)O)NC(=O)CNC(=O)[C@H](CCCN=C(N)N)N SYAUZLVLXCDRSH-IUCAKERBSA-N 0.000 description 1
- RKQRHMKFNBYOTN-IHRRRGAJSA-N Arg-His-Lys Chemical compound C1=C(NC=N1)C[C@@H](C(=O)N[C@@H](CCCCN)C(=O)O)NC(=O)[C@H](CCCN=C(N)N)N RKQRHMKFNBYOTN-IHRRRGAJSA-N 0.000 description 1
- NVUIWHJLPSZZQC-CYDGBPFRSA-N Arg-Ile-Arg Chemical compound NC(N)=NCCC[C@H](N)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](CCCN=C(N)N)C(O)=O NVUIWHJLPSZZQC-CYDGBPFRSA-N 0.000 description 1
- FLYANDHDFRGGTM-PYJNHQTQSA-N Arg-Ile-His Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CC1=CN=CN1)C(=O)O)NC(=O)[C@H](CCCN=C(N)N)N FLYANDHDFRGGTM-PYJNHQTQSA-N 0.000 description 1
- OTZMRMHZCMZOJZ-SRVKXCTJSA-N Arg-Leu-Glu Chemical compound [H]N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCC(O)=O)C(O)=O OTZMRMHZCMZOJZ-SRVKXCTJSA-N 0.000 description 1
- FSNVAJOPUDVQAR-AVGNSLFASA-N Arg-Lys-Arg Chemical compound NC(=N)NCCC[C@H](N)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CCCNC(N)=N)C(O)=O FSNVAJOPUDVQAR-AVGNSLFASA-N 0.000 description 1
- MJINRRBEMOLJAK-DCAQKATOSA-N Arg-Lys-Asp Chemical compound OC(=O)C[C@@H](C(O)=O)NC(=O)[C@H](CCCCN)NC(=O)[C@@H](N)CCCN=C(N)N MJINRRBEMOLJAK-DCAQKATOSA-N 0.000 description 1
- GRRXPUAICOGISM-RWMBFGLXSA-N Arg-Lys-Pro Chemical compound C1C[C@@H](N(C1)C(=O)[C@H](CCCCN)NC(=O)[C@H](CCCN=C(N)N)N)C(=O)O GRRXPUAICOGISM-RWMBFGLXSA-N 0.000 description 1
- PYZPXCZNQSEHDT-GUBZILKMSA-N Arg-Met-Asn Chemical compound CSCC[C@@H](C(=O)N[C@@H](CC(=O)N)C(=O)O)NC(=O)[C@H](CCCN=C(N)N)N PYZPXCZNQSEHDT-GUBZILKMSA-N 0.000 description 1
- KSUALAGYYLQSHJ-RCWTZXSCSA-N Arg-Met-Thr Chemical compound [H]N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CCSC)C(=O)N[C@@H]([C@@H](C)O)C(O)=O KSUALAGYYLQSHJ-RCWTZXSCSA-N 0.000 description 1
- CZUHPNLXLWMYMG-UBHSHLNASA-N Arg-Phe-Ala Chemical compound NC(N)=NCCC[C@H](N)C(=O)N[C@H](C(=O)N[C@@H](C)C(O)=O)CC1=CC=CC=C1 CZUHPNLXLWMYMG-UBHSHLNASA-N 0.000 description 1
- YTMKMRSYXHBGER-IHRRRGAJSA-N Arg-Phe-Asn Chemical compound C1=CC=C(C=C1)C[C@@H](C(=O)N[C@@H](CC(=O)N)C(=O)O)NC(=O)[C@H](CCCN=C(N)N)N YTMKMRSYXHBGER-IHRRRGAJSA-N 0.000 description 1
- NGYHSXDNNOFHNE-AVGNSLFASA-N Arg-Pro-Leu Chemical compound [H]N[C@@H](CCCNC(N)=N)C(=O)N1CCC[C@H]1C(=O)N[C@@H](CC(C)C)C(O)=O NGYHSXDNNOFHNE-AVGNSLFASA-N 0.000 description 1
- AOJYORNRFWWEIV-IHRRRGAJSA-N Arg-Tyr-Asp Chemical compound NC(N)=NCCC[C@H](N)C(=O)N[C@H](C(=O)N[C@@H](CC(O)=O)C(O)=O)CC1=CC=C(O)C=C1 AOJYORNRFWWEIV-IHRRRGAJSA-N 0.000 description 1
- XWGJDUSDTRPQRK-ZLUOBGJFSA-N Asn-Ala-Ser Chemical compound OC[C@@H](C(O)=O)NC(=O)[C@H](C)NC(=O)[C@@H](N)CC(N)=O XWGJDUSDTRPQRK-ZLUOBGJFSA-N 0.000 description 1
- MEFGKQUUYZOLHM-GMOBBJLQSA-N Asn-Arg-Ile Chemical compound [H]N[C@@H](CC(N)=O)C(=O)N[C@@H](CCCNC(N)=N)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O MEFGKQUUYZOLHM-GMOBBJLQSA-N 0.000 description 1
- CQMQJWRCRQSBAF-BPUTZDHNSA-N Asn-Arg-Trp Chemical compound C1=CC=C2C(=C1)C(=CN2)C[C@@H](C(=O)O)NC(=O)[C@H](CCCN=C(N)N)NC(=O)[C@H](CC(=O)N)N CQMQJWRCRQSBAF-BPUTZDHNSA-N 0.000 description 1
- WPOLSNAQGVHROR-GUBZILKMSA-N Asn-Gln-Leu Chemical compound CC(C)C[C@@H](C(=O)O)NC(=O)[C@H](CCC(=O)N)NC(=O)[C@H](CC(=O)N)N WPOLSNAQGVHROR-GUBZILKMSA-N 0.000 description 1
- HCAUEJAQCXVQQM-ACZMJKKPSA-N Asn-Glu-Asp Chemical compound [H]N[C@@H](CC(N)=O)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CC(O)=O)C(O)=O HCAUEJAQCXVQQM-ACZMJKKPSA-N 0.000 description 1
- QYXNFROWLZPWPC-FXQIFTODSA-N Asn-Glu-Gln Chemical compound [H]N[C@@H](CC(N)=O)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CCC(N)=O)C(O)=O QYXNFROWLZPWPC-FXQIFTODSA-N 0.000 description 1
- BZMWJLLUAKSIMH-FXQIFTODSA-N Asn-Glu-Glu Chemical compound [H]N[C@@H](CC(N)=O)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CCC(O)=O)C(O)=O BZMWJLLUAKSIMH-FXQIFTODSA-N 0.000 description 1
- OPEPUCYIGFEGSW-WDSKDSINSA-N Asn-Gly-Glu Chemical compound [H]N[C@@H](CC(N)=O)C(=O)NCC(=O)N[C@@H](CCC(O)=O)C(O)=O OPEPUCYIGFEGSW-WDSKDSINSA-N 0.000 description 1
- FTCGGKNCJZOPNB-WHFBIAKZSA-N Asn-Gly-Ser Chemical compound NC(=O)C[C@H](N)C(=O)NCC(=O)N[C@@H](CO)C(O)=O FTCGGKNCJZOPNB-WHFBIAKZSA-N 0.000 description 1
- LVHMEJJWEXBMKK-GMOBBJLQSA-N Asn-Ile-Met Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CCSC)C(=O)O)NC(=O)[C@H](CC(=O)N)N LVHMEJJWEXBMKK-GMOBBJLQSA-N 0.000 description 1
- DJIMLSXHXKWADV-CIUDSAMLSA-N Asn-Leu-Ser Chemical compound OC[C@@H](C(O)=O)NC(=O)[C@H](CC(C)C)NC(=O)[C@@H](N)CC(N)=O DJIMLSXHXKWADV-CIUDSAMLSA-N 0.000 description 1
- GIQCDTKOIPUDSG-GARJFASQSA-N Asn-Lys-Pro Chemical compound C1C[C@@H](N(C1)C(=O)[C@H](CCCCN)NC(=O)[C@H](CC(=O)N)N)C(=O)O GIQCDTKOIPUDSG-GARJFASQSA-N 0.000 description 1
- LSJQOMAZIKQMTJ-SRVKXCTJSA-N Asn-Phe-Asp Chemical compound [H]N[C@@H](CC(N)=O)C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CC(O)=O)C(O)=O LSJQOMAZIKQMTJ-SRVKXCTJSA-N 0.000 description 1
- RAUPFUCUDBQYHE-AVGNSLFASA-N Asn-Phe-Glu Chemical compound [H]N[C@@H](CC(N)=O)C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CCC(O)=O)C(O)=O RAUPFUCUDBQYHE-AVGNSLFASA-N 0.000 description 1
- YUUIAUXBNOHFRJ-IHRRRGAJSA-N Asn-Phe-Met Chemical compound [H]N[C@@H](CC(N)=O)C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CCSC)C(O)=O YUUIAUXBNOHFRJ-IHRRRGAJSA-N 0.000 description 1
- FTNRWCPWDWRPAV-BZSNNMDCSA-N Asn-Phe-Phe Chemical compound C([C@H](NC(=O)[C@H](CC(N)=O)N)C(=O)N[C@@H](CC=1C=CC=CC=1)C(O)=O)C1=CC=CC=C1 FTNRWCPWDWRPAV-BZSNNMDCSA-N 0.000 description 1
- NJSNXIOKBHPFMB-GMOBBJLQSA-N Asn-Pro-Ile Chemical compound CC[C@H](C)[C@@H](C(=O)O)NC(=O)[C@@H]1CCCN1C(=O)[C@H](CC(=O)N)N NJSNXIOKBHPFMB-GMOBBJLQSA-N 0.000 description 1
- KTDWFWNZLLFEFU-KKUMJFAQSA-N Asn-Tyr-His Chemical compound C1=CC(=CC=C1C[C@@H](C(=O)N[C@@H](CC2=CN=CN2)C(=O)O)NC(=O)[C@H](CC(=O)N)N)O KTDWFWNZLLFEFU-KKUMJFAQSA-N 0.000 description 1
- CGYKCTPUGXFPMG-IHPCNDPISA-N Asn-Tyr-Trp Chemical compound [H]N[C@@H](CC(N)=O)C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H](CC1=CNC2=C1C=CC=C2)C(O)=O CGYKCTPUGXFPMG-IHPCNDPISA-N 0.000 description 1
- XBQSLMACWDXWLJ-GHCJXIJMSA-N Asp-Ala-Ile Chemical compound [H]N[C@@H](CC(O)=O)C(=O)N[C@@H](C)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O XBQSLMACWDXWLJ-GHCJXIJMSA-N 0.000 description 1
- XPGVTUBABLRGHY-BIIVOSGPSA-N Asp-Ala-Pro Chemical compound C[C@@H](C(=O)N1CCC[C@@H]1C(=O)O)NC(=O)[C@H](CC(=O)O)N XPGVTUBABLRGHY-BIIVOSGPSA-N 0.000 description 1
- BLQBMRNMBAYREH-UWJYBYFXSA-N Asp-Ala-Tyr Chemical compound N[C@@H](CC(O)=O)C(=O)N[C@@H](C)C(=O)N[C@@H](CC1=CC=C(C=C1)O)C(=O)O BLQBMRNMBAYREH-UWJYBYFXSA-N 0.000 description 1
- MFMJRYHVLLEMQM-DCAQKATOSA-N Asp-Arg-His Chemical compound C1=C(NC=N1)C[C@@H](C(=O)O)NC(=O)[C@H](CCCN=C(N)N)NC(=O)[C@H](CC(=O)O)N MFMJRYHVLLEMQM-DCAQKATOSA-N 0.000 description 1
- SDHFVYLZFBDSQT-DCAQKATOSA-N Asp-Arg-Lys Chemical compound C(CCN)C[C@@H](C(=O)O)NC(=O)[C@H](CCCN=C(N)N)NC(=O)[C@H](CC(=O)O)N SDHFVYLZFBDSQT-DCAQKATOSA-N 0.000 description 1
- UGKZHCBLMLSANF-CIUDSAMLSA-N Asp-Asn-Leu Chemical compound [H]N[C@@H](CC(O)=O)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H](CC(C)C)C(O)=O UGKZHCBLMLSANF-CIUDSAMLSA-N 0.000 description 1
- UGIBTKGQVWFTGX-BIIVOSGPSA-N Asp-Asn-Pro Chemical compound C1C[C@@H](N(C1)C(=O)[C@H](CC(=O)N)NC(=O)[C@H](CC(=O)O)N)C(=O)O UGIBTKGQVWFTGX-BIIVOSGPSA-N 0.000 description 1
- KNMRXHIAVXHCLW-ZLUOBGJFSA-N Asp-Asn-Ser Chemical compound C([C@@H](C(=O)N[C@@H](CC(=O)N)C(=O)N[C@@H](CO)C(=O)O)N)C(=O)O KNMRXHIAVXHCLW-ZLUOBGJFSA-N 0.000 description 1
- RDRMWJBLOSRRAW-BYULHYEWSA-N Asp-Asn-Val Chemical compound [H]N[C@@H](CC(O)=O)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H](C(C)C)C(O)=O RDRMWJBLOSRRAW-BYULHYEWSA-N 0.000 description 1
- SBHUBSDEZQFJHJ-CIUDSAMLSA-N Asp-Asp-Leu Chemical compound CC(C)C[C@@H](C(O)=O)NC(=O)[C@H](CC(O)=O)NC(=O)[C@@H](N)CC(O)=O SBHUBSDEZQFJHJ-CIUDSAMLSA-N 0.000 description 1
- CELPEWWLSXMVPH-CIUDSAMLSA-N Asp-Asp-Lys Chemical compound NCCCC[C@@H](C(O)=O)NC(=O)[C@H](CC(O)=O)NC(=O)[C@@H](N)CC(O)=O CELPEWWLSXMVPH-CIUDSAMLSA-N 0.000 description 1
- QXHVOUSPVAWEMX-ZLUOBGJFSA-N Asp-Asp-Ser Chemical compound OC(=O)C[C@H](N)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H](CO)C(O)=O QXHVOUSPVAWEMX-ZLUOBGJFSA-N 0.000 description 1
- DXQOQMCLWWADMU-ACZMJKKPSA-N Asp-Gln-Ser Chemical compound OC(=O)C[C@H](N)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H](CO)C(O)=O DXQOQMCLWWADMU-ACZMJKKPSA-N 0.000 description 1
- XJQRWGXKUSDEFI-ACZMJKKPSA-N Asp-Glu-Asn Chemical compound [H]N[C@@H](CC(O)=O)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CC(N)=O)C(O)=O XJQRWGXKUSDEFI-ACZMJKKPSA-N 0.000 description 1
- DGKCOYGQLNWNCJ-ACZMJKKPSA-N Asp-Glu-Ser Chemical compound [H]N[C@@H](CC(O)=O)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CO)C(O)=O DGKCOYGQLNWNCJ-ACZMJKKPSA-N 0.000 description 1
- RRKCPMGSRIDLNC-AVGNSLFASA-N Asp-Glu-Tyr Chemical compound [H]N[C@@H](CC(O)=O)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(O)=O RRKCPMGSRIDLNC-AVGNSLFASA-N 0.000 description 1
- WBDWQKRLTVCDSY-WHFBIAKZSA-N Asp-Gly-Asp Chemical compound OC(=O)C[C@H](N)C(=O)NCC(=O)N[C@@H](CC(O)=O)C(O)=O WBDWQKRLTVCDSY-WHFBIAKZSA-N 0.000 description 1
- TVIZQBFURPLQDV-DJFWLOJKSA-N Asp-His-Ile Chemical compound CC[C@H](C)[C@@H](C(=O)O)NC(=O)[C@H](CC1=CN=CN1)NC(=O)[C@H](CC(=O)O)N TVIZQBFURPLQDV-DJFWLOJKSA-N 0.000 description 1
- CYCKJEFVFNRWEZ-UGYAYLCHSA-N Asp-Ile-Asn Chemical compound [H]N[C@@H](CC(O)=O)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](CC(N)=O)C(O)=O CYCKJEFVFNRWEZ-UGYAYLCHSA-N 0.000 description 1
- RTXQQDVBACBSCW-CFMVVWHZSA-N Asp-Ile-Tyr Chemical compound [H]N[C@@H](CC(O)=O)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(O)=O RTXQQDVBACBSCW-CFMVVWHZSA-N 0.000 description 1
- KYQNAIMCTRZLNP-QSFUFRPTSA-N Asp-Ile-Val Chemical compound [H]N[C@@H](CC(O)=O)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](C(C)C)C(O)=O KYQNAIMCTRZLNP-QSFUFRPTSA-N 0.000 description 1
- JNNVNVRBYUJYGS-CIUDSAMLSA-N Asp-Leu-Ala Chemical compound [H]N[C@@H](CC(O)=O)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](C)C(O)=O JNNVNVRBYUJYGS-CIUDSAMLSA-N 0.000 description 1
- CLUMZOKVGUWUFD-CIUDSAMLSA-N Asp-Leu-Asn Chemical compound OC(=O)C[C@H](N)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CC(N)=O)C(O)=O CLUMZOKVGUWUFD-CIUDSAMLSA-N 0.000 description 1
- IVPNEDNYYYFAGI-GARJFASQSA-N Asp-Leu-Pro Chemical compound CC(C)C[C@@H](C(=O)N1CCC[C@@H]1C(=O)O)NC(=O)[C@H](CC(=O)O)N IVPNEDNYYYFAGI-GARJFASQSA-N 0.000 description 1
- UMHUHHJMEXNSIV-CIUDSAMLSA-N Asp-Leu-Ser Chemical compound OC[C@@H](C(O)=O)NC(=O)[C@H](CC(C)C)NC(=O)[C@@H](N)CC(O)=O UMHUHHJMEXNSIV-CIUDSAMLSA-N 0.000 description 1
- XWSIYTYNLKCLJB-CIUDSAMLSA-N Asp-Lys-Asn Chemical compound [H]N[C@@H](CC(O)=O)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CC(N)=O)C(O)=O XWSIYTYNLKCLJB-CIUDSAMLSA-N 0.000 description 1
- GKWFMNNNYZHJHV-SRVKXCTJSA-N Asp-Lys-Leu Chemical compound CC(C)C[C@@H](C(O)=O)NC(=O)[C@H](CCCCN)NC(=O)[C@@H](N)CC(O)=O GKWFMNNNYZHJHV-SRVKXCTJSA-N 0.000 description 1
- NZWDWXSWUQCNMG-GARJFASQSA-N Asp-Lys-Pro Chemical compound C1C[C@@H](N(C1)C(=O)[C@H](CCCCN)NC(=O)[C@H](CC(=O)O)N)C(=O)O NZWDWXSWUQCNMG-GARJFASQSA-N 0.000 description 1
- GYWQGGUCMDCUJE-DLOVCJGASA-N Asp-Phe-Ala Chemical compound [H]N[C@@H](CC(O)=O)C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](C)C(O)=O GYWQGGUCMDCUJE-DLOVCJGASA-N 0.000 description 1
- IDDMGSKZQDEDGA-SRVKXCTJSA-N Asp-Phe-Asn Chemical compound OC(=O)C[C@H](N)C(=O)N[C@H](C(=O)N[C@@H](CC(N)=O)C(O)=O)CC1=CC=CC=C1 IDDMGSKZQDEDGA-SRVKXCTJSA-N 0.000 description 1
- LTCKTLYKRMCFOC-KKUMJFAQSA-N Asp-Phe-Leu Chemical compound [H]N[C@@H](CC(O)=O)C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CC(C)C)C(O)=O LTCKTLYKRMCFOC-KKUMJFAQSA-N 0.000 description 1
- UAXIKORUDGGIGA-DCAQKATOSA-N Asp-Pro-Lys Chemical compound C1C[C@H](N(C1)C(=O)[C@H](CC(=O)O)N)C(=O)N[C@@H](CCCCN)C(=O)O UAXIKORUDGGIGA-DCAQKATOSA-N 0.000 description 1
- XXAMCEGRCZQGEM-ZLUOBGJFSA-N Asp-Ser-Asn Chemical compound [H]N[C@@H](CC(O)=O)C(=O)N[C@@H](CO)C(=O)N[C@@H](CC(N)=O)C(O)=O XXAMCEGRCZQGEM-ZLUOBGJFSA-N 0.000 description 1
- JDDYEZGPYBBPBN-JRQIVUDYSA-N Asp-Thr-Tyr Chemical compound [H]N[C@@H](CC(O)=O)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(O)=O JDDYEZGPYBBPBN-JRQIVUDYSA-N 0.000 description 1
- SQIARYGNVQWOSB-BZSNNMDCSA-N Asp-Tyr-Phe Chemical compound [H]N[C@@H](CC(O)=O)C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H](CC1=CC=CC=C1)C(O)=O SQIARYGNVQWOSB-BZSNNMDCSA-N 0.000 description 1
- WAEDSQFVZJUHLI-BYULHYEWSA-N Asp-Val-Asp Chemical compound [H]N[C@@H](CC(O)=O)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](CC(O)=O)C(O)=O WAEDSQFVZJUHLI-BYULHYEWSA-N 0.000 description 1
- 108700004991 Cas12a Proteins 0.000 description 1
- LZZYPRNAOMGNLH-UHFFFAOYSA-M Cetrimonium bromide Chemical compound [Br-].CCCCCCCCCCCCCCCC[N+](C)(C)C LZZYPRNAOMGNLH-UHFFFAOYSA-M 0.000 description 1
- QFMCHXSGIZPBKG-ZLUOBGJFSA-N Cys-Ala-Asp Chemical compound C[C@@H](C(=O)N[C@@H](CC(=O)O)C(=O)O)NC(=O)[C@H](CS)N QFMCHXSGIZPBKG-ZLUOBGJFSA-N 0.000 description 1
- GRNOCLDFUNCIDW-ACZMJKKPSA-N Cys-Ala-Glu Chemical compound C[C@@H](C(=O)N[C@@H](CCC(=O)O)C(=O)O)NC(=O)[C@H](CS)N GRNOCLDFUNCIDW-ACZMJKKPSA-N 0.000 description 1
- LDIKUWLAMDFHPU-FXQIFTODSA-N Cys-Cys-Arg Chemical compound [H]N[C@@H](CS)C(=O)N[C@@H](CS)C(=O)N[C@@H](CCCNC(N)=N)C(O)=O LDIKUWLAMDFHPU-FXQIFTODSA-N 0.000 description 1
- ZOMMHASZJQRLFS-IHRRRGAJSA-N Cys-Tyr-Val Chemical compound CC(C)[C@@H](C(=O)O)NC(=O)[C@H](CC1=CC=C(C=C1)O)NC(=O)[C@H](CS)N ZOMMHASZJQRLFS-IHRRRGAJSA-N 0.000 description 1
- 230000005778 DNA damage Effects 0.000 description 1
- 231100000277 DNA damage Toxicity 0.000 description 1
- 238000001712 DNA sequencing Methods 0.000 description 1
- 241000252212 Danio rerio Species 0.000 description 1
- 108010042407 Endonucleases Proteins 0.000 description 1
- 102000004533 Endonucleases Human genes 0.000 description 1
- PHZYLYASFWHLHJ-FXQIFTODSA-N Gln-Asn-Glu Chemical compound [H]N[C@@H](CCC(N)=O)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H](CCC(O)=O)C(O)=O PHZYLYASFWHLHJ-FXQIFTODSA-N 0.000 description 1
- ZPDVKYLJTOFQJV-WDSKDSINSA-N Gln-Asn-Gly Chemical compound [H]N[C@@H](CCC(N)=O)C(=O)N[C@@H](CC(N)=O)C(=O)NCC(O)=O ZPDVKYLJTOFQJV-WDSKDSINSA-N 0.000 description 1
- XEYMBRRKIFYQMF-GUBZILKMSA-N Gln-Asp-Leu Chemical compound [H]N[C@@H](CCC(N)=O)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H](CC(C)C)C(O)=O XEYMBRRKIFYQMF-GUBZILKMSA-N 0.000 description 1
- KVXVVDFOZNYYKZ-DCAQKATOSA-N Gln-Gln-Leu Chemical compound [H]N[C@@H](CCC(N)=O)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H](CC(C)C)C(O)=O KVXVVDFOZNYYKZ-DCAQKATOSA-N 0.000 description 1
- SNLOOPZHAQDMJG-CIUDSAMLSA-N Gln-Glu-Glu Chemical compound NC(=O)CC[C@H](N)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CCC(O)=O)C(O)=O SNLOOPZHAQDMJG-CIUDSAMLSA-N 0.000 description 1
- NNXIQPMZGZUFJJ-AVGNSLFASA-N Gln-His-Lys Chemical compound C1=C(NC=N1)C[C@@H](C(=O)N[C@@H](CCCCN)C(=O)O)NC(=O)[C@H](CCC(=O)N)N NNXIQPMZGZUFJJ-AVGNSLFASA-N 0.000 description 1
- YRWWJCDWLVXTHN-LAEOZQHASA-N Gln-Ile-Gly Chemical compound CC[C@H](C)[C@@H](C(=O)NCC(=O)O)NC(=O)[C@H](CCC(=O)N)N YRWWJCDWLVXTHN-LAEOZQHASA-N 0.000 description 1
- GIVHPCWYVWUUSG-HVTMNAMFSA-N Gln-Ile-His Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CC1=CN=CN1)C(=O)O)NC(=O)[C@H](CCC(=O)N)N GIVHPCWYVWUUSG-HVTMNAMFSA-N 0.000 description 1
- FTIJVMLAGRAYMJ-MNXVOIDGSA-N Gln-Ile-Leu Chemical compound CC(C)C[C@@H](C(O)=O)NC(=O)[C@H]([C@@H](C)CC)NC(=O)[C@@H](N)CCC(N)=O FTIJVMLAGRAYMJ-MNXVOIDGSA-N 0.000 description 1
- VZRAXPGTUNDIDK-GUBZILKMSA-N Gln-Leu-Asn Chemical compound CC(C)C[C@@H](C(=O)N[C@@H](CC(=O)N)C(=O)O)NC(=O)[C@H](CCC(=O)N)N VZRAXPGTUNDIDK-GUBZILKMSA-N 0.000 description 1
- CAXXTYYGFYTBPV-IUCAKERBSA-N Gln-Leu-Gly Chemical compound [H]N[C@@H](CCC(N)=O)C(=O)N[C@@H](CC(C)C)C(=O)NCC(O)=O CAXXTYYGFYTBPV-IUCAKERBSA-N 0.000 description 1
- PSERKXGRRADTKA-MNXVOIDGSA-N Gln-Leu-Ile Chemical compound [H]N[C@@H](CCC(N)=O)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O PSERKXGRRADTKA-MNXVOIDGSA-N 0.000 description 1
- IULKWYSYZSURJK-AVGNSLFASA-N Gln-Leu-Lys Chemical compound NC(=O)CC[C@H](N)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCCCN)C(O)=O IULKWYSYZSURJK-AVGNSLFASA-N 0.000 description 1
- ZBKUIQNCRIYVGH-SDDRHHMPSA-N Gln-Leu-Pro Chemical compound CC(C)C[C@@H](C(=O)N1CCC[C@@H]1C(=O)O)NC(=O)[C@H](CCC(=O)N)N ZBKUIQNCRIYVGH-SDDRHHMPSA-N 0.000 description 1
- OSCLNNWLKKIQJM-WDSKDSINSA-N Gln-Ser-Gly Chemical compound [H]N[C@@H](CCC(N)=O)C(=O)N[C@@H](CO)C(=O)NCC(O)=O OSCLNNWLKKIQJM-WDSKDSINSA-N 0.000 description 1
- ININBLZFFVOQIO-JHEQGTHGSA-N Gln-Thr-Gly Chemical compound C[C@H]([C@@H](C(=O)NCC(=O)O)NC(=O)[C@H](CCC(=O)N)N)O ININBLZFFVOQIO-JHEQGTHGSA-N 0.000 description 1
- WTJIWXMJESRHMM-XDTLVQLUSA-N Gln-Tyr-Ala Chemical compound [H]N[C@@H](CCC(N)=O)C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H](C)C(O)=O WTJIWXMJESRHMM-XDTLVQLUSA-N 0.000 description 1
- MKRDNSWGJWTBKZ-GVXVVHGQSA-N Gln-Val-Lys Chemical compound CC(C)[C@@H](C(=O)N[C@@H](CCCCN)C(=O)O)NC(=O)[C@H](CCC(=O)N)N MKRDNSWGJWTBKZ-GVXVVHGQSA-N 0.000 description 1
- NLKVNZUFDPWPNL-YUMQZZPRSA-N Glu-Arg-Gly Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H](CCCNC(N)=N)C(=O)NCC(O)=O NLKVNZUFDPWPNL-YUMQZZPRSA-N 0.000 description 1
- YKLNMGJYMNPBCP-ACZMJKKPSA-N Glu-Asn-Asp Chemical compound C(CC(=O)O)[C@@H](C(=O)N[C@@H](CC(=O)N)C(=O)N[C@@H](CC(=O)O)C(=O)O)N YKLNMGJYMNPBCP-ACZMJKKPSA-N 0.000 description 1
- CKRUHITYRFNUKW-WDSKDSINSA-N Glu-Asn-Gly Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H](CC(N)=O)C(=O)NCC(O)=O CKRUHITYRFNUKW-WDSKDSINSA-N 0.000 description 1
- RDPOETHPAQEGDP-ACZMJKKPSA-N Glu-Asp-Ala Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H](C)C(O)=O RDPOETHPAQEGDP-ACZMJKKPSA-N 0.000 description 1
- HJIFPJUEOGZWRI-GUBZILKMSA-N Glu-Asp-Lys Chemical compound C(CCN)C[C@@H](C(=O)O)NC(=O)[C@H](CC(=O)O)NC(=O)[C@H](CCC(=O)O)N HJIFPJUEOGZWRI-GUBZILKMSA-N 0.000 description 1
- PAQUJCSYVIBPLC-AVGNSLFASA-N Glu-Asp-Phe Chemical compound OC(=O)CC[C@H](N)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@H](C(O)=O)CC1=CC=CC=C1 PAQUJCSYVIBPLC-AVGNSLFASA-N 0.000 description 1
- JRCUFCXYZLPSDZ-ACZMJKKPSA-N Glu-Asp-Ser Chemical compound OC(=O)CC[C@H](N)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H](CO)C(O)=O JRCUFCXYZLPSDZ-ACZMJKKPSA-N 0.000 description 1
- ZXQPJYWZSFGWJB-AVGNSLFASA-N Glu-Cys-Phe Chemical compound C1=CC=C(C=C1)C[C@@H](C(=O)O)NC(=O)[C@H](CS)NC(=O)[C@H](CCC(=O)O)N ZXQPJYWZSFGWJB-AVGNSLFASA-N 0.000 description 1
- UMIRPYLZFKOEOH-YVNDNENWSA-N Glu-Gln-Ile Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O UMIRPYLZFKOEOH-YVNDNENWSA-N 0.000 description 1
- IFZWDJWERARYFC-WNHJNPCNSA-N Glu-Glu-Gln-Trp Chemical compound C1=CC=C2C(C[C@H](NC(=O)[C@H](CCC(N)=O)NC(=O)[C@H](CCC(O)=O)NC(=O)[C@H](CCC(O)=O)N)C(O)=O)=CNC2=C1 IFZWDJWERARYFC-WNHJNPCNSA-N 0.000 description 1
- SJPMNHCEWPTRBR-BQBZGAKWSA-N Glu-Glu-Gly Chemical compound OC(=O)CC[C@H](N)C(=O)N[C@@H](CCC(O)=O)C(=O)NCC(O)=O SJPMNHCEWPTRBR-BQBZGAKWSA-N 0.000 description 1
- KRGZZKWSBGPLKL-IUCAKERBSA-N Glu-Gly-Lys Chemical compound C(CCN)C[C@@H](C(=O)O)NC(=O)CNC(=O)[C@H](CCC(=O)O)N KRGZZKWSBGPLKL-IUCAKERBSA-N 0.000 description 1
- VOORMNJKNBGYGK-YUMQZZPRSA-N Glu-Gly-Met Chemical compound CSCC[C@@H](C(=O)O)NC(=O)CNC(=O)[C@H](CCC(=O)O)N VOORMNJKNBGYGK-YUMQZZPRSA-N 0.000 description 1
- XOFYVODYSNKPDK-AVGNSLFASA-N Glu-His-His Chemical compound C1=C(NC=N1)C[C@@H](C(=O)N[C@@H](CC2=CN=CN2)C(=O)O)NC(=O)[C@H](CCC(=O)O)N XOFYVODYSNKPDK-AVGNSLFASA-N 0.000 description 1
- ZMVCLTGPGWJAEE-JYJNAYRXSA-N Glu-His-Tyr Chemical compound C1=CC(=CC=C1C[C@@H](C(=O)O)NC(=O)[C@H](CC2=CN=CN2)NC(=O)[C@H](CCC(=O)O)N)O ZMVCLTGPGWJAEE-JYJNAYRXSA-N 0.000 description 1
- CXRWMMRLEMVSEH-PEFMBERDSA-N Glu-Ile-Asn Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](CC(N)=O)C(O)=O CXRWMMRLEMVSEH-PEFMBERDSA-N 0.000 description 1
- ITBHUUMCJJQUSC-LAEOZQHASA-N Glu-Ile-Gly Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H]([C@@H](C)CC)C(=O)NCC(O)=O ITBHUUMCJJQUSC-LAEOZQHASA-N 0.000 description 1
- ZCOJVESMNGBGLF-GRLWGSQLSA-N Glu-Ile-Ile Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O ZCOJVESMNGBGLF-GRLWGSQLSA-N 0.000 description 1
- ZSWGJYOZWBHROQ-RWRJDSDZSA-N Glu-Ile-Thr Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H]([C@@H](C)O)C(O)=O ZSWGJYOZWBHROQ-RWRJDSDZSA-N 0.000 description 1
- FBEJIDRSQCGFJI-GUBZILKMSA-N Glu-Leu-Ser Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CO)C(O)=O FBEJIDRSQCGFJI-GUBZILKMSA-N 0.000 description 1
- NJCALAAIGREHDR-WDCWCFNPSA-N Glu-Leu-Thr Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H]([C@@H](C)O)C(O)=O NJCALAAIGREHDR-WDCWCFNPSA-N 0.000 description 1
- ILWHFUZZCFYSKT-AVGNSLFASA-N Glu-Lys-Leu Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CC(C)C)C(O)=O ILWHFUZZCFYSKT-AVGNSLFASA-N 0.000 description 1
- ZGEJRLJEAMPEDV-SRVKXCTJSA-N Glu-Lys-Met Chemical compound CSCC[C@@H](C(=O)O)NC(=O)[C@H](CCCCN)NC(=O)[C@H](CCC(=O)O)N ZGEJRLJEAMPEDV-SRVKXCTJSA-N 0.000 description 1
- AQNYKMCFCCZEEL-JYJNAYRXSA-N Glu-Lys-Tyr Chemical compound OC(=O)CC[C@H](N)C(=O)N[C@@H](CCCCN)C(=O)N[C@H](C(O)=O)CC1=CC=C(O)C=C1 AQNYKMCFCCZEEL-JYJNAYRXSA-N 0.000 description 1
- ZQYZDDXTNQXUJH-CIUDSAMLSA-N Glu-Met-Ala Chemical compound C[C@@H](C(=O)O)NC(=O)[C@H](CCSC)NC(=O)[C@H](CCC(=O)O)N ZQYZDDXTNQXUJH-CIUDSAMLSA-N 0.000 description 1
- BXSZPACYCMNKLS-AVGNSLFASA-N Glu-Ser-Phe Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H](CO)C(=O)N[C@@H](CC1=CC=CC=C1)C(O)=O BXSZPACYCMNKLS-AVGNSLFASA-N 0.000 description 1
- TWYSSILQABLLME-HJGDQZAQSA-N Glu-Thr-Arg Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CCCNC(N)=N)C(O)=O TWYSSILQABLLME-HJGDQZAQSA-N 0.000 description 1
- RGJKYNUINKGPJN-RWRJDSDZSA-N Glu-Thr-Ile Chemical compound CC[C@H](C)[C@@H](C(=O)O)NC(=O)[C@H]([C@@H](C)O)NC(=O)[C@H](CCC(=O)O)N RGJKYNUINKGPJN-RWRJDSDZSA-N 0.000 description 1
- CGWHAXBNGYQBBK-JBACZVJFSA-N Glu-Trp-Tyr Chemical compound C([C@H](NC(=O)[C@H](CC=1C2=CC=CC=C2NC=1)NC(=O)[C@H](CCC(O)=O)N)C(O)=O)C1=CC=C(O)C=C1 CGWHAXBNGYQBBK-JBACZVJFSA-N 0.000 description 1
- MFYLRRCYBBJYPI-JYJNAYRXSA-N Glu-Tyr-Lys Chemical compound C1=CC(=CC=C1C[C@@H](C(=O)N[C@@H](CCCCN)C(=O)O)NC(=O)[C@H](CCC(=O)O)N)O MFYLRRCYBBJYPI-JYJNAYRXSA-N 0.000 description 1
- KIEICAOUSNYOLM-NRPADANISA-N Glu-Val-Ala Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](C)C(O)=O KIEICAOUSNYOLM-NRPADANISA-N 0.000 description 1
- VIPDPMHGICREIS-GVXVVHGQSA-N Glu-Val-Leu Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](CC(C)C)C(O)=O VIPDPMHGICREIS-GVXVVHGQSA-N 0.000 description 1
- SOYWRINXUSUWEQ-DLOVCJGASA-N Glu-Val-Val Chemical compound CC(C)[C@@H](C(O)=O)NC(=O)[C@H](C(C)C)NC(=O)[C@@H](N)CCC(O)=O SOYWRINXUSUWEQ-DLOVCJGASA-N 0.000 description 1
- VSVZIEVNUYDAFR-YUMQZZPRSA-N Gly-Ala-Leu Chemical compound CC(C)C[C@@H](C(O)=O)NC(=O)[C@H](C)NC(=O)CN VSVZIEVNUYDAFR-YUMQZZPRSA-N 0.000 description 1
- QXPRJQPCFXMCIY-NKWVEPMBSA-N Gly-Ala-Pro Chemical compound C[C@@H](C(=O)N1CCC[C@@H]1C(=O)O)NC(=O)CN QXPRJQPCFXMCIY-NKWVEPMBSA-N 0.000 description 1
- LJPIRKICOISLKN-WHFBIAKZSA-N Gly-Ala-Ser Chemical compound NCC(=O)N[C@@H](C)C(=O)N[C@@H](CO)C(O)=O LJPIRKICOISLKN-WHFBIAKZSA-N 0.000 description 1
- OVSKVOOUFAKODB-UWVGGRQHSA-N Gly-Arg-Leu Chemical compound CC(C)C[C@@H](C(O)=O)NC(=O)[C@@H](NC(=O)CN)CCCN=C(N)N OVSKVOOUFAKODB-UWVGGRQHSA-N 0.000 description 1
- AIJAPFVDBFYNKN-WHFBIAKZSA-N Gly-Asn-Asp Chemical compound C([C@@H](C(=O)N[C@@H](CC(=O)O)C(=O)O)NC(=O)CN)C(=O)N AIJAPFVDBFYNKN-WHFBIAKZSA-N 0.000 description 1
- NZAFOTBEULLEQB-WDSKDSINSA-N Gly-Asn-Glu Chemical compound C(CC(=O)O)[C@@H](C(=O)O)NC(=O)[C@H](CC(=O)N)NC(=O)CN NZAFOTBEULLEQB-WDSKDSINSA-N 0.000 description 1
- XCLCVBYNGXEVDU-WHFBIAKZSA-N Gly-Asn-Ser Chemical compound NCC(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H](CO)C(O)=O XCLCVBYNGXEVDU-WHFBIAKZSA-N 0.000 description 1
- FMNHBTKMRFVGRO-FOHZUACHSA-N Gly-Asn-Thr Chemical compound C[C@@H](O)[C@@H](C(O)=O)NC(=O)[C@H](CC(N)=O)NC(=O)CN FMNHBTKMRFVGRO-FOHZUACHSA-N 0.000 description 1
- FUTAPPOITCCWTH-WHFBIAKZSA-N Gly-Asp-Asp Chemical compound [H]NCC(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H](CC(O)=O)C(O)=O FUTAPPOITCCWTH-WHFBIAKZSA-N 0.000 description 1
- XBWMTPAIUQIWKA-BYULHYEWSA-N Gly-Asp-Ile Chemical compound CC[C@H](C)[C@@H](C(O)=O)NC(=O)[C@H](CC(O)=O)NC(=O)CN XBWMTPAIUQIWKA-BYULHYEWSA-N 0.000 description 1
- JUGQPPOVWXSPKJ-RYUDHWBXSA-N Gly-Gln-Phe Chemical compound [H]NCC(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H](CC1=CC=CC=C1)C(O)=O JUGQPPOVWXSPKJ-RYUDHWBXSA-N 0.000 description 1
- HDNXXTBKOJKWNN-WDSKDSINSA-N Gly-Glu-Asn Chemical compound NCC(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CC(N)=O)C(O)=O HDNXXTBKOJKWNN-WDSKDSINSA-N 0.000 description 1
- YYPFZVIXAVDHIK-IUCAKERBSA-N Gly-Glu-Leu Chemical compound CC(C)C[C@@H](C(O)=O)NC(=O)[C@H](CCC(O)=O)NC(=O)CN YYPFZVIXAVDHIK-IUCAKERBSA-N 0.000 description 1
- ZQIMMEYPEXIYBB-IUCAKERBSA-N Gly-Glu-Lys Chemical compound NCCCC[C@@H](C(O)=O)NC(=O)[C@H](CCC(O)=O)NC(=O)CN ZQIMMEYPEXIYBB-IUCAKERBSA-N 0.000 description 1
- CCQOOWAONKGYKQ-BYPYZUCNSA-N Gly-Gly-Ala Chemical compound OC(=O)[C@H](C)NC(=O)CNC(=O)CN CCQOOWAONKGYKQ-BYPYZUCNSA-N 0.000 description 1
- XPJBQTCXPJNIFE-ZETCQYMHSA-N Gly-Gly-Leu Chemical compound CC(C)C[C@@H](C(O)=O)NC(=O)CNC(=O)CN XPJBQTCXPJNIFE-ZETCQYMHSA-N 0.000 description 1
- QSVMIMFAAZPCAQ-PMVVWTBXSA-N Gly-His-Thr Chemical compound [H]NCC(=O)N[C@@H](CC1=CNC=N1)C(=O)N[C@@H]([C@@H](C)O)C(O)=O QSVMIMFAAZPCAQ-PMVVWTBXSA-N 0.000 description 1
- SXJHOPPTOJACOA-QXEWZRGKSA-N Gly-Ile-Arg Chemical compound NCC(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@H](C(O)=O)CCCN=C(N)N SXJHOPPTOJACOA-QXEWZRGKSA-N 0.000 description 1
- SCWYHUQOOFRVHP-MBLNEYKQSA-N Gly-Ile-Thr Chemical compound NCC(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H]([C@@H](C)O)C(O)=O SCWYHUQOOFRVHP-MBLNEYKQSA-N 0.000 description 1
- LHYJCVCQPWRMKZ-WEDXCCLWSA-N Gly-Leu-Thr Chemical compound [H]NCC(=O)N[C@@H](CC(C)C)C(=O)N[C@@H]([C@@H](C)O)C(O)=O LHYJCVCQPWRMKZ-WEDXCCLWSA-N 0.000 description 1
- AFWYPMDMDYCKMD-KBPBESRZSA-N Gly-Leu-Tyr Chemical compound NCC(=O)N[C@@H](CC(C)C)C(=O)N[C@H](C(O)=O)CC1=CC=C(O)C=C1 AFWYPMDMDYCKMD-KBPBESRZSA-N 0.000 description 1
- FHQRLHFYVZAQHU-IUCAKERBSA-N Gly-Lys-Gln Chemical compound [H]NCC(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CCC(N)=O)C(O)=O FHQRLHFYVZAQHU-IUCAKERBSA-N 0.000 description 1
- NTBOEZICHOSJEE-YUMQZZPRSA-N Gly-Lys-Ser Chemical compound [H]NCC(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CO)C(O)=O NTBOEZICHOSJEE-YUMQZZPRSA-N 0.000 description 1
- FXGRXIATVXUAHO-WEDXCCLWSA-N Gly-Lys-Thr Chemical compound C[C@@H](O)[C@@H](C(O)=O)NC(=O)[C@@H](NC(=O)CN)CCCCN FXGRXIATVXUAHO-WEDXCCLWSA-N 0.000 description 1
- GAFKBWKVXNERFA-QWRGUYRKSA-N Gly-Phe-Asp Chemical compound OC(=O)C[C@@H](C(O)=O)NC(=O)[C@@H](NC(=O)CN)CC1=CC=CC=C1 GAFKBWKVXNERFA-QWRGUYRKSA-N 0.000 description 1
- GAAHQHNCMIAYEX-UWVGGRQHSA-N Gly-Pro-Lys Chemical compound NCCCC[C@@H](C(O)=O)NC(=O)[C@@H]1CCCN1C(=O)CN GAAHQHNCMIAYEX-UWVGGRQHSA-N 0.000 description 1
- ABPRMMYHROQBLY-NKWVEPMBSA-N Gly-Ser-Pro Chemical compound C1C[C@@H](N(C1)C(=O)[C@H](CO)NC(=O)CN)C(=O)O ABPRMMYHROQBLY-NKWVEPMBSA-N 0.000 description 1
- DBUNZBWUWCIELX-JHEQGTHGSA-N Gly-Thr-Glu Chemical compound [H]NCC(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CCC(O)=O)C(O)=O DBUNZBWUWCIELX-JHEQGTHGSA-N 0.000 description 1
- RCHFYMASWAZQQZ-ZANVPECISA-N Gly-Trp-Ala Chemical compound C1=CC=C2C(C[C@@H](C(=O)N[C@@H](C)C(O)=O)NC(=O)CN)=CNC2=C1 RCHFYMASWAZQQZ-ZANVPECISA-N 0.000 description 1
- DNVDEMWIYLVIQU-RCOVLWMOSA-N Gly-Val-Asp Chemical compound NCC(=O)N[C@@H](C(C)C)C(=O)N[C@@H](CC(O)=O)C(O)=O DNVDEMWIYLVIQU-RCOVLWMOSA-N 0.000 description 1
- RYAOJUMWLWUGNW-QMMMGPOBSA-N Gly-Val-Gly Chemical compound NCC(=O)N[C@@H](C(C)C)C(=O)NCC(O)=O RYAOJUMWLWUGNW-QMMMGPOBSA-N 0.000 description 1
- IPIVXQQRZXEUGW-UWJYBYFXSA-N His-Ala-His Chemical compound C([C@H](N)C(=O)N[C@@H](C)C(=O)N[C@@H](CC=1NC=NC=1)C(O)=O)C1=CN=CN1 IPIVXQQRZXEUGW-UWJYBYFXSA-N 0.000 description 1
- XINDHUAGVGCNSF-QSFUFRPTSA-N His-Ala-Ile Chemical compound [H]N[C@@H](CC1=CNC=N1)C(=O)N[C@@H](C)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O XINDHUAGVGCNSF-QSFUFRPTSA-N 0.000 description 1
- SVHKVHBPTOMLTO-DCAQKATOSA-N His-Arg-Asp Chemical compound [H]N[C@@H](CC1=CNC=N1)C(=O)N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CC(O)=O)C(O)=O SVHKVHBPTOMLTO-DCAQKATOSA-N 0.000 description 1
- MVADCDSCFTXCBT-CIUDSAMLSA-N His-Asp-Asp Chemical compound [H]N[C@@H](CC1=CNC=N1)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H](CC(O)=O)C(O)=O MVADCDSCFTXCBT-CIUDSAMLSA-N 0.000 description 1
- ZJSMFRTVYSLKQU-DJFWLOJKSA-N His-Asp-Ile Chemical compound CC[C@H](C)[C@@H](C(=O)O)NC(=O)[C@H](CC(=O)O)NC(=O)[C@H](CC1=CN=CN1)N ZJSMFRTVYSLKQU-DJFWLOJKSA-N 0.000 description 1
- IMCHNUANCIGUKS-SRVKXCTJSA-N His-Glu-Arg Chemical compound [H]N[C@@H](CC1=CNC=N1)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CCCNC(N)=N)C(O)=O IMCHNUANCIGUKS-SRVKXCTJSA-N 0.000 description 1
- XMENRVZYPBKBIL-AVGNSLFASA-N His-Glu-His Chemical compound N[C@@H](Cc1cnc[nH]1)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](Cc1cnc[nH]1)C(O)=O XMENRVZYPBKBIL-AVGNSLFASA-N 0.000 description 1
- JCOSMKPAOYDKRO-AVGNSLFASA-N His-Glu-Lys Chemical compound C1=C(NC=N1)C[C@@H](C(=O)N[C@@H](CCC(=O)O)C(=O)N[C@@H](CCCCN)C(=O)O)N JCOSMKPAOYDKRO-AVGNSLFASA-N 0.000 description 1
- ZSKJIISDJXJQPV-BZSNNMDCSA-N His-Leu-Phe Chemical compound C([C@H](N)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CC=1C=CC=CC=1)C(O)=O)C1=CN=CN1 ZSKJIISDJXJQPV-BZSNNMDCSA-N 0.000 description 1
- SVVULKPWDBIPCO-BZSNNMDCSA-N His-Phe-Leu Chemical compound [H]N[C@@H](CC1=CNC=N1)C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CC(C)C)C(O)=O SVVULKPWDBIPCO-BZSNNMDCSA-N 0.000 description 1
- WCHONUZTYDQMBY-PYJNHQTQSA-N His-Pro-Ile Chemical compound [H]N[C@@H](CC1=CNC=N1)C(=O)N1CCC[C@H]1C(=O)N[C@@H]([C@@H](C)CC)C(O)=O WCHONUZTYDQMBY-PYJNHQTQSA-N 0.000 description 1
- 241000235789 Hyperoartia Species 0.000 description 1
- LQSBBHNVAVNZSX-GHCJXIJMSA-N Ile-Ala-Asn Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](C)C(=O)N[C@@H](CC(=O)N)C(=O)O)N LQSBBHNVAVNZSX-GHCJXIJMSA-N 0.000 description 1
- YOTNPRLPIPHQSB-XUXIUFHCSA-N Ile-Arg-Lys Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CCCN=C(N)N)C(=O)N[C@@H](CCCCN)C(=O)O)N YOTNPRLPIPHQSB-XUXIUFHCSA-N 0.000 description 1
- NBJAAWYRLGCJOF-UGYAYLCHSA-N Ile-Asp-Asn Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CC(=O)O)C(=O)N[C@@H](CC(=O)N)C(=O)O)N NBJAAWYRLGCJOF-UGYAYLCHSA-N 0.000 description 1
- KUHFPGIVBOCRMV-MNXVOIDGSA-N Ile-Gln-Leu Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CCC(=O)N)C(=O)N[C@@H](CC(C)C)C(=O)O)N KUHFPGIVBOCRMV-MNXVOIDGSA-N 0.000 description 1
- LKACSKJPTFSBHR-MNXVOIDGSA-N Ile-Gln-Lys Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CCC(=O)N)C(=O)N[C@@H](CCCCN)C(=O)O)N LKACSKJPTFSBHR-MNXVOIDGSA-N 0.000 description 1
- YBJWJQQBWRARLT-KBIXCLLPSA-N Ile-Gln-Ser Chemical compound CC[C@H](C)[C@H](N)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H](CO)C(O)=O YBJWJQQBWRARLT-KBIXCLLPSA-N 0.000 description 1
- MTFVYKQRLXYAQN-LAEOZQHASA-N Ile-Glu-Gly Chemical compound [H]N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](CCC(O)=O)C(=O)NCC(O)=O MTFVYKQRLXYAQN-LAEOZQHASA-N 0.000 description 1
- PNDMHTTXXPUQJH-RWRJDSDZSA-N Ile-Glu-Thr Chemical compound N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](CCC(=O)O)C(=O)N[C@@H]([C@H](O)C)C(=O)O PNDMHTTXXPUQJH-RWRJDSDZSA-N 0.000 description 1
- KFVUBLZRFSVDGO-BYULHYEWSA-N Ile-Gly-Asp Chemical compound CC[C@H](C)[C@H](N)C(=O)NCC(=O)N[C@H](C(O)=O)CC(O)=O KFVUBLZRFSVDGO-BYULHYEWSA-N 0.000 description 1
- NYEYYMLUABXDMC-NHCYSSNCSA-N Ile-Gly-Leu Chemical compound CC[C@H](C)[C@@H](C(=O)NCC(=O)N[C@@H](CC(C)C)C(=O)O)N NYEYYMLUABXDMC-NHCYSSNCSA-N 0.000 description 1
- JLWLMGADIQFKRD-QSFUFRPTSA-N Ile-His-Ala Chemical compound CC[C@H](C)[C@H](N)C(=O)N[C@H](C(=O)N[C@@H](C)C(O)=O)CC1=CN=CN1 JLWLMGADIQFKRD-QSFUFRPTSA-N 0.000 description 1
- HYLIOBDWPQNLKI-HVTMNAMFSA-N Ile-His-Gln Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CC1=CN=CN1)C(=O)N[C@@H](CCC(=O)N)C(=O)O)N HYLIOBDWPQNLKI-HVTMNAMFSA-N 0.000 description 1
- SJLVSMMIFYTSGY-GRLWGSQLSA-N Ile-Ile-Glu Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](CCC(=O)O)C(=O)O)N SJLVSMMIFYTSGY-GRLWGSQLSA-N 0.000 description 1
- HUWYGQOISIJNMK-SIGLWIIPSA-N Ile-Ile-His Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](CC1=CN=CN1)C(=O)O)N HUWYGQOISIJNMK-SIGLWIIPSA-N 0.000 description 1
- PKGGWLOLRLOPGK-XUXIUFHCSA-N Ile-Leu-Arg Chemical compound CC[C@H](C)[C@H](N)C(=O)N[C@@H](CC(C)C)C(=O)N[C@H](C(O)=O)CCCN=C(N)N PKGGWLOLRLOPGK-XUXIUFHCSA-N 0.000 description 1
- OUUCIIJSBIBCHB-ZPFDUUQYSA-N Ile-Leu-Asp Chemical compound CC[C@H](C)[C@H](N)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CC(O)=O)C(O)=O OUUCIIJSBIBCHB-ZPFDUUQYSA-N 0.000 description 1
- HUORUFRRJHELPD-MNXVOIDGSA-N Ile-Leu-Glu Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCC(=O)O)C(=O)O)N HUORUFRRJHELPD-MNXVOIDGSA-N 0.000 description 1
- DBXXASNNDTXOLU-MXAVVETBSA-N Ile-Leu-His Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CC1=CN=CN1)C(=O)O)N DBXXASNNDTXOLU-MXAVVETBSA-N 0.000 description 1
- DSDPLOODKXISDT-XUXIUFHCSA-N Ile-Leu-Val Chemical compound CC[C@H](C)[C@H](N)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](C(C)C)C(O)=O DSDPLOODKXISDT-XUXIUFHCSA-N 0.000 description 1
- ADDYYRVQQZFIMW-MNXVOIDGSA-N Ile-Lys-Glu Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CCC(=O)O)C(=O)O)N ADDYYRVQQZFIMW-MNXVOIDGSA-N 0.000 description 1
- GVNNAHIRSDRIII-AJNGGQMLSA-N Ile-Lys-Lys Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CCCCN)C(=O)O)N GVNNAHIRSDRIII-AJNGGQMLSA-N 0.000 description 1
- GLYJPWIRLBAIJH-UHFFFAOYSA-N Ile-Lys-Pro Natural products CCC(C)C(N)C(=O)NC(CCCCN)C(=O)N1CCCC1C(O)=O GLYJPWIRLBAIJH-UHFFFAOYSA-N 0.000 description 1
- MSASLZGZQAXVFP-PEDHHIEDSA-N Ile-Met-Ile Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CCSC)C(=O)N[C@@H]([C@@H](C)CC)C(=O)O)N MSASLZGZQAXVFP-PEDHHIEDSA-N 0.000 description 1
- CIDLJWVDMNDKPT-FIRPJDEBSA-N Ile-Phe-Phe Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CC2=CC=CC=C2)C(=O)O)N CIDLJWVDMNDKPT-FIRPJDEBSA-N 0.000 description 1
- BJECXJHLUJXPJQ-PYJNHQTQSA-N Ile-Pro-His Chemical compound CC[C@H](C)[C@@H](C(=O)N1CCC[C@H]1C(=O)N[C@@H](CC2=CN=CN2)C(=O)O)N BJECXJHLUJXPJQ-PYJNHQTQSA-N 0.000 description 1
- ZLFNNVATRMCAKN-ZKWXMUAHSA-N Ile-Ser-Gly Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CO)C(=O)NCC(=O)O)N ZLFNNVATRMCAKN-ZKWXMUAHSA-N 0.000 description 1
- KBDIBHQICWDGDL-PPCPHDFISA-N Ile-Thr-Leu Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CC(C)C)C(=O)O)N KBDIBHQICWDGDL-PPCPHDFISA-N 0.000 description 1
- JSLIXOUMAOUGBN-JUKXBJQTSA-N Ile-Tyr-His Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CC1=CC=C(C=C1)O)C(=O)N[C@@H](CC2=CN=CN2)C(=O)O)N JSLIXOUMAOUGBN-JUKXBJQTSA-N 0.000 description 1
- DLEBSGAVWRPTIX-PEDHHIEDSA-N Ile-Val-Ile Chemical compound CC[C@H](C)[C@H](N)C(=O)N[C@@H](C(C)C)C(=O)N[C@H](C(O)=O)[C@@H](C)CC DLEBSGAVWRPTIX-PEDHHIEDSA-N 0.000 description 1
- 108010065920 Insulin Lispro Proteins 0.000 description 1
- SENJXOPIZNYLHU-UHFFFAOYSA-N L-leucyl-L-arginine Natural products CC(C)CC(N)C(=O)NC(C(O)=O)CCCN=C(N)N SENJXOPIZNYLHU-UHFFFAOYSA-N 0.000 description 1
- LJHGALIOHLRRQN-DCAQKATOSA-N Leu-Ala-Arg Chemical compound CC(C)C[C@H](N)C(=O)N[C@@H](C)C(=O)N[C@H](C(O)=O)CCCN=C(N)N LJHGALIOHLRRQN-DCAQKATOSA-N 0.000 description 1
- MJOZZTKJZQFKDK-GUBZILKMSA-N Leu-Ala-Gln Chemical compound CC(C)C[C@H](N)C(=O)N[C@@H](C)C(=O)N[C@H](C(O)=O)CCC(N)=O MJOZZTKJZQFKDK-GUBZILKMSA-N 0.000 description 1
- WNGVUZWBXZKQES-YUMQZZPRSA-N Leu-Ala-Gly Chemical compound CC(C)C[C@H](N)C(=O)N[C@@H](C)C(=O)NCC(O)=O WNGVUZWBXZKQES-YUMQZZPRSA-N 0.000 description 1
- YOZCKMXHBYKOMQ-IHRRRGAJSA-N Leu-Arg-Lys Chemical compound CC(C)C[C@@H](C(=O)N[C@@H](CCCN=C(N)N)C(=O)N[C@@H](CCCCN)C(=O)O)N YOZCKMXHBYKOMQ-IHRRRGAJSA-N 0.000 description 1
- WGNOPSQMIQERPK-UHFFFAOYSA-N Leu-Asn-Pro Natural products CC(C)CC(N)C(=O)NC(CC(=O)N)C(=O)N1CCCC1C(=O)O WGNOPSQMIQERPK-UHFFFAOYSA-N 0.000 description 1
- TWQIYNGNYNJUFM-NHCYSSNCSA-N Leu-Asn-Val Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H](C(C)C)C(O)=O TWQIYNGNYNJUFM-NHCYSSNCSA-N 0.000 description 1
- PVMPDMIKUVNOBD-CIUDSAMLSA-N Leu-Asp-Ser Chemical compound CC(C)C[C@H](N)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H](CO)C(O)=O PVMPDMIKUVNOBD-CIUDSAMLSA-N 0.000 description 1
- FQZPTCNSNPWHLJ-AVGNSLFASA-N Leu-Gln-Lys Chemical compound CC(C)C[C@H](N)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H](CCCCN)C(O)=O FQZPTCNSNPWHLJ-AVGNSLFASA-N 0.000 description 1
- HQUXQAMSWFIRET-AVGNSLFASA-N Leu-Glu-Lys Chemical compound CC(C)C[C@H](N)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@H](C(O)=O)CCCCN HQUXQAMSWFIRET-AVGNSLFASA-N 0.000 description 1
- VGPCJSXPPOQPBK-YUMQZZPRSA-N Leu-Gly-Ser Chemical compound CC(C)C[C@H](N)C(=O)NCC(=O)N[C@@H](CO)C(O)=O VGPCJSXPPOQPBK-YUMQZZPRSA-N 0.000 description 1
- USLNHQZCDQJBOV-ZPFDUUQYSA-N Leu-Ile-Asn Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](CC(N)=O)C(O)=O USLNHQZCDQJBOV-ZPFDUUQYSA-N 0.000 description 1
- SEMUSFOBZGKBGW-YTFOTSKYSA-N Leu-Ile-Ile Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O SEMUSFOBZGKBGW-YTFOTSKYSA-N 0.000 description 1
- HNDWYLYAYNBWMP-AJNGGQMLSA-N Leu-Ile-Lys Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CCCCN)C(=O)O)NC(=O)[C@H](CC(C)C)N HNDWYLYAYNBWMP-AJNGGQMLSA-N 0.000 description 1
- RXGLHDWAZQECBI-SRVKXCTJSA-N Leu-Leu-Ser Chemical compound CC(C)C[C@H](N)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CO)C(O)=O RXGLHDWAZQECBI-SRVKXCTJSA-N 0.000 description 1
- JLWZLIQRYCTYBD-IHRRRGAJSA-N Leu-Lys-Arg Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CCCNC(N)=N)C(O)=O JLWZLIQRYCTYBD-IHRRRGAJSA-N 0.000 description 1
- RZXLZBIUTDQHJQ-SRVKXCTJSA-N Leu-Lys-Asp Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CC(O)=O)C(O)=O RZXLZBIUTDQHJQ-SRVKXCTJSA-N 0.000 description 1
- HVHRPWQEQHIQJF-AVGNSLFASA-N Leu-Lys-Glu Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CCC(O)=O)C(O)=O HVHRPWQEQHIQJF-AVGNSLFASA-N 0.000 description 1
- LVTJJOJKDCVZGP-QWRGUYRKSA-N Leu-Lys-Gly Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H](CCCCN)C(=O)NCC(O)=O LVTJJOJKDCVZGP-QWRGUYRKSA-N 0.000 description 1
- FKQPWMZLIIATBA-AJNGGQMLSA-N Leu-Lys-Ile Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O FKQPWMZLIIATBA-AJNGGQMLSA-N 0.000 description 1
- DDVHDMSBLRAKNV-IHRRRGAJSA-N Leu-Met-Leu Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H](CCSC)C(=O)N[C@@H](CC(C)C)C(O)=O DDVHDMSBLRAKNV-IHRRRGAJSA-N 0.000 description 1
- ZAVCJRJOQKIOJW-KKUMJFAQSA-N Leu-Phe-Asp Chemical compound CC(C)C[C@H](N)C(=O)N[C@H](C(=O)N[C@@H](CC(O)=O)C(O)=O)CC1=CC=CC=C1 ZAVCJRJOQKIOJW-KKUMJFAQSA-N 0.000 description 1
- AIRUUHAOKGVJAD-JYJNAYRXSA-N Leu-Phe-Glu Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CCC(O)=O)C(O)=O AIRUUHAOKGVJAD-JYJNAYRXSA-N 0.000 description 1
- INCJJHQRZGQLFC-KBPBESRZSA-N Leu-Phe-Gly Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)NCC(O)=O INCJJHQRZGQLFC-KBPBESRZSA-N 0.000 description 1
- DRWMRVFCKKXHCH-BZSNNMDCSA-N Leu-Phe-Leu Chemical compound CC(C)C[C@H]([NH3+])C(=O)N[C@H](C(=O)N[C@@H](CC(C)C)C([O-])=O)CC1=CC=CC=C1 DRWMRVFCKKXHCH-BZSNNMDCSA-N 0.000 description 1
- PJWOOBTYQNNRBF-BZSNNMDCSA-N Leu-Phe-Lys Chemical compound CC(C)C[C@@H](C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CCCCN)C(=O)O)N PJWOOBTYQNNRBF-BZSNNMDCSA-N 0.000 description 1
- YRRCOJOXAJNSAX-IHRRRGAJSA-N Leu-Pro-Lys Chemical compound CC(C)C[C@@H](C(=O)N1CCC[C@H]1C(=O)N[C@@H](CCCCN)C(=O)O)N YRRCOJOXAJNSAX-IHRRRGAJSA-N 0.000 description 1
- IRMLZWSRWSGTOP-CIUDSAMLSA-N Leu-Ser-Ala Chemical compound CC(C)C[C@H](N)C(=O)N[C@@H](CO)C(=O)N[C@@H](C)C(O)=O IRMLZWSRWSGTOP-CIUDSAMLSA-N 0.000 description 1
- RGUXWMDNCPMQFB-YUMQZZPRSA-N Leu-Ser-Gly Chemical compound CC(C)C[C@H](N)C(=O)N[C@@H](CO)C(=O)NCC(O)=O RGUXWMDNCPMQFB-YUMQZZPRSA-N 0.000 description 1
- XOWMDXHFSBCAKQ-SRVKXCTJSA-N Leu-Ser-Leu Chemical compound CC(C)C[C@H](N)C(=O)N[C@@H](CO)C(=O)N[C@H](C(O)=O)CC(C)C XOWMDXHFSBCAKQ-SRVKXCTJSA-N 0.000 description 1
- AMSSKPUHBUQBOQ-SRVKXCTJSA-N Leu-Ser-Lys Chemical compound CC(C)C[C@@H](C(=O)N[C@@H](CO)C(=O)N[C@@H](CCCCN)C(=O)O)N AMSSKPUHBUQBOQ-SRVKXCTJSA-N 0.000 description 1
- QWWPYKKLXWOITQ-VOAKCMCISA-N Leu-Thr-Leu Chemical compound CC(C)C[C@H](N)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@H](C(O)=O)CC(C)C QWWPYKKLXWOITQ-VOAKCMCISA-N 0.000 description 1
- DAYQSYGBCUKVKT-VOAKCMCISA-N Leu-Thr-Lys Chemical compound CC(C)C[C@H](N)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CCCCN)C(O)=O DAYQSYGBCUKVKT-VOAKCMCISA-N 0.000 description 1
- ILDSIMPXNFWKLH-KATARQTJSA-N Leu-Thr-Ser Chemical compound CC(C)C[C@H](N)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CO)C(O)=O ILDSIMPXNFWKLH-KATARQTJSA-N 0.000 description 1
- YWFZWQKWNDOWPA-XIRDDKMYSA-N Leu-Trp-Asn Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H](CC1=CNC2=C1C=CC=C2)C(=O)N[C@@H](CC(N)=O)C(O)=O YWFZWQKWNDOWPA-XIRDDKMYSA-N 0.000 description 1
- VJGQRELPQWNURN-JYJNAYRXSA-N Leu-Tyr-Glu Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H](CCC(O)=O)C(O)=O VJGQRELPQWNURN-JYJNAYRXSA-N 0.000 description 1
- JGKHAFUAPZCCDU-BZSNNMDCSA-N Leu-Tyr-Leu Chemical compound CC(C)C[C@H]([NH3+])C(=O)N[C@H](C(=O)N[C@@H](CC(C)C)C([O-])=O)CC1=CC=C(O)C=C1 JGKHAFUAPZCCDU-BZSNNMDCSA-N 0.000 description 1
- FBNPMTNBFFAMMH-AVGNSLFASA-N Leu-Val-Arg Chemical compound CC(C)C[C@H](N)C(=O)N[C@@H](C(C)C)C(=O)N[C@H](C(O)=O)CCCN=C(N)N FBNPMTNBFFAMMH-AVGNSLFASA-N 0.000 description 1
- FBNPMTNBFFAMMH-UHFFFAOYSA-N Leu-Val-Arg Natural products CC(C)CC(N)C(=O)NC(C(C)C)C(=O)NC(C(O)=O)CCCN=C(N)N FBNPMTNBFFAMMH-UHFFFAOYSA-N 0.000 description 1
- AIMGJYMCTAABEN-GVXVVHGQSA-N Leu-Val-Glu Chemical compound [H]N[C@@H](CC(C)C)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](CCC(O)=O)C(O)=O AIMGJYMCTAABEN-GVXVVHGQSA-N 0.000 description 1
- LMDVGHQPPPLYAR-IHRRRGAJSA-N Leu-Val-His Chemical compound N[C@@H](CC(C)C)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](CC1=CNC=N1)C(=O)O LMDVGHQPPPLYAR-IHRRRGAJSA-N 0.000 description 1
- 102000003960 Ligases Human genes 0.000 description 1
- 108090000364 Ligases Proteins 0.000 description 1
- XFIHDSBIPWEYJJ-YUMQZZPRSA-N Lys-Ala-Gly Chemical compound OC(=O)CNC(=O)[C@H](C)NC(=O)[C@@H](N)CCCCN XFIHDSBIPWEYJJ-YUMQZZPRSA-N 0.000 description 1
- IXHKPDJKKCUKHS-GARJFASQSA-N Lys-Ala-Pro Chemical compound C[C@@H](C(=O)N1CCC[C@@H]1C(=O)O)NC(=O)[C@H](CCCCN)N IXHKPDJKKCUKHS-GARJFASQSA-N 0.000 description 1
- KNKHAVVBVXKOGX-JXUBOQSCSA-N Lys-Ala-Thr Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H](C)C(=O)N[C@@H]([C@@H](C)O)C(O)=O KNKHAVVBVXKOGX-JXUBOQSCSA-N 0.000 description 1
- SWWCDAGDQHTKIE-RHYQMDGZSA-N Lys-Arg-Thr Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H](CCCNC(N)=N)C(=O)N[C@@H]([C@@H](C)O)C(O)=O SWWCDAGDQHTKIE-RHYQMDGZSA-N 0.000 description 1
- DGWXCIORNLWGGG-CIUDSAMLSA-N Lys-Asn-Ser Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H](CO)C(O)=O DGWXCIORNLWGGG-CIUDSAMLSA-N 0.000 description 1
- LMVOVCYVZBBWQB-SRVKXCTJSA-N Lys-Asp-Lys Chemical compound NCCCC[C@H](N)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@H](C(O)=O)CCCCN LMVOVCYVZBBWQB-SRVKXCTJSA-N 0.000 description 1
- PHHYNOUOUWYQRO-XIRDDKMYSA-N Lys-Asp-Trp Chemical compound C1=CC=C2C(=C1)C(=CN2)C[C@@H](C(=O)O)NC(=O)[C@H](CC(=O)O)NC(=O)[C@H](CCCCN)N PHHYNOUOUWYQRO-XIRDDKMYSA-N 0.000 description 1
- WTZUSCUIVPVCRH-SRVKXCTJSA-N Lys-Gln-Arg Chemical compound NCCCC[C@H](N)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@H](C(O)=O)CCCN=C(N)N WTZUSCUIVPVCRH-SRVKXCTJSA-N 0.000 description 1
- RZHLIPMZXOEJTL-AVGNSLFASA-N Lys-Gln-Leu Chemical compound CC(C)C[C@@H](C(=O)O)NC(=O)[C@H](CCC(=O)N)NC(=O)[C@H](CCCCN)N RZHLIPMZXOEJTL-AVGNSLFASA-N 0.000 description 1
- PGBPWPTUOSCNLE-JYJNAYRXSA-N Lys-Gln-Phe Chemical compound C1=CC=C(C=C1)C[C@@H](C(=O)O)NC(=O)[C@H](CCC(=O)N)NC(=O)[C@H](CCCCN)N PGBPWPTUOSCNLE-JYJNAYRXSA-N 0.000 description 1
- NNCDAORZCMPZPX-GUBZILKMSA-N Lys-Gln-Ser Chemical compound C(CCN)C[C@@H](C(=O)N[C@@H](CCC(=O)N)C(=O)N[C@@H](CO)C(=O)O)N NNCDAORZCMPZPX-GUBZILKMSA-N 0.000 description 1
- LLSUNJYOSCOOEB-GUBZILKMSA-N Lys-Glu-Asp Chemical compound NCCCC[C@H](N)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CC(O)=O)C(O)=O LLSUNJYOSCOOEB-GUBZILKMSA-N 0.000 description 1
- KZOHPCYVORJBLG-AVGNSLFASA-N Lys-Glu-His Chemical compound C1=C(NC=N1)C[C@@H](C(=O)O)NC(=O)[C@H](CCC(=O)O)NC(=O)[C@H](CCCCN)N KZOHPCYVORJBLG-AVGNSLFASA-N 0.000 description 1
- WGLAORUKDGRINI-WDCWCFNPSA-N Lys-Glu-Thr Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H]([C@@H](C)O)C(O)=O WGLAORUKDGRINI-WDCWCFNPSA-N 0.000 description 1
- ULUQBUKAPDUKOC-GVXVVHGQSA-N Lys-Glu-Val Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](C(C)C)C(O)=O ULUQBUKAPDUKOC-GVXVVHGQSA-N 0.000 description 1
- GQZMPWBZQALKJO-UWVGGRQHSA-N Lys-Gly-Arg Chemical compound [H]N[C@@H](CCCCN)C(=O)NCC(=O)N[C@@H](CCCNC(N)=N)C(O)=O GQZMPWBZQALKJO-UWVGGRQHSA-N 0.000 description 1
- XNKDCYABMBBEKN-IUCAKERBSA-N Lys-Gly-Gln Chemical compound NCCCC[C@H](N)C(=O)NCC(=O)N[C@H](C(O)=O)CCC(N)=O XNKDCYABMBBEKN-IUCAKERBSA-N 0.000 description 1
- NNKLKUUGESXCBS-KBPBESRZSA-N Lys-Gly-Tyr Chemical compound [H]N[C@@H](CCCCN)C(=O)NCC(=O)N[C@@H](CC1=CC=C(O)C=C1)C(O)=O NNKLKUUGESXCBS-KBPBESRZSA-N 0.000 description 1
- SPCHLZUWJTYZFC-IHRRRGAJSA-N Lys-His-Val Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H](CC1=CNC=N1)C(=O)N[C@@H](C(C)C)C(O)=O SPCHLZUWJTYZFC-IHRRRGAJSA-N 0.000 description 1
- QBEPTBMRQALPEV-MNXVOIDGSA-N Lys-Ile-Glu Chemical compound OC(=O)CC[C@@H](C(O)=O)NC(=O)[C@H]([C@@H](C)CC)NC(=O)[C@@H](N)CCCCN QBEPTBMRQALPEV-MNXVOIDGSA-N 0.000 description 1
- JYXBNQOKPRQNQS-YTFOTSKYSA-N Lys-Ile-Ile Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O JYXBNQOKPRQNQS-YTFOTSKYSA-N 0.000 description 1
- QOJDBRUCOXQSSK-AJNGGQMLSA-N Lys-Ile-Lys Chemical compound NCCCC[C@H](N)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](CCCCN)C(O)=O QOJDBRUCOXQSSK-AJNGGQMLSA-N 0.000 description 1
- KEPWSUPUFAPBRF-DKIMLUQUSA-N Lys-Ile-Phe Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](CC1=CC=CC=C1)C(O)=O KEPWSUPUFAPBRF-DKIMLUQUSA-N 0.000 description 1
- ONPDTSFZAIWMDI-AVGNSLFASA-N Lys-Leu-Gln Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCC(N)=O)C(O)=O ONPDTSFZAIWMDI-AVGNSLFASA-N 0.000 description 1
- SKRGVGLIRUGANF-AVGNSLFASA-N Lys-Leu-Glu Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCC(O)=O)C(O)=O SKRGVGLIRUGANF-AVGNSLFASA-N 0.000 description 1
- RBEATVHTWHTHTJ-KKUMJFAQSA-N Lys-Leu-Lys Chemical compound NCCCC[C@H](N)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCCCN)C(O)=O RBEATVHTWHTHTJ-KKUMJFAQSA-N 0.000 description 1
- XIZQPFCRXLUNMK-BZSNNMDCSA-N Lys-Leu-Phe Chemical compound CC(C)C[C@@H](C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)O)NC(=O)[C@H](CCCCN)N XIZQPFCRXLUNMK-BZSNNMDCSA-N 0.000 description 1
- YPLVCBKEPJPBDQ-MELADBBJSA-N Lys-Leu-Pro Chemical compound CC(C)C[C@@H](C(=O)N1CCC[C@@H]1C(=O)O)NC(=O)[C@H](CCCCN)N YPLVCBKEPJPBDQ-MELADBBJSA-N 0.000 description 1
- VUTWYNQUSJWBHO-BZSNNMDCSA-N Lys-Leu-Tyr Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(O)=O VUTWYNQUSJWBHO-BZSNNMDCSA-N 0.000 description 1
- XOQMURBBIXRRCR-SRVKXCTJSA-N Lys-Lys-Ala Chemical compound OC(=O)[C@H](C)NC(=O)[C@H](CCCCN)NC(=O)[C@@H](N)CCCCN XOQMURBBIXRRCR-SRVKXCTJSA-N 0.000 description 1
- ZJWIXBZTAAJERF-IHRRRGAJSA-N Lys-Lys-Arg Chemical compound NCCCC[C@H](N)C(=O)N[C@@H](CCCCN)C(=O)N[C@H](C(O)=O)CCCN=C(N)N ZJWIXBZTAAJERF-IHRRRGAJSA-N 0.000 description 1
- ALGGDNMLQNFVIZ-SRVKXCTJSA-N Lys-Lys-Asp Chemical compound C(CCN)C[C@@H](C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CC(=O)O)C(=O)O)N ALGGDNMLQNFVIZ-SRVKXCTJSA-N 0.000 description 1
- GAHJXEMYXKLZRQ-AJNGGQMLSA-N Lys-Lys-Ile Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O GAHJXEMYXKLZRQ-AJNGGQMLSA-N 0.000 description 1
- WBSCNDJQPKSPII-KKUMJFAQSA-N Lys-Lys-Lys Chemical compound NCCCC[C@H](N)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CCCCN)C(O)=O WBSCNDJQPKSPII-KKUMJFAQSA-N 0.000 description 1
- KJIXWRWPOCKYLD-IHRRRGAJSA-N Lys-Lys-Met Chemical compound CSCC[C@@H](C(=O)O)NC(=O)[C@H](CCCCN)NC(=O)[C@H](CCCCN)N KJIXWRWPOCKYLD-IHRRRGAJSA-N 0.000 description 1
- YDDDRTIPNTWGIG-SRVKXCTJSA-N Lys-Lys-Ser Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CO)C(O)=O YDDDRTIPNTWGIG-SRVKXCTJSA-N 0.000 description 1
- CNGOEHJCLVCJHN-SRVKXCTJSA-N Lys-Pro-Glu Chemical compound NCCCC[C@H](N)C(=O)N1CCC[C@H]1C(=O)N[C@@H](CCC(O)=O)C(O)=O CNGOEHJCLVCJHN-SRVKXCTJSA-N 0.000 description 1
- QBHGXFQJFPWJIH-XUXIUFHCSA-N Lys-Pro-Ile Chemical compound CC[C@H](C)[C@@H](C(O)=O)NC(=O)[C@@H]1CCCN1C(=O)[C@@H](N)CCCCN QBHGXFQJFPWJIH-XUXIUFHCSA-N 0.000 description 1
- GHKXHCMRAUYLBS-CIUDSAMLSA-N Lys-Ser-Asn Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H](CO)C(=O)N[C@@H](CC(N)=O)C(O)=O GHKXHCMRAUYLBS-CIUDSAMLSA-N 0.000 description 1
- JMNRXRPBHFGXQX-GUBZILKMSA-N Lys-Ser-Glu Chemical compound NCCCC[C@H](N)C(=O)N[C@@H](CO)C(=O)N[C@H](C(O)=O)CCC(O)=O JMNRXRPBHFGXQX-GUBZILKMSA-N 0.000 description 1
- SBQDRNOLGSYHQA-YUMQZZPRSA-N Lys-Ser-Gly Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H](CO)C(=O)NCC(O)=O SBQDRNOLGSYHQA-YUMQZZPRSA-N 0.000 description 1
- ZUGVARDEGWMMLK-SRVKXCTJSA-N Lys-Ser-Lys Chemical compound NCCCC[C@H](N)C(=O)N[C@@H](CO)C(=O)N[C@H](C(O)=O)CCCCN ZUGVARDEGWMMLK-SRVKXCTJSA-N 0.000 description 1
- QVTDVTONTRSQMF-WDCWCFNPSA-N Lys-Thr-Glu Chemical compound OC(=O)CC[C@@H](C(O)=O)NC(=O)[C@H]([C@H](O)C)NC(=O)[C@@H](N)CCCCN QVTDVTONTRSQMF-WDCWCFNPSA-N 0.000 description 1
- RYOLKFYZBHMYFW-WDSOQIARSA-N Lys-Trp-Arg Chemical compound C1=CC=C2C(C[C@H](NC(=O)[C@@H](N)CCCCN)C(=O)N[C@@H](CCCN=C(N)N)C(O)=O)=CNC2=C1 RYOLKFYZBHMYFW-WDSOQIARSA-N 0.000 description 1
- RQILLQOQXLZTCK-KBPBESRZSA-N Lys-Tyr-Gly Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(=O)NCC(O)=O RQILLQOQXLZTCK-KBPBESRZSA-N 0.000 description 1
- IEIHKHYMBIYQTH-YESZJQIVSA-N Lys-Tyr-Pro Chemical compound C1C[C@@H](N(C1)C(=O)[C@H](CC2=CC=C(C=C2)O)NC(=O)[C@H](CCCCN)N)C(=O)O IEIHKHYMBIYQTH-YESZJQIVSA-N 0.000 description 1
- SQRLLZAQNOQCEG-KKUMJFAQSA-N Lys-Tyr-Ser Chemical compound NCCCC[C@H](N)C(=O)N[C@H](C(=O)N[C@@H](CO)C(O)=O)CC1=CC=C(O)C=C1 SQRLLZAQNOQCEG-KKUMJFAQSA-N 0.000 description 1
- UGCIQUYEJIEHKX-GVXVVHGQSA-N Lys-Val-Glu Chemical compound [H]N[C@@H](CCCCN)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](CCC(O)=O)C(O)=O UGCIQUYEJIEHKX-GVXVVHGQSA-N 0.000 description 1
- VKCPHIOZDWUFSW-ONGXEEELSA-N Lys-Val-Gly Chemical compound OC(=O)CNC(=O)[C@H](C(C)C)NC(=O)[C@@H](N)CCCCN VKCPHIOZDWUFSW-ONGXEEELSA-N 0.000 description 1
- RPWQJSBMXJSCPD-XUXIUFHCSA-N Lys-Val-Ile Chemical compound CC[C@H](C)[C@H](NC(=O)[C@@H](NC(=O)[C@@H](N)CCCCN)C(C)C)C(O)=O RPWQJSBMXJSCPD-XUXIUFHCSA-N 0.000 description 1
- NYTDJEZBAAFLLG-IHRRRGAJSA-N Lys-Val-Lys Chemical compound NCCCC[C@H](N)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](CCCCN)C(O)=O NYTDJEZBAAFLLG-IHRRRGAJSA-N 0.000 description 1
- OZVXDDFYCQOPFD-XQQFMLRXSA-N Lys-Val-Pro Chemical compound CC(C)[C@@H](C(=O)N1CCC[C@@H]1C(=O)O)NC(=O)[C@H](CCCCN)N OZVXDDFYCQOPFD-XQQFMLRXSA-N 0.000 description 1
- RIPJMCFGQHGHNP-RHYQMDGZSA-N Lys-Val-Thr Chemical compound C[C@H]([C@@H](C(=O)O)NC(=O)[C@H](C(C)C)NC(=O)[C@H](CCCCN)N)O RIPJMCFGQHGHNP-RHYQMDGZSA-N 0.000 description 1
- 241000282567 Macaca fascicularis Species 0.000 description 1
- QAHFGYLFLVGBNW-DCAQKATOSA-N Met-Ala-Lys Chemical compound CSCC[C@H](N)C(=O)N[C@@H](C)C(=O)N[C@H](C(O)=O)CCCCN QAHFGYLFLVGBNW-DCAQKATOSA-N 0.000 description 1
- WGBMNLCRYKSWAR-DCAQKATOSA-N Met-Asp-Lys Chemical compound CSCC[C@H](N)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@H](C(O)=O)CCCCN WGBMNLCRYKSWAR-DCAQKATOSA-N 0.000 description 1
- KQBJYJXPZBNEIK-DCAQKATOSA-N Met-Glu-Arg Chemical compound CSCC[C@H](N)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@H](C(O)=O)CCCNC(N)=N KQBJYJXPZBNEIK-DCAQKATOSA-N 0.000 description 1
- FZUNSVYYPYJYAP-NAKRPEOUSA-N Met-Ile-Ala Chemical compound [H]N[C@@H](CCSC)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](C)C(O)=O FZUNSVYYPYJYAP-NAKRPEOUSA-N 0.000 description 1
- AFFKUNVPPLQUGA-DCAQKATOSA-N Met-Leu-Ala Chemical compound [H]N[C@@H](CCSC)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](C)C(O)=O AFFKUNVPPLQUGA-DCAQKATOSA-N 0.000 description 1
- BEZJTLKUMFMITF-AVGNSLFASA-N Met-Lys-Arg Chemical compound CSCC[C@H](N)C(=O)N[C@@H](CCCCN)C(=O)N[C@H](C(O)=O)CCCNC(N)=N BEZJTLKUMFMITF-AVGNSLFASA-N 0.000 description 1
- QEDGNYFHLXXIDC-DCAQKATOSA-N Met-Pro-Gln Chemical compound CSCC[C@H](N)C(=O)N1CCC[C@H]1C(=O)N[C@@H](CCC(N)=O)C(O)=O QEDGNYFHLXXIDC-DCAQKATOSA-N 0.000 description 1
- KSIPKXNIQOWMIC-RCWTZXSCSA-N Met-Thr-Arg Chemical compound CSCC[C@H](N)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@H](C(O)=O)CCCNC(N)=N KSIPKXNIQOWMIC-RCWTZXSCSA-N 0.000 description 1
- KYXDADPHSNFWQX-VEVYYDQMSA-N Met-Thr-Asp Chemical compound CSCC[C@H](N)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@H](C(O)=O)CC(O)=O KYXDADPHSNFWQX-VEVYYDQMSA-N 0.000 description 1
- LBSWWNKMVPAXOI-GUBZILKMSA-N Met-Val-Ser Chemical compound CSCC[C@H](N)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](CO)C(O)=O LBSWWNKMVPAXOI-GUBZILKMSA-N 0.000 description 1
- 241001465754 Metazoa Species 0.000 description 1
- 241000699670 Mus sp. Species 0.000 description 1
- WUGMRIBZSVSJNP-UHFFFAOYSA-N N-L-alanyl-L-tryptophan Natural products C1=CC=C2C(CC(NC(=O)C(N)C)C(O)=O)=CNC2=C1 WUGMRIBZSVSJNP-UHFFFAOYSA-N 0.000 description 1
- XZFYRXDAULDNFX-UHFFFAOYSA-N N-L-cysteinyl-L-phenylalanine Natural products SCC(N)C(=O)NC(C(O)=O)CC1=CC=CC=C1 XZFYRXDAULDNFX-UHFFFAOYSA-N 0.000 description 1
- PESQCPHRXOFIPX-UHFFFAOYSA-N N-L-methionyl-L-tyrosine Natural products CSCCC(N)C(=O)NC(C(O)=O)CC1=CC=C(O)C=C1 PESQCPHRXOFIPX-UHFFFAOYSA-N 0.000 description 1
- 108010002311 N-glycylglutamic acid Proteins 0.000 description 1
- 101100198830 Oryza sativa subsp. japonica ROC5 gene Proteins 0.000 description 1
- 241000251745 Petromyzon marinus Species 0.000 description 1
- CGOMLCQJEMWMCE-STQMWFEESA-N Phe-Arg-Gly Chemical compound NC(N)=NCCC[C@@H](C(=O)NCC(O)=O)NC(=O)[C@@H](N)CC1=CC=CC=C1 CGOMLCQJEMWMCE-STQMWFEESA-N 0.000 description 1
- HTTYNOXBBOWZTB-SRVKXCTJSA-N Phe-Asn-Asn Chemical compound C1=CC=C(C=C1)C[C@@H](C(=O)N[C@@H](CC(=O)N)C(=O)N[C@@H](CC(=O)N)C(=O)O)N HTTYNOXBBOWZTB-SRVKXCTJSA-N 0.000 description 1
- WMGVYPPIMZPWPN-SRVKXCTJSA-N Phe-Asp-Asn Chemical compound C1=CC=C(C=C1)C[C@@H](C(=O)N[C@@H](CC(=O)O)C(=O)N[C@@H](CC(=O)N)C(=O)O)N WMGVYPPIMZPWPN-SRVKXCTJSA-N 0.000 description 1
- GDBOREPXIRKSEQ-FHWLQOOXSA-N Phe-Gln-Phe Chemical compound [H]N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H](CC1=CC=CC=C1)C(O)=O GDBOREPXIRKSEQ-FHWLQOOXSA-N 0.000 description 1
- KYYMILWEGJYPQZ-IHRRRGAJSA-N Phe-Glu-Glu Chemical compound OC(=O)CC[C@@H](C(O)=O)NC(=O)[C@H](CCC(O)=O)NC(=O)[C@@H](N)CC1=CC=CC=C1 KYYMILWEGJYPQZ-IHRRRGAJSA-N 0.000 description 1
- KJJROSNFBRWPHS-JYJNAYRXSA-N Phe-Glu-Leu Chemical compound [H]N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CC(C)C)C(O)=O KJJROSNFBRWPHS-JYJNAYRXSA-N 0.000 description 1
- WPTYDQPGBMDUBI-QWRGUYRKSA-N Phe-Gly-Asn Chemical compound N[C@@H](Cc1ccccc1)C(=O)NCC(=O)N[C@@H](CC(N)=O)C(O)=O WPTYDQPGBMDUBI-QWRGUYRKSA-N 0.000 description 1
- KRYSMKKRRRWOCZ-QEWYBTABSA-N Phe-Ile-Glu Chemical compound [H]N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](CCC(O)=O)C(O)=O KRYSMKKRRRWOCZ-QEWYBTABSA-N 0.000 description 1
- KDYPMIZMXDECSU-JYJNAYRXSA-N Phe-Leu-Glu Chemical compound OC(=O)CC[C@@H](C(O)=O)NC(=O)[C@H](CC(C)C)NC(=O)[C@@H](N)CC1=CC=CC=C1 KDYPMIZMXDECSU-JYJNAYRXSA-N 0.000 description 1
- LRBSWBVUCLLRLU-BZSNNMDCSA-N Phe-Leu-Lys Chemical compound CC(C)C[C@H](NC(=O)[C@@H](N)Cc1ccccc1)C(=O)N[C@@H](CCCCN)C(O)=O LRBSWBVUCLLRLU-BZSNNMDCSA-N 0.000 description 1
- PEFJUUYFEGBXFA-BZSNNMDCSA-N Phe-Lys-Lys Chemical compound NCCCC[C@@H](C(O)=O)NC(=O)[C@H](CCCCN)NC(=O)[C@@H](N)CC1=CC=CC=C1 PEFJUUYFEGBXFA-BZSNNMDCSA-N 0.000 description 1
- TXJJXEXCZBHDNA-ACRUOGEOSA-N Phe-Phe-His Chemical compound C1=CC=C(C=C1)C[C@@H](C(=O)N[C@@H](CC2=CC=CC=C2)C(=O)N[C@@H](CC3=CN=CN3)C(=O)O)N TXJJXEXCZBHDNA-ACRUOGEOSA-N 0.000 description 1
- DSXPMZMSJHOKKK-HJOGWXRNSA-N Phe-Phe-Tyr Chemical compound [H]N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(O)=O DSXPMZMSJHOKKK-HJOGWXRNSA-N 0.000 description 1
- IPFXYNKCXYGSSV-KKUMJFAQSA-N Phe-Ser-Lys Chemical compound C1=CC=C(C=C1)C[C@@H](C(=O)N[C@@H](CO)C(=O)N[C@@H](CCCCN)C(=O)O)N IPFXYNKCXYGSSV-KKUMJFAQSA-N 0.000 description 1
- KLYYKKGCPOGDPE-OEAJRASXSA-N Phe-Thr-Leu Chemical compound [H]N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CC(C)C)C(O)=O KLYYKKGCPOGDPE-OEAJRASXSA-N 0.000 description 1
- ZVJGAXNBBKPYOE-HKUYNNGSSA-N Phe-Trp-Gly Chemical compound C([C@H](N)C(=O)N[C@@H](CC=1C2=CC=CC=C2NC=1)C(=O)NCC(O)=O)C1=CC=CC=C1 ZVJGAXNBBKPYOE-HKUYNNGSSA-N 0.000 description 1
- DBNGDEAQXGFGRA-ACRUOGEOSA-N Phe-Tyr-Lys Chemical compound C1=CC=C(C=C1)C[C@@H](C(=O)N[C@@H](CC2=CC=C(C=C2)O)C(=O)N[C@@H](CCCCN)C(=O)O)N DBNGDEAQXGFGRA-ACRUOGEOSA-N 0.000 description 1
- YUPRIZTWANWWHK-DZKIICNBSA-N Phe-Val-Glu Chemical compound CC(C)[C@@H](C(=O)N[C@@H](CCC(=O)O)C(=O)O)NC(=O)[C@H](CC1=CC=CC=C1)N YUPRIZTWANWWHK-DZKIICNBSA-N 0.000 description 1
- 108091000080 Phosphotransferase Proteins 0.000 description 1
- OOLOTUZJUBOMAX-GUBZILKMSA-N Pro-Ala-Val Chemical compound [H]N1CCC[C@H]1C(=O)N[C@@H](C)C(=O)N[C@@H](C(C)C)C(O)=O OOLOTUZJUBOMAX-GUBZILKMSA-N 0.000 description 1
- OBVCYFIHIIYIQF-CIUDSAMLSA-N Pro-Asn-Glu Chemical compound [H]N1CCC[C@H]1C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H](CCC(O)=O)C(O)=O OBVCYFIHIIYIQF-CIUDSAMLSA-N 0.000 description 1
- VOHFZDSRPZLXLH-IHRRRGAJSA-N Pro-Asn-Phe Chemical compound [H]N1CCC[C@H]1C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H](CC1=CC=CC=C1)C(O)=O VOHFZDSRPZLXLH-IHRRRGAJSA-N 0.000 description 1
- KIPIKSXPPLABPN-CIUDSAMLSA-N Pro-Glu-Asn Chemical compound NC(=O)C[C@@H](C(O)=O)NC(=O)[C@H](CCC(O)=O)NC(=O)[C@@H]1CCCN1 KIPIKSXPPLABPN-CIUDSAMLSA-N 0.000 description 1
- MGDFPGCFVJFITQ-CIUDSAMLSA-N Pro-Glu-Asp Chemical compound [H]N1CCC[C@H]1C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CC(O)=O)C(O)=O MGDFPGCFVJFITQ-CIUDSAMLSA-N 0.000 description 1
- FRKBNXCFJBPJOL-GUBZILKMSA-N Pro-Glu-Glu Chemical compound [H]N1CCC[C@H]1C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CCC(O)=O)C(O)=O FRKBNXCFJBPJOL-GUBZILKMSA-N 0.000 description 1
- VOZIBWWZSBIXQN-SRVKXCTJSA-N Pro-Glu-Lys Chemical compound NCCCC[C@H](NC(=O)[C@H](CCC(O)=O)NC(=O)[C@@H]1CCCN1)C(O)=O VOZIBWWZSBIXQN-SRVKXCTJSA-N 0.000 description 1
- AQGUSRZKDZYGGV-GMOBBJLQSA-N Pro-Ile-Asp Chemical compound [H]N1CCC[C@H]1C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](CC(O)=O)C(O)=O AQGUSRZKDZYGGV-GMOBBJLQSA-N 0.000 description 1
- VZKBJNBZMZHKRC-XUXIUFHCSA-N Pro-Ile-Leu Chemical compound [H]N1CCC[C@H]1C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](CC(C)C)C(O)=O VZKBJNBZMZHKRC-XUXIUFHCSA-N 0.000 description 1
- CDGABSWLRMECHC-IHRRRGAJSA-N Pro-Lys-His Chemical compound C1C[C@H](NC1)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CC2=CN=CN2)C(=O)O CDGABSWLRMECHC-IHRRRGAJSA-N 0.000 description 1
- FNGOXVQBBCMFKV-CIUDSAMLSA-N Pro-Ser-Glu Chemical compound [H]N1CCC[C@H]1C(=O)N[C@@H](CO)C(=O)N[C@@H](CCC(O)=O)C(O)=O FNGOXVQBBCMFKV-CIUDSAMLSA-N 0.000 description 1
- QDDJNKWPTJHROJ-UFYCRDLUSA-N Pro-Tyr-Tyr Chemical compound C([C@@H](C(=O)O)NC(=O)[C@H](CC=1C=CC(O)=CC=1)NC(=O)[C@H]1NCCC1)C1=CC=C(O)C=C1 QDDJNKWPTJHROJ-UFYCRDLUSA-N 0.000 description 1
- FIDNSJUXESUDOV-JYJNAYRXSA-N Pro-Tyr-Val Chemical compound [H]N1CCC[C@H]1C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H](C(C)C)C(O)=O FIDNSJUXESUDOV-JYJNAYRXSA-N 0.000 description 1
- 108010003201 RGH 0205 Proteins 0.000 description 1
- QFBNNYNWKYKVJO-DCAQKATOSA-N Ser-Arg-Lys Chemical compound NCCCC[C@@H](C(O)=O)NC(=O)[C@@H](NC(=O)[C@@H](N)CO)CCCN=C(N)N QFBNNYNWKYKVJO-DCAQKATOSA-N 0.000 description 1
- ZXLUWXWISXIFIX-ACZMJKKPSA-N Ser-Asn-Glu Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H](CCC(O)=O)C(O)=O ZXLUWXWISXIFIX-ACZMJKKPSA-N 0.000 description 1
- BNFVPSRLHHPQKS-WHFBIAKZSA-N Ser-Asp-Gly Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CC(O)=O)C(=O)NCC(O)=O BNFVPSRLHHPQKS-WHFBIAKZSA-N 0.000 description 1
- QPFJSHSJFIYDJZ-GHCJXIJMSA-N Ser-Asp-Ile Chemical compound CC[C@H](C)[C@@H](C(O)=O)NC(=O)[C@H](CC(O)=O)NC(=O)[C@@H](N)CO QPFJSHSJFIYDJZ-GHCJXIJMSA-N 0.000 description 1
- OLIJLNWFEQEFDM-SRVKXCTJSA-N Ser-Asp-Phe Chemical compound OC[C@H](N)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@H](C(O)=O)CC1=CC=CC=C1 OLIJLNWFEQEFDM-SRVKXCTJSA-N 0.000 description 1
- YQQKYAZABFEYAF-FXQIFTODSA-N Ser-Glu-Gln Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CCC(N)=O)C(O)=O YQQKYAZABFEYAF-FXQIFTODSA-N 0.000 description 1
- LALNXSXEYFUUDD-GUBZILKMSA-N Ser-Glu-Leu Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CC(C)C)C(O)=O LALNXSXEYFUUDD-GUBZILKMSA-N 0.000 description 1
- MUARUIBTKQJKFY-WHFBIAKZSA-N Ser-Gly-Asp Chemical compound [H]N[C@@H](CO)C(=O)NCC(=O)N[C@@H](CC(O)=O)C(O)=O MUARUIBTKQJKFY-WHFBIAKZSA-N 0.000 description 1
- SNVIOQXAHVORQM-WDSKDSINSA-N Ser-Gly-Gln Chemical compound [H]N[C@@H](CO)C(=O)NCC(=O)N[C@@H](CCC(N)=O)C(O)=O SNVIOQXAHVORQM-WDSKDSINSA-N 0.000 description 1
- QBUWQRKEHJXTOP-DCAQKATOSA-N Ser-His-Arg Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CC1=CNC=N1)C(=O)N[C@@H](CCCNC(N)=N)C(O)=O QBUWQRKEHJXTOP-DCAQKATOSA-N 0.000 description 1
- RJHJPZQOMKCSTP-CIUDSAMLSA-N Ser-His-Asn Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CC1=CNC=N1)C(=O)N[C@@H](CC(N)=O)C(O)=O RJHJPZQOMKCSTP-CIUDSAMLSA-N 0.000 description 1
- YIUWWXVTYLANCJ-NAKRPEOUSA-N Ser-Ile-Arg Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](CCCNC(N)=N)C(O)=O YIUWWXVTYLANCJ-NAKRPEOUSA-N 0.000 description 1
- JIPVNVNKXJLFJF-BJDJZHNGSA-N Ser-Ile-Lys Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CCCCN)C(=O)O)NC(=O)[C@H](CO)N JIPVNVNKXJLFJF-BJDJZHNGSA-N 0.000 description 1
- ZOPISOXXPQNOCO-SVSWQMSJSA-N Ser-Ile-Thr Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H]([C@@H](C)O)C(=O)O)NC(=O)[C@H](CO)N ZOPISOXXPQNOCO-SVSWQMSJSA-N 0.000 description 1
- MUJQWSAWLLRJCE-KATARQTJSA-N Ser-Leu-Thr Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H]([C@@H](C)O)C(O)=O MUJQWSAWLLRJCE-KATARQTJSA-N 0.000 description 1
- NNFMANHDYSVNIO-DCAQKATOSA-N Ser-Lys-Arg Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CCCNC(N)=N)C(O)=O NNFMANHDYSVNIO-DCAQKATOSA-N 0.000 description 1
- CRJZZXMAADSBBQ-SRVKXCTJSA-N Ser-Lys-Lys Chemical compound NCCCC[C@@H](C(O)=O)NC(=O)[C@H](CCCCN)NC(=O)[C@@H](N)CO CRJZZXMAADSBBQ-SRVKXCTJSA-N 0.000 description 1
- QJKPECIAWNNKIT-KKUMJFAQSA-N Ser-Lys-Tyr Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(O)=O QJKPECIAWNNKIT-KKUMJFAQSA-N 0.000 description 1
- UPLYXVPQLJVWMM-KKUMJFAQSA-N Ser-Phe-Leu Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CC(C)C)C(O)=O UPLYXVPQLJVWMM-KKUMJFAQSA-N 0.000 description 1
- ADJDNJCSPNFFPI-FXQIFTODSA-N Ser-Pro-Ala Chemical compound OC(=O)[C@H](C)NC(=O)[C@@H]1CCCN1C(=O)[C@@H](N)CO ADJDNJCSPNFFPI-FXQIFTODSA-N 0.000 description 1
- ILZAUMFXKSIUEF-SRVKXCTJSA-N Ser-Ser-Phe Chemical compound OC[C@H](N)C(=O)N[C@@H](CO)C(=O)N[C@H](C(O)=O)CC1=CC=CC=C1 ILZAUMFXKSIUEF-SRVKXCTJSA-N 0.000 description 1
- PCJLFYBAQZQOFE-KATARQTJSA-N Ser-Thr-Lys Chemical compound C[C@H]([C@@H](C(=O)N[C@@H](CCCCN)C(=O)O)NC(=O)[C@H](CO)N)O PCJLFYBAQZQOFE-KATARQTJSA-N 0.000 description 1
- BEBVVQPDSHHWQL-NRPADANISA-N Ser-Val-Glu Chemical compound [H]N[C@@H](CO)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](CCC(O)=O)C(O)=O BEBVVQPDSHHWQL-NRPADANISA-N 0.000 description 1
- LGIMRDKGABDMBN-DCAQKATOSA-N Ser-Val-Lys Chemical compound CC(C)[C@@H](C(=O)N[C@@H](CCCCN)C(=O)O)NC(=O)[C@H](CO)N LGIMRDKGABDMBN-DCAQKATOSA-N 0.000 description 1
- 101000910035 Streptococcus pyogenes serotype M1 CRISPR-associated endonuclease Cas9/Csn1 Proteins 0.000 description 1
- FQPQPTHMHZKGFM-XQXXSGGOSA-N Thr-Ala-Glu Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)N[C@@H](C)C(=O)N[C@@H](CCC(O)=O)C(O)=O FQPQPTHMHZKGFM-XQXXSGGOSA-N 0.000 description 1
- DGDCHPCRMWEOJR-FQPOAREZSA-N Thr-Ala-Tyr Chemical compound C[C@@H](O)[C@H](N)C(=O)N[C@@H](C)C(=O)N[C@H](C(O)=O)CC1=CC=C(O)C=C1 DGDCHPCRMWEOJR-FQPOAREZSA-N 0.000 description 1
- CEXFELBFVHLYDZ-XGEHTFHBSA-N Thr-Arg-Ser Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CO)C(O)=O CEXFELBFVHLYDZ-XGEHTFHBSA-N 0.000 description 1
- IRKWVRSEQFTGGV-VEVYYDQMSA-N Thr-Asn-Arg Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H](CCCNC(N)=N)C(O)=O IRKWVRSEQFTGGV-VEVYYDQMSA-N 0.000 description 1
- SKHPKKYKDYULDH-HJGDQZAQSA-N Thr-Asn-Leu Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H](CC(C)C)C(O)=O SKHPKKYKDYULDH-HJGDQZAQSA-N 0.000 description 1
- OJRNZRROAIAHDL-LKXGYXEUSA-N Thr-Asn-Ser Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H](CO)C(O)=O OJRNZRROAIAHDL-LKXGYXEUSA-N 0.000 description 1
- QILPDQCTQZDHFM-HJGDQZAQSA-N Thr-Gln-Arg Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H](CCCNC(N)=N)C(O)=O QILPDQCTQZDHFM-HJGDQZAQSA-N 0.000 description 1
- RKDFEMGVMMYYNG-WDCWCFNPSA-N Thr-Gln-Leu Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H](CC(C)C)C(O)=O RKDFEMGVMMYYNG-WDCWCFNPSA-N 0.000 description 1
- BNGDYRRHRGOPHX-IFFSRLJSSA-N Thr-Glu-Val Chemical compound CC(C)[C@H](NC(=O)[C@H](CCC(O)=O)NC(=O)[C@@H](N)[C@@H](C)O)C(O)=O BNGDYRRHRGOPHX-IFFSRLJSSA-N 0.000 description 1
- AQAMPXBRJJWPNI-JHEQGTHGSA-N Thr-Gly-Glu Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)NCC(=O)N[C@@H](CCC(O)=O)C(O)=O AQAMPXBRJJWPNI-JHEQGTHGSA-N 0.000 description 1
- XPNSAQMEAVSQRD-FBCQKBJTSA-N Thr-Gly-Gly Chemical compound C[C@@H](O)[C@H](N)C(=O)NCC(=O)NCC(O)=O XPNSAQMEAVSQRD-FBCQKBJTSA-N 0.000 description 1
- YSXYEJWDHBCTDJ-DVJZZOLTSA-N Thr-Gly-Trp Chemical compound C[C@H]([C@@H](C(=O)NCC(=O)N[C@@H](CC1=CNC2=CC=CC=C21)C(=O)O)N)O YSXYEJWDHBCTDJ-DVJZZOLTSA-N 0.000 description 1
- PAXANSWUSVPFNK-IUKAMOBKSA-N Thr-Ile-Asn Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H](CC(=O)N)C(=O)O)NC(=O)[C@H]([C@@H](C)O)N PAXANSWUSVPFNK-IUKAMOBKSA-N 0.000 description 1
- MEJHFIOYJHTWMK-VOAKCMCISA-N Thr-Leu-Leu Chemical compound CC(C)C[C@@H](C(O)=O)NC(=O)[C@H](CC(C)C)NC(=O)[C@@H](N)[C@@H](C)O MEJHFIOYJHTWMK-VOAKCMCISA-N 0.000 description 1
- KKPOGALELPLJTL-MEYUZBJRSA-N Thr-Lys-Tyr Chemical compound C[C@@H](O)[C@H](N)C(=O)N[C@@H](CCCCN)C(=O)N[C@H](C(O)=O)CC1=CC=C(O)C=C1 KKPOGALELPLJTL-MEYUZBJRSA-N 0.000 description 1
- KPNSNVTUVKSBFL-ZJDVBMNYSA-N Thr-Met-Thr Chemical compound C[C@H]([C@@H](C(=O)N[C@@H](CCSC)C(=O)N[C@@H]([C@@H](C)O)C(=O)O)N)O KPNSNVTUVKSBFL-ZJDVBMNYSA-N 0.000 description 1
- WRQLCVIALDUQEQ-UNQGMJICSA-N Thr-Phe-Arg Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CCCNC(N)=N)C(O)=O WRQLCVIALDUQEQ-UNQGMJICSA-N 0.000 description 1
- WYLAVUAWOUVUCA-XVSYOHENSA-N Thr-Phe-Asp Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H](CC(O)=O)C(O)=O WYLAVUAWOUVUCA-XVSYOHENSA-N 0.000 description 1
- VGYVVSQFSSKZRJ-OEAJRASXSA-N Thr-Phe-Lys Chemical compound NCCCC[C@@H](C(O)=O)NC(=O)[C@@H](NC(=O)[C@@H](N)[C@H](O)C)CC1=CC=CC=C1 VGYVVSQFSSKZRJ-OEAJRASXSA-N 0.000 description 1
- IWAVRIPRTCJAQO-HSHDSVGOSA-N Thr-Pro-Trp Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)N1CCC[C@H]1C(=O)N[C@@H](CC1=CNC2=C1C=CC=C2)C(O)=O IWAVRIPRTCJAQO-HSHDSVGOSA-N 0.000 description 1
- MFMGPEKYBXFIRF-SUSMZKCASA-N Thr-Thr-Gln Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CCC(N)=O)C(O)=O MFMGPEKYBXFIRF-SUSMZKCASA-N 0.000 description 1
- NHQVWACSJZJCGJ-FLBSBUHZSA-N Thr-Thr-Ile Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H]([C@@H](C)CC)C(O)=O NHQVWACSJZJCGJ-FLBSBUHZSA-N 0.000 description 1
- CSNBWOJOEOPYIJ-UVOCVTCTSA-N Thr-Thr-Lys Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CCCCN)C(O)=O CSNBWOJOEOPYIJ-UVOCVTCTSA-N 0.000 description 1
- ABCLYRRGTZNIFU-BWAGICSOSA-N Thr-Tyr-His Chemical compound C[C@H]([C@@H](C(=O)N[C@@H](CC1=CC=C(C=C1)O)C(=O)N[C@@H](CC2=CN=CN2)C(=O)O)N)O ABCLYRRGTZNIFU-BWAGICSOSA-N 0.000 description 1
- XGFYGMKZKFRGAI-RCWTZXSCSA-N Thr-Val-Arg Chemical compound C[C@@H](O)[C@H](N)C(=O)N[C@@H](C(C)C)C(=O)N[C@H](C(O)=O)CCCN=C(N)N XGFYGMKZKFRGAI-RCWTZXSCSA-N 0.000 description 1
- 108091028113 Trans-activating crRNA Proteins 0.000 description 1
- GHXXDFDIDHIEIL-WFBYXXMGSA-N Trp-Ala-Cys Chemical compound C[C@@H](C(=O)N[C@@H](CS)C(=O)O)NC(=O)[C@H](CC1=CNC2=CC=CC=C21)N GHXXDFDIDHIEIL-WFBYXXMGSA-N 0.000 description 1
- NMCBVGFGWSIGSB-NUTKFTJISA-N Trp-Ala-Leu Chemical compound C[C@@H](C(=O)N[C@@H](CC(C)C)C(=O)O)NC(=O)[C@H](CC1=CNC2=CC=CC=C21)N NMCBVGFGWSIGSB-NUTKFTJISA-N 0.000 description 1
- RNFZZCMCRDFNAE-WFBYXXMGSA-N Trp-Asn-Ala Chemical compound [H]N[C@@H](CC1=CNC2=C1C=CC=C2)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H](C)C(O)=O RNFZZCMCRDFNAE-WFBYXXMGSA-N 0.000 description 1
- GEGYPBOPIGNZIF-CWRNSKLLSA-N Trp-Ser-Pro Chemical compound C1C[C@@H](N(C1)C(=O)[C@H](CO)NC(=O)[C@H](CC2=CNC3=CC=CC=C32)N)C(=O)O GEGYPBOPIGNZIF-CWRNSKLLSA-N 0.000 description 1
- ZJPSMXCFEKMZFE-IHPCNDPISA-N Trp-Tyr-Ser Chemical compound [H]N[C@@H](CC1=CNC2=C1C=CC=C2)C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H](CO)C(O)=O ZJPSMXCFEKMZFE-IHPCNDPISA-N 0.000 description 1
- DLZKEQQWXODGGZ-KWQFWETISA-N Tyr-Ala-Gly Chemical compound OC(=O)CNC(=O)[C@H](C)NC(=O)[C@@H](N)CC1=CC=C(O)C=C1 DLZKEQQWXODGGZ-KWQFWETISA-N 0.000 description 1
- DXYWRYQRKPIGGU-BPNCWPANSA-N Tyr-Ala-Val Chemical compound CC(C)[C@@H](C(O)=O)NC(=O)[C@H](C)NC(=O)[C@@H](N)CC1=CC=C(O)C=C1 DXYWRYQRKPIGGU-BPNCWPANSA-N 0.000 description 1
- UABYBEBXFFNCIR-YDHLFZDLSA-N Tyr-Asp-Val Chemical compound [H]N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H](C(C)C)C(O)=O UABYBEBXFFNCIR-YDHLFZDLSA-N 0.000 description 1
- SLCSPPCQWUHPPO-JYJNAYRXSA-N Tyr-Glu-Lys Chemical compound NCCCC[C@@H](C(O)=O)NC(=O)[C@H](CCC(O)=O)NC(=O)[C@@H](N)CC1=CC=C(O)C=C1 SLCSPPCQWUHPPO-JYJNAYRXSA-N 0.000 description 1
- KIJLSRYAUGGZIN-CFMVVWHZSA-N Tyr-Ile-Asp Chemical compound [H]N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](CC(O)=O)C(O)=O KIJLSRYAUGGZIN-CFMVVWHZSA-N 0.000 description 1
- QSFJHIRIHOJRKS-ULQDDVLXSA-N Tyr-Leu-Arg Chemical compound [H]N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCCNC(N)=N)C(O)=O QSFJHIRIHOJRKS-ULQDDVLXSA-N 0.000 description 1
- BSCBBPKDVOZICB-KKUMJFAQSA-N Tyr-Leu-Asp Chemical compound [H]N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CC(O)=O)C(O)=O BSCBBPKDVOZICB-KKUMJFAQSA-N 0.000 description 1
- NKUGCYDFQKFVOJ-JYJNAYRXSA-N Tyr-Leu-Gln Chemical compound NC(=O)CC[C@@H](C(O)=O)NC(=O)[C@H](CC(C)C)NC(=O)[C@@H](N)CC1=CC=C(O)C=C1 NKUGCYDFQKFVOJ-JYJNAYRXSA-N 0.000 description 1
- JLKVWTICWVWGSK-JYJNAYRXSA-N Tyr-Lys-Glu Chemical compound OC(=O)CC[C@@H](C(O)=O)NC(=O)[C@H](CCCCN)NC(=O)[C@@H](N)CC1=CC=C(O)C=C1 JLKVWTICWVWGSK-JYJNAYRXSA-N 0.000 description 1
- PMHLLBKTDHQMCY-ULQDDVLXSA-N Tyr-Lys-Val Chemical compound [H]N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](C(C)C)C(O)=O PMHLLBKTDHQMCY-ULQDDVLXSA-N 0.000 description 1
- LMKKMCGTDANZTR-BZSNNMDCSA-N Tyr-Phe-Asp Chemical compound C([C@H](N)C(=O)N[C@@H](CC=1C=CC=CC=1)C(=O)N[C@@H](CC(O)=O)C(O)=O)C1=CC=C(O)C=C1 LMKKMCGTDANZTR-BZSNNMDCSA-N 0.000 description 1
- VBFVQTPETKJCQW-RPTUDFQQSA-N Tyr-Phe-Thr Chemical compound [H]N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)N[C@@H]([C@@H](C)O)C(O)=O VBFVQTPETKJCQW-RPTUDFQQSA-N 0.000 description 1
- RCMWNNJFKNDKQR-UFYCRDLUSA-N Tyr-Pro-Phe Chemical compound C([C@H](N)C(=O)N1[C@@H](CCC1)C(=O)N[C@@H](CC=1C=CC=CC=1)C(O)=O)C1=CC=C(O)C=C1 RCMWNNJFKNDKQR-UFYCRDLUSA-N 0.000 description 1
- XGZBEGGGAUQBMB-KJEVXHAQSA-N Tyr-Pro-Thr Chemical compound C[C@H]([C@@H](C(=O)O)NC(=O)[C@@H]1CCCN1C(=O)[C@H](CC2=CC=C(C=C2)O)N)O XGZBEGGGAUQBMB-KJEVXHAQSA-N 0.000 description 1
- MQGGXGKQSVEQHR-KKUMJFAQSA-N Tyr-Ser-Leu Chemical compound CC(C)C[C@@H](C(O)=O)NC(=O)[C@H](CO)NC(=O)[C@@H](N)CC1=CC=C(O)C=C1 MQGGXGKQSVEQHR-KKUMJFAQSA-N 0.000 description 1
- SQUMHUZLJDUROQ-YDHLFZDLSA-N Tyr-Val-Asp Chemical compound [H]N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](CC(O)=O)C(O)=O SQUMHUZLJDUROQ-YDHLFZDLSA-N 0.000 description 1
- ABSXSJZNRAQDDI-KJEVXHAQSA-N Tyr-Val-Thr Chemical compound [H]N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H]([C@@H](C)O)C(O)=O ABSXSJZNRAQDDI-KJEVXHAQSA-N 0.000 description 1
- 108010064997 VPY tripeptide Proteins 0.000 description 1
- PAPWZOJOLKZEFR-AVGNSLFASA-N Val-Arg-Lys Chemical compound CC(C)[C@@H](C(=O)N[C@@H](CCCN=C(N)N)C(=O)N[C@@H](CCCCN)C(=O)O)N PAPWZOJOLKZEFR-AVGNSLFASA-N 0.000 description 1
- LIQJSDDOULTANC-QSFUFRPTSA-N Val-Asn-Ile Chemical compound CC[C@H](C)[C@@H](C(=O)O)NC(=O)[C@H](CC(=O)N)NC(=O)[C@H](C(C)C)N LIQJSDDOULTANC-QSFUFRPTSA-N 0.000 description 1
- QGFPYRPIUXBYGR-YDHLFZDLSA-N Val-Asn-Phe Chemical compound CC(C)[C@@H](C(=O)N[C@@H](CC(=O)N)C(=O)N[C@@H](CC1=CC=CC=C1)C(=O)O)N QGFPYRPIUXBYGR-YDHLFZDLSA-N 0.000 description 1
- PVPAOIGJYHVWBT-KKHAAJSZSA-N Val-Asn-Thr Chemical compound C[C@H]([C@@H](C(=O)O)NC(=O)[C@H](CC(=O)N)NC(=O)[C@H](C(C)C)N)O PVPAOIGJYHVWBT-KKHAAJSZSA-N 0.000 description 1
- HZYOWMGWKKRMBZ-BYULHYEWSA-N Val-Asp-Asp Chemical compound CC(C)[C@@H](C(=O)N[C@@H](CC(=O)O)C(=O)N[C@@H](CC(=O)O)C(=O)O)N HZYOWMGWKKRMBZ-BYULHYEWSA-N 0.000 description 1
- BMGOFDMKDVVGJG-NHCYSSNCSA-N Val-Asp-Lys Chemical compound CC(C)[C@@H](C(=O)N[C@@H](CC(=O)O)C(=O)N[C@@H](CCCCN)C(=O)O)N BMGOFDMKDVVGJG-NHCYSSNCSA-N 0.000 description 1
- NYTKXWLZSNRILS-IFFSRLJSSA-N Val-Gln-Thr Chemical compound C[C@H]([C@@H](C(=O)O)NC(=O)[C@H](CCC(=O)N)NC(=O)[C@H](C(C)C)N)O NYTKXWLZSNRILS-IFFSRLJSSA-N 0.000 description 1
- CVIXTAITYJQMPE-LAEOZQHASA-N Val-Glu-Asn Chemical compound CC(C)[C@H](N)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CC(N)=O)C(O)=O CVIXTAITYJQMPE-LAEOZQHASA-N 0.000 description 1
- GBESYURLQOYWLU-LAEOZQHASA-N Val-Glu-Asp Chemical compound CC(C)[C@@H](C(=O)N[C@@H](CCC(=O)O)C(=O)N[C@@H](CC(=O)O)C(=O)O)N GBESYURLQOYWLU-LAEOZQHASA-N 0.000 description 1
- SZTTYWIUCGSURQ-AUTRQRHGSA-N Val-Glu-Glu Chemical compound CC(C)[C@H](N)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CCC(O)=O)C(O)=O SZTTYWIUCGSURQ-AUTRQRHGSA-N 0.000 description 1
- XWYUBUYQMOUFRQ-IFFSRLJSSA-N Val-Glu-Thr Chemical compound C[C@H]([C@@H](C(=O)O)NC(=O)[C@H](CCC(=O)O)NC(=O)[C@H](C(C)C)N)O XWYUBUYQMOUFRQ-IFFSRLJSSA-N 0.000 description 1
- YTPLVNUZZOBFFC-SCZZXKLOSA-N Val-Gly-Pro Chemical compound CC(C)[C@H](N)C(=O)NCC(=O)N1CCC[C@@H]1C(O)=O YTPLVNUZZOBFFC-SCZZXKLOSA-N 0.000 description 1
- KZKMBGXCNLPYKD-YEPSODPASA-N Val-Gly-Thr Chemical compound CC(C)[C@H](N)C(=O)NCC(=O)N[C@@H]([C@@H](C)O)C(O)=O KZKMBGXCNLPYKD-YEPSODPASA-N 0.000 description 1
- UKEVLVBHRKWECS-LSJOCFKGSA-N Val-Ile-Gly Chemical compound CC[C@H](C)[C@@H](C(=O)NCC(=O)O)NC(=O)[C@H](C(C)C)N UKEVLVBHRKWECS-LSJOCFKGSA-N 0.000 description 1
- APQIVBCUIUDSMB-OSUNSFLBSA-N Val-Ile-Thr Chemical compound CC[C@H](C)[C@@H](C(=O)N[C@@H]([C@@H](C)O)C(=O)O)NC(=O)[C@H](C(C)C)N APQIVBCUIUDSMB-OSUNSFLBSA-N 0.000 description 1
- WBAJDGWKRIHOAC-GVXVVHGQSA-N Val-Lys-Gln Chemical compound [H]N[C@@H](C(C)C)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CCC(N)=O)C(O)=O WBAJDGWKRIHOAC-GVXVVHGQSA-N 0.000 description 1
- ZRSZTKTVPNSUNA-IHRRRGAJSA-N Val-Lys-Leu Chemical compound CC(C)C[C@H](NC(=O)[C@H](CCCCN)NC(=O)[C@@H](N)C(C)C)C(O)=O ZRSZTKTVPNSUNA-IHRRRGAJSA-N 0.000 description 1
- YMTOEGGOCHVGEH-IHRRRGAJSA-N Val-Lys-Lys Chemical compound CC(C)[C@H](N)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CCCCN)C(O)=O YMTOEGGOCHVGEH-IHRRRGAJSA-N 0.000 description 1
- RQOMPQGUGBILAG-AVGNSLFASA-N Val-Met-Leu Chemical compound CC(C)[C@H](N)C(=O)N[C@@H](CCSC)C(=O)N[C@@H](CC(C)C)C(O)=O RQOMPQGUGBILAG-AVGNSLFASA-N 0.000 description 1
- UEPLNXPLHJUYPT-AVGNSLFASA-N Val-Met-Lys Chemical compound CC(C)[C@H](N)C(=O)N[C@@H](CCSC)C(=O)N[C@@H](CCCCN)C(O)=O UEPLNXPLHJUYPT-AVGNSLFASA-N 0.000 description 1
- GQMNEJMFMCJJTD-NHCYSSNCSA-N Val-Pro-Gln Chemical compound CC(C)[C@H](N)C(=O)N1CCC[C@H]1C(=O)N[C@@H](CCC(N)=O)C(O)=O GQMNEJMFMCJJTD-NHCYSSNCSA-N 0.000 description 1
- SUGRIIAOLCDLBD-ZOBUZTSGSA-N Val-Trp-Asp Chemical compound CC(C)[C@@H](C(=O)N[C@@H](CC1=CNC2=CC=CC=C21)C(=O)N[C@@H](CC(=O)O)C(=O)O)N SUGRIIAOLCDLBD-ZOBUZTSGSA-N 0.000 description 1
- MIAZWUMFUURQNP-YDHLFZDLSA-N Val-Tyr-Asn Chemical compound CC(C)[C@@H](C(=O)N[C@@H](CC1=CC=C(C=C1)O)C(=O)N[C@@H](CC(=O)N)C(=O)O)N MIAZWUMFUURQNP-YDHLFZDLSA-N 0.000 description 1
- VTIAEOKFUJJBTC-YDHLFZDLSA-N Val-Tyr-Asp Chemical compound CC(C)[C@@H](C(=O)N[C@@H](CC1=CC=C(C=C1)O)C(=O)N[C@@H](CC(=O)O)C(=O)O)N VTIAEOKFUJJBTC-YDHLFZDLSA-N 0.000 description 1
- GUIYPEKUEMQBIK-JSGCOSHPSA-N Val-Tyr-Gly Chemical compound CC(C)[C@H](N)C(=O)N[C@@H](Cc1ccc(O)cc1)C(=O)NCC(O)=O GUIYPEKUEMQBIK-JSGCOSHPSA-N 0.000 description 1
- 241000251539 Vertebrata <Metazoa> Species 0.000 description 1
- 239000002253 acid Substances 0.000 description 1
- 230000009471 action Effects 0.000 description 1
- 210000005006 adaptive immune system Anatomy 0.000 description 1
- 108010024078 alanyl-glycyl-serine Proteins 0.000 description 1
- 108010086434 alanyl-seryl-glycine Proteins 0.000 description 1
- 108010044940 alanylglutamine Proteins 0.000 description 1
- 230000003321 amplification Effects 0.000 description 1
- 108010008355 arginyl-glutamine Proteins 0.000 description 1
- 108010001271 arginyl-glutamyl-arginine Proteins 0.000 description 1
- 108010059459 arginyl-threonyl-phenylalanine Proteins 0.000 description 1
- 108010084758 arginyl-tyrosyl-aspartic acid Proteins 0.000 description 1
- 108010068380 arginylarginine Proteins 0.000 description 1
- 108010077245 asparaginyl-proline Proteins 0.000 description 1
- 108010047857 aspartylglycine Proteins 0.000 description 1
- 230000001580 bacterial effect Effects 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 238000009835 boiling Methods 0.000 description 1
- 238000009395 breeding Methods 0.000 description 1
- 230000001488 breeding effect Effects 0.000 description 1
- 239000007795 chemical reaction product Substances 0.000 description 1
- 238000012217 deletion Methods 0.000 description 1
- 230000037430 deletion Effects 0.000 description 1
- 238000010586 diagram Methods 0.000 description 1
- 230000029087 digestion Effects 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 230000002068 genetic effect Effects 0.000 description 1
- 108010049041 glutamylalanine Proteins 0.000 description 1
- 108010079547 glutamylmethionine Proteins 0.000 description 1
- HPAIKDPJURGQLN-UHFFFAOYSA-N glycyl-L-histidyl-L-phenylalanine Natural products C=1C=CC=CC=1CC(C(O)=O)NC(=O)C(NC(=O)CN)CC1=CN=CN1 HPAIKDPJURGQLN-UHFFFAOYSA-N 0.000 description 1
- XBGGUPMXALFZOT-UHFFFAOYSA-N glycyl-L-tyrosine hemihydrate Natural products NCC(=O)NC(C(O)=O)CC1=CC=C(O)C=C1 XBGGUPMXALFZOT-UHFFFAOYSA-N 0.000 description 1
- 108010000434 glycyl-alanyl-leucine Proteins 0.000 description 1
- 108010019832 glycyl-asparaginyl-glycine Proteins 0.000 description 1
- 108010067216 glycyl-glycyl-glycine Proteins 0.000 description 1
- XKUKSGPZAADMRA-UHFFFAOYSA-N glycyl-glycyl-glycine Natural products NCC(=O)NCC(=O)NCC(O)=O XKUKSGPZAADMRA-UHFFFAOYSA-N 0.000 description 1
- 108010001064 glycyl-glycyl-glycyl-glycine Proteins 0.000 description 1
- 108010026364 glycyl-glycyl-leucine Proteins 0.000 description 1
- 108010066198 glycyl-leucyl-phenylalanine Proteins 0.000 description 1
- 108010015792 glycyllysine Proteins 0.000 description 1
- 108010087823 glycyltyrosine Proteins 0.000 description 1
- 108010037850 glycylvaline Proteins 0.000 description 1
- 108010028295 histidylhistidine Proteins 0.000 description 1
- 108010085325 histidylproline Proteins 0.000 description 1
- 210000005260 human cell Anatomy 0.000 description 1
- 230000006872 improvement Effects 0.000 description 1
- 230000003834 intracellular effect Effects 0.000 description 1
- 108010027338 isoleucylcysteine Proteins 0.000 description 1
- 230000009916 joint effect Effects 0.000 description 1
- 108010044311 leucyl-glycyl-glycine Proteins 0.000 description 1
- 108010030617 leucyl-phenylalanyl-valine Proteins 0.000 description 1
- 108010000761 leucylarginine Proteins 0.000 description 1
- 239000007788 liquid Substances 0.000 description 1
- 108010044348 lysyl-glutamyl-aspartic acid Proteins 0.000 description 1
- 108010045397 lysyl-tyrosyl-lysine Proteins 0.000 description 1
- 108010009298 lysylglutamic acid Proteins 0.000 description 1
- 108010064235 lysylglycine Proteins 0.000 description 1
- 108010038320 lysylphenylalanine Proteins 0.000 description 1
- 239000003550 marker Substances 0.000 description 1
- 239000000463 material Substances 0.000 description 1
- 108020004999 messenger RNA Proteins 0.000 description 1
- 238000010369 molecular cloning Methods 0.000 description 1
- 238000003199 nucleic acid amplification method Methods 0.000 description 1
- 150000007523 nucleic acids Chemical group 0.000 description 1
- 239000002777 nucleoside Substances 0.000 description 1
- 150000003833 nucleoside derivatives Chemical class 0.000 description 1
- 108010064486 phenylalanyl-leucyl-valine Proteins 0.000 description 1
- 108010012581 phenylalanylglutamate Proteins 0.000 description 1
- 108010073025 phenylalanylphenylalanine Proteins 0.000 description 1
- 102000020233 phosphotransferase Human genes 0.000 description 1
- 108010025488 pinealon Proteins 0.000 description 1
- 239000013600 plasmid vector Substances 0.000 description 1
- 238000012545 processing Methods 0.000 description 1
- 108010031719 prolyl-serine Proteins 0.000 description 1
- 238000012827 research and development Methods 0.000 description 1
- 238000012163 sequencing technique Methods 0.000 description 1
- 108010069117 seryl-lysyl-aspartic acid Proteins 0.000 description 1
- 108010015840 seryl-prolyl-lysyl-lysine Proteins 0.000 description 1
- 238000010008 shearing Methods 0.000 description 1
- 230000035939 shock Effects 0.000 description 1
- 239000007787 solid Substances 0.000 description 1
- 239000013589 supplement Substances 0.000 description 1
- 108010031491 threonyl-lysyl-glutamic acid Proteins 0.000 description 1
- 230000002103 transcriptional effect Effects 0.000 description 1
- 230000037426 transcriptional repression Effects 0.000 description 1
- 230000001131 transforming effect Effects 0.000 description 1
- 108010015666 tryptophyl-leucyl-glutamic acid Proteins 0.000 description 1
- 108010038745 tryptophylglycine Proteins 0.000 description 1
- 108010045269 tryptophyltryptophan Proteins 0.000 description 1
- 108010012567 tyrosyl-glycyl-glycyl-phenylalanyl Proteins 0.000 description 1
- 238000011144 upstream manufacturing Methods 0.000 description 1
- 238000012795 verification Methods 0.000 description 1
- XLYOFNOQVPJJNP-UHFFFAOYSA-N water Substances O XLYOFNOQVPJJNP-UHFFFAOYSA-N 0.000 description 1
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/63—Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
- C12N15/79—Vectors or expression systems specially adapted for eukaryotic hosts
- C12N15/82—Vectors or expression systems specially adapted for eukaryotic hosts for plant cells, e.g. plant artificial chromosomes (PACs)
- C12N15/8201—Methods for introducing genetic material into plant cells, e.g. DNA, RNA, stable or transient incorporation, tissue culture methods adapted for transformation
- C12N15/8213—Targeted insertion of genes into the plant genome by homologous recombination
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N2810/00—Vectors comprising a targeting moiety
- C12N2810/10—Vectors comprising a non-peptidic targeting moiety
Landscapes
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Genetics & Genomics (AREA)
- Engineering & Computer Science (AREA)
- Biomedical Technology (AREA)
- Biotechnology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Organic Chemistry (AREA)
- Chemical & Material Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Zoology (AREA)
- Wood Science & Technology (AREA)
- Cell Biology (AREA)
- Microbiology (AREA)
- Plant Pathology (AREA)
- Molecular Biology (AREA)
- Biophysics (AREA)
- Biochemistry (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- Micro-Organisms Or Cultivation Processes Thereof (AREA)
- Breeding Of Plants And Reproduction By Means Of Culturing (AREA)
Abstract
The invention belongs to the technical field of genetic engineering and specifically relates to a plant genome directed base editing backbone vector and its application. The technical problem addressed is to improve the efficiency of directed base editing in plant cell genomes and to widen the base editing window. The solution is a plant genome directed base editing backbone vector in which a single Pol II promoter drives transcription of a core unit made up of two core regions: an expression unit for the nCas9-PmCDA1 nuclease-cytosine deaminase fusion protein and a transcription unit for a synthetic guide RNA (sgRNA). This single-transcription-unit backbone vector enables simple, rapid, and efficient directed conversion of cytosine (C) to thymine (T), and is an effective molecular tool for directed base editing of plant genomes.
Description
Technical Field
The invention belongs to the field of plant genetic engineering and relates to a plant genome directed base editing backbone vector and its application.
Background
Targeted genome modification has long been a frontier and a focus of biological research. Precise, targeted modification of specific genomic regions serves two purposes. First, a target sequence can be mutated precisely to obtain mutant material, so that the function of the target gene can be identified unambiguously. Second, a target sequence can be precisely replaced or inserted, which minimizes the uncertainty in expression and inheritance caused by random integration of foreign genes.
In 2012, researchers first demonstrated that CRISPR-Cas (clustered regularly interspaced short palindromic repeats-CRISPR associated protein) can perform sequence-specific cleavage of double-stranded DNA. The CRISPR-Cas9 system was subsequently used for RNA-guided targeted editing of intracellular genomes in animal and plant systems including cynomolgus monkey, zebrafish, mouse, human cell lines, Arabidopsis, and rice. In this system, the Cas protein, guided by a guide RNA, recognizes and cleaves a specific DNA sequence to produce DNA double-strand breaks (DSBs), and the cell's endogenous DNA repair machinery then edits the DNA sequence at the target site. The known eukaryotic DNA repair systems fall into two broad classes: homologous recombination (HR) repair and nonhomologous end joining (NHEJ) repair. HR uses a homologous sequence as a template to repair the damaged DNA region precisely. NHEJ requires no homologous sequence; it directly joins the broken ends produced by DNA damage and often introduces sequence variation of varying extent while completing the repair.
Although CRISPR-Cas genome editing tools can effectively edit target genomic sequences, editing events that rely on the NHEJ repair pathway mainly introduce random base insertions or deletions at the target site. Editing events that rely on the HR repair pathway can replace the sequence at the target site precisely according to a donor template DNA, but they occur at a far lower frequency than NHEJ-mediated events. This greatly limits the use of CRISPR-Cas genome editing tools in basic research and practical applications that require precise base editing.
To improve the efficiency of precise editing of specific bases at genomic target sites and to achieve precise single-base substitution, researchers built on CRISPR-Cas genome editing tools by fusing specific base deaminases to dCas9, nCas9, or nCas12a. This enabled precise substitution of specific bases at genomic target sites (for example, C to T, or A to G). These new genome editing tools are called base editors (BEs). Directed base editing can effectively substitute specific single bases at genomic target sites and is a useful complement to CRISPR-Cas genome editing. In 2017, Science magazine named it one of the ten scientific breakthroughs of the year, which underscores its potential in basic research and practical applications.
Directed base editing tools based on the CRISPR-Cas system have effectively broadened the scope of application of CRISPR-Cas and show broad prospects. However, existing tools generally suffer from low editing efficiency and a limited editing window, and these problems are even more pronounced in plant genome editing. There is an urgent need for enhanced plant base editing tools with high editing efficiency and a wide editing window, so that CRISPR-Cas-based directed base editing can be applied effectively in plant functional genomics and breeding.
Summary of the Invention
The technical problem to be solved by the invention is to improve the efficiency of directed base editing in plant cell genomes and to widen the base editing window.
The technical solution of the invention is to provide a plant genome directed base editing backbone vector. The backbone vector comprises a core unit consisting of two core regions: an expression unit for the nCas9-PmCDA1 nuclease-cytosine deaminase fusion protein and a transcription unit for a synthetic guide RNA (sgRNA). Transcription of the core unit is driven by a single Pol II promoter.
From 5' to 3', the core unit consists of nCas9 ORF-PmCDA1-Poly A-sgRNA cloning scaffold-T, where the nCas9 ORF is the coding frame of the D10A mutant of the Streptococcus pyogenes nuclease protein; PmCDA1 is the functional unit of the cytosine deaminase coding region; Poly A is a poly(A) region; the sgRNA cloning scaffold is the sgRNA cloning and transcription unit, of which there is at least one; and T is a terminator.
The PmCDA1 cytosine deaminase coding-region functional unit of the backbone vector comprises, in order from the N-terminus to the C-terminus, a GGGS linker, an SH3 linker, the PmCDA1 coding region, an NLS signal peptide, the UGI coding region, an SGGS linker, and an NLS signal peptide.
The backbone vector satisfies at least one of the following:
a. The amino acid sequence encoded by the nCas9 ORF (the coding frame of the nCas9 nuclease protein D10A mutant) is shown as amino acids 1 to 1382 of Seq ID No.2;
b. The amino acid sequence encoded by the PmCDA1 cytosine deaminase coding-region functional unit is shown as amino acids 1383 to 1788 of Seq ID No.2.
The sgRNA cloning and transcription unit (sgRNA cloning scaffold) of the backbone vector comprises, in order from the 5' end to the 3' end, a tRNA-Gly coding sequence, a BsaI-ccdB-BsaI unit, the sgRNA scaffold coding sequence, and a tRNA-Gly coding sequence.
The backbone vector contains 1 to 6 sgRNA cloning and transcription units.
The nucleotide sequence of the sgRNA cloning and transcription unit (sgRNA cloning scaffold) is shown as bp 7432 to 8300 of Seq ID No.1.
The backbone vector satisfies at least one of the following:
a. The nucleotide sequence of the nCas9 ORF (the coding frame of the nCas9 nuclease protein D10A mutant) is shown as bp 2011 to 6156 of Seq ID No.1;
b. The nucleotide sequence of the PmCDA1 cytosine deaminase coding-region functional unit is shown as bp 6157 to 7374 of Seq ID No.1;
c. The nucleotide sequence of the poly(A) region (Poly A) is shown as bp 7384 to 7431 of Seq ID No.1;
d. The terminator is the rice HSP terminator (HSP T), whose nucleotide sequence is shown as bp 8307 to 8556 of Seq ID No.1;
e. The Pol II promoter is the maize promoter pZmUbi1, whose nucleotide sequence is shown as bp 1 to 2008 of Seq ID No.1.
The core unit of the backbone vector has the structure pZmUbi1-nCas9 ORF-PmCDA1-Poly A-sgRNA cloning scaffold-HSP T. Further, the nucleotide sequence of this core unit is shown in Seq ID No.1.
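The coordinates listed above for Seq ID No.1 can be collected into a small feature map, which makes it easy to confirm that the elements appear in the stated 5' to 3' order without overlapping and to see how long each one is. The following is a minimal sketch, not part of the patent: the function names are hypothetical, and the intervals are assumed to be 1-based and inclusive, as is usual for sequence listings.

```python
# Feature coordinates as listed for Seq ID No.1 in the text.
# Assumption: intervals are 1-based and inclusive.
FEATURES = [
    ("pZmUbi1 promoter", 1, 2008),
    ("nCas9 ORF", 2011, 6156),
    ("PmCDA1 functional unit", 6157, 7374),
    ("Poly A", 7384, 7431),
    ("sgRNA cloning scaffold", 7432, 8300),
    ("HSP T terminator", 8307, 8556),
]


def length(start, end):
    """Length in bp of a 1-based inclusive interval."""
    return end - start + 1


def check_layout(features):
    """Return (name, length_bp, gap_before_bp) rows.

    Raises ValueError if a feature is inverted, overlaps the previous
    feature, or is listed out of 5'-to-3' order.
    """
    rows, prev_end = [], 0
    for name, start, end in features:
        if start > end or start <= prev_end:
            raise ValueError(f"{name} overlaps or is out of order")
        rows.append((name, length(start, end), start - prev_end - 1))
        prev_end = end
    return rows


def extract(seq, start, end):
    """Slice a 1-based inclusive interval out of a Python string."""
    return seq[start - 1:end]


for name, n_bp, gap in check_layout(FEATURES):
    print(f"{name:24s} {n_bp:5d} bp  (gap before: {gap} bp)")
```

With the full Seq ID No.1 loaded as a string, `extract(seq, 7432, 8300)` would return the sgRNA cloning scaffold. The short gaps between some features are simply positions that the text does not assign to any named element.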
Based on the backbone vector above, the invention also provides a method for preparing a recombinant expression vector for directed base editing of specific cytosine bases at a target site in a plant genome. The method comprises the following steps:
a. Identify the target DNA region in the genome of the organism of interest, analyze the regions that carry a PAM (protospacer adjacent motif, the motif adjacent to a candidate recognition site), and select the 15 to 30 bp DNA sequence immediately 5' of the PAM as the specific target sequence;
b. According to the selected specific target sequence, synthesize a forward oligonucleotide of the form 5'-CGGA-N_X-3' and a reverse oligonucleotide of the form 5'-AAAC-N_X-3', where N is any of A, G, C, or T, X is an integer, and 14≤X≤30; the N_X of the forward oligonucleotide and the N_X of the reverse oligonucleotide are reverse complements of each other. Anneal the two oligonucleotides to obtain a complementary double-stranded oligonucleotide fragment;
c. Mix the plant genome directed base editing backbone vector according to any one of claims 1 to 9 with the double-stranded oligonucleotide fragment obtained in step b, add BsaI endonuclease and T4 DNA ligase to the same reaction, and run a cyclic digestion-ligation reaction to obtain a recombinant expression vector for directed base editing of the target site.
Further, the specific target sequence in step a is 18 to 21 bp long, preferably 20 bp.
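Step a can be illustrated with a short scan for candidate target sequences. This is a sketch under stated assumptions rather than the patent's procedure: it assumes the canonical NGG PAM of Streptococcus pyogenes Cas9, which this passage does not spell out; it searches only the strand it is given; and the function name `find_targets` is hypothetical.

```python
import re


def find_targets(region, target_len=20, pam_regex="[ACGT]GG"):
    """List candidate targets immediately 5' of each PAM on the given strand.

    Returns (start, target, pam) tuples, where start is the 0-based index
    of the first target base. The NGG default is an assumption (canonical
    SpCas9 PAM); only the supplied strand is scanned.
    """
    if not 15 <= target_len <= 30:
        raise ValueError("the text allows target sequences of 15 to 30 bp")
    region = region.upper()
    hits = []
    # A lookahead reports overlapping PAMs (for example both in 'AGGG').
    for m in re.finditer(f"(?=({pam_regex}))", region):
        pam_start = m.start()
        if pam_start >= target_len:
            target = region[pam_start - target_len:pam_start]
            hits.append((pam_start - target_len, target, m.group(1)))
    return hits
```

To cover both strands, the same function can be run on the reverse complement of the region. Ranking the hits by where the editable cytosines fall within the target is a separate step that this sketch does not attempt.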
Preferably, 18≤X≤21 in step b.
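The oligonucleotide pair of step b follows mechanically from the chosen target sequence, so it can be sketched in a few lines. The prefixes CGGA and AAAC and the length limits come from the text; the function names are hypothetical, and the sketch assumes the target is given 5' to 3' on the strand that carries the PAM.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(_COMPLEMENT)[::-1]


def design_oligos(target):
    """Return (forward, reverse) oligos, both written 5' to 3'.

    forward = CGGA + N_X, reverse = AAAC + reverse complement of N_X,
    with 14 <= X <= 30 as stated in step b.
    """
    target = target.upper()
    if set(target) - set("ACGT"):
        raise ValueError("target may contain only A, C, G, T")
    if not 14 <= len(target) <= 30:
        raise ValueError("target length X must satisfy 14 <= X <= 30")
    return "CGGA" + target, "AAAC" + reverse_complement(target)


forward, reverse = design_oligos("GATTACAGATTACAGATTAC")
print(forward)  # CGGAGATTACAGATTACAGATTAC
print(reverse)  # AAACGTAATCTGTAATCTGTAATC
```

After annealing, the N_X portions pair with each other and the two four-base prefixes are left as single-stranded 5' overhangs, which is what lets the fragment be ligated into the BsaI-digested backbone in step c.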
Preferably, in practice, a fusion PCR amplification strategy can be used in step c to obtain a tandem amplification product of multiple sgRNA transcription units separated by tRNA sequences. Through a cyclic "BsaI digestion-T4 DNA ligase ligation" reaction, this multi-sgRNA transcription unit replaces the BsaI-ccdB-BsaI unit and is cloned into the sgRNA cloning and transcription unit, yielding a recombinant expression vector capable of specific directed base editing at multiple target sites.
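The layout of such a multi-target array can be sketched symbolically. The example below only lists the order of elements once the BsaI-ccdB-BsaI unit has been replaced. It assumes that every target is followed by its own sgRNA scaffold and that tRNA-Gly flanks each unit, which matches the scaffold order given earlier but is not spelled out for the multiplex case. The function name and labels are hypothetical, and no real sequences are used.

```python
def multiplex_array(targets, trna="tRNA-Gly", scaffold="sgRNA scaffold"):
    """Return the 5'-to-3' order of elements in a tRNA-spaced sgRNA array.

    Assumed layout: tRNA, then (target, scaffold, tRNA) once per target.
    """
    if not targets:
        raise ValueError("at least one target is required")
    parts = [trna]
    for name in targets:
        parts += [f"target:{name}", scaffold, trna]
    return parts


print(" - ".join(multiplex_array(["OsCDC48-sgRNA01", "OsROC5-gRNA04"])))
```

Under this assumed layout, an array for n targets contains 3n + 1 elements.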
The beneficial effects of the invention are as follows. A single promoter drives both core regions, the nCas9-PmCDA1 fusion protein expression unit and the synthetic guide RNA (sgRNA) transcription unit, which together form the core unit of a single-transcription-unit directed base editing backbone vector. A backbone vector containing this core unit enables simple, rapid, and efficient directed conversion of cytosine (C) to thymine (T) in target sequences of plant genomes. Compared with plant base editing tools currently in use, the invention improves base editing efficiency, widens the base editing window, and advances the application of directed base editing strategies in plant genome editing. It is an effective molecular tool for directed base editing of plant genomes and has good application prospects.
Brief Description of the Drawings
Figure 1. Core unit structure and working principle of the plant STU nCas9-PmCDA1 single-transcription-unit directed base editing backbone vector of the invention.
Figure 2. Cytosine editing efficiency at the target sites, analyzed by Illumina high-throughput sequencing after transient transformation of rice protoplasts with the recombinant expression vectors STU nCas9-PmCDA1-OsCDC48-sgRNA01, STU nCas9-PmCDA1-OsROC5-gRNA04, and STU nCas9-PmCDA1-OsROC5-gRNA05 of the invention. nCas9-PmCDA1 denotes the plant STU nCas9-PmCDA1 single-transcription-unit directed base editing backbone vector of the invention. nCas9-rApobec1 is the control, in which the PmCDA1 unit of that backbone vector is replaced with the rApobec1 cytosine deaminase as reported in the literature (Komor AC, Kim YB, Packer MS, Zuris JA, Liu DR. 2016. Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage. Nature, 533(7603):420-424).
Figure 3. Efficiency of cytosine-to-thymine substitution at each cytosine position within the editing site, determined by Illumina high-throughput sequencing after transient transformation of rice protoplasts with the STU nCas9-PmCDA1-OsCDC48-sgRNA01 recombinant expression vector of the invention. nCas9-PmCDA1 and nCas9-rApobec1 are as described for Figure 2.
Figure 4. Efficiency of cytosine-to-thymine substitution at each cytosine position within the editing site, determined by Illumina high-throughput sequencing after transient transformation of rice protoplasts with the STU nCas9-PmCDA1-OsROC5-gRNA04 recombinant expression vector of the invention. nCas9-PmCDA1 and nCas9-rApobec1 are as described for Figure 2.
Figure 5. Efficiency of cytosine-to-thymine substitution at each cytosine position within the editing site, determined by Illumina high-throughput sequencing after transient transformation of rice protoplasts with the STU nCas9-PmCDA1-OsROC5-gRNA05 recombinant expression vector of the invention. nCas9-PmCDA1 and nCas9-rApobec1 are as described for Figure 2.
Figure 6. Cytosine editing efficiency at the target sites after Agrobacterium-mediated genetic transformation of rice with the recombinant expression vectors STU nCas9-PmCDA1-OsCDC48-sgRNA01, STU nCas9-PmCDA1-OsROC5-gRNA04, and STU nCas9-PmCDA1-OsROC5-gRNA05 of the invention. Genomic DNA was extracted from regenerated transformed rice seedlings and analyzed by PCR amplification and Sanger sequencing.
Detailed Description
Based on the CRISPR-Cas9 single transcription unit system and the PmCDA1 cytosine deaminase, the invention uses strategies such as codon optimization of the coding regions and modular assembly of functional units to construct a plant genome directed base editing backbone vector (also referred to herein as the plant STU nCas9-PmCDA1 single-transcription-unit directed base editing backbone vector).
The plant genome directed base editing backbone vector of the invention comprises a core unit consisting of two core regions: an expression unit for the nCas9-PmCDA1 nuclease-cytosine deaminase fusion protein and a transcription unit for a synthetic guide RNA (sgRNA). Transcription of the core unit is driven by a single Pol II promoter.
The nCas9-PmCDA1 nuclease-cytosine deaminase fusion protein expression unit has the structure nCas9 ORF-PmCDA1-Poly A, where the nCas9 ORF is the coding frame of the D10A mutant of the Streptococcus pyogenes nuclease protein, PmCDA1 is the functional unit of the cytosine deaminase coding region, and Poly A is a poly(A) region.
所述核心单元从5’到3’方向依次为nCas9 ORF-PmCDA1-Poly A-sgRNA cloningscaffold-T;所述的nCas9 ORF为酿脓链球菌(Streptococcus pyogenes)核酸酶蛋白D10A突变体编码框;PmCDA1为胞嘧啶脱氨酶编码区功能单元;Poly A为多聚A区域;sgRNAcloning scaffold为sgRNA克隆及转录单元,且sgRNA cloning scaffold至少为一个;T为终止子。The core unit is nCas9 ORF-PmCDA1-Poly A-sgRNA cloningscaffold-T from 5' to 3' direction; the nCas9 ORF is Streptococcus pyogenes nuclease protein D10A mutant coding frame; PmCDA1 It is the functional unit of the cytosine deaminase coding region; Poly A is the poly A region; sgRNA cloning scaffold is the sgRNA cloning and transcription unit, and there is at least one sgRNA cloning scaffold; T is the terminator.
其中,上述骨架载体中所述的PmCDA1胞嘧啶脱氨酶编码区功能单元从N端到C端依次包含GGGS接头、SH3接头、PmCDA1编码区、NLS信号肽、UGI编码区、SGGS接头、NLS信号肽。Wherein, the functional unit of the PmCDA1 cytosine deaminase coding region described in the above-mentioned backbone vector includes GGGS linker, SH3 linker, PmCDA1 coding region, NLS signal peptide, UGI coding region, SGGS linker, NLS signal sequence from N-terminal to C-terminal peptide.
Wherein, the above backbone vector satisfies at least one of the following:
a. The amino acid sequence encoded by the nCas9 ORF (the coding frame of the nCas9 nuclease protein D10A mutant) is shown as amino acids 1 to 1382 of Seq ID No.2;
b. The amino acid sequence encoded by the PmCDA1 cytosine deaminase coding-region functional unit is shown as amino acids 1383 to 1788 of Seq ID No.2. Joined together, these two components form the nCas9-PmCDA1 nuclease-cytosine deaminase fusion protein expression cassette, whose amino acid sequence is shown as Seq ID No.2 in the sequence listing.
Wherein, the sgRNA cloning and transcription unit (sgRNA cloning scaffold) of the above backbone vector comprises, from 5' to 3', a tRNA-Gly coding sequence, a BsaI-ccdB-BsaI unit, the sgRNA scaffold coding sequence and a tRNA-Gly coding sequence.
Wherein, the above backbone vector contains 1 to 6 sgRNA cloning and transcription units.
Wherein, the nucleotide sequence of the sgRNA cloning and transcription unit (sgRNA cloning scaffold) of the above backbone vector is shown as bp 7432 to bp 8300 of Seq ID No.1.
Wherein, the above backbone vector satisfies at least one of the following:
a. The nucleotide sequence of the nCas9 ORF (the coding frame of the nCas9 nuclease protein D10A mutant) is shown as bp 2011 to bp 6156 of Seq ID No.1;
b. The nucleotide sequence of the PmCDA1 cytosine deaminase coding-region functional unit is shown as bp 6157 to bp 7374 of Seq ID No.1;
c. The nucleotide sequence of the poly(A) region (Poly A) is shown as bp 7384 to bp 7431 of Seq ID No.1;
d. The terminator is the rice HSP terminator (HSP T), whose nucleotide sequence is shown as bp 8307 to bp 8556 of Seq ID No.1;
e. The Pol II promoter is the maize pZmUbi1 promoter, whose nucleotide sequence is shown as bp 1 to bp 2008 of Seq ID No.1.
Wherein, the core unit of the above backbone vector has the structure pZmUbi1-nCas9 ORF-PmCDA1-Poly A-sgRNA cloning scaffold-HSP T. Further, the nucleotide sequence of the pZmUbi1-nCas9 ORF-PmCDA1-Poly A-sgRNA cloning scaffold-HSP T core unit is shown as Seq ID No.1.
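The coordinates listed above can be cross-checked against one another. The following is a minimal sketch, not part of the patent: it only re-uses the bp and amino-acid ranges stated in the text, and the names `FEATURES`, `PROTEIN_RANGES` and `check_features` are hypothetical. It verifies that the features do not overlap and that each coding feature is exactly three nucleotides per encoded amino acid.

```python
# Feature coordinates (1-based, inclusive) in Seq ID No.1, copied from the text.
FEATURES = {
    "pZmUbi1": (1, 2008),
    "nCas9 ORF": (2011, 6156),
    "PmCDA1 unit": (6157, 7374),
    "Poly A": (7384, 7431),
    "sgRNA cloning scaffold": (7432, 8300),
    "HSP T": (8307, 8556),
}

# Amino-acid ranges (1-based, inclusive) in Seq ID No.2, copied from the text.
PROTEIN_RANGES = {
    "nCas9 ORF": (1, 1382),
    "PmCDA1 unit": (1383, 1788),
}


def span_length(start, end):
    """Length of a 1-based inclusive interval."""
    return end - start + 1


def check_features(features, protein_ranges):
    """Return {feature: length in bp}; raise ValueError on any inconsistency."""
    lengths = {name: span_length(*coords) for name, coords in features.items()}

    # Features sorted by start position must not overlap.
    ordered = sorted(features.items(), key=lambda item: item[1][0])
    for (name_a, (_, end_a)), (name_b, (start_b, _)) in zip(ordered, ordered[1:]):
        if start_b <= end_a:
            raise ValueError(f"{name_a} overlaps {name_b}")

    # Each coding feature must span 3 nt per encoded amino acid.
    for name, aa_range in protein_ranges.items():
        expected = 3 * span_length(*aa_range)
        if lengths[name] != expected:
            raise ValueError(f"{name}: {lengths[name]} bp, expected {expected} bp")
    return lengths


if __name__ == "__main__":
    for name, bp in check_features(FEATURES, PROTEIN_RANGES).items():
        print(f"{name}: {bp} bp")
```

With the ranges as stated, the nCas9 ORF spans 4146 bp (1382 codons) and the PmCDA1 unit spans 1218 bp (406 codons), so the nucleotide and amino-acid coordinates agree.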
Depending on the host organism to be transformed and the needs of the experiment, the promoter and terminator of the core unit of the present invention (pZmUbi1-nCas9 ORF-PmCDA1-Poly A-sgRNA cloning scaffold-HSP T) can be replaced with any Pol II promoter element (for example, promoters commonly used in plants such as OsUb1, CaMV35S and AtUb10) and any terminator element (for example, terminators commonly used in plants such as Nos T and 35S T). The core unit can also be placed in any plant expression backbone vector (for example, the pCambia, pBI, pMDC and pGreen vector series commonly used in plants) to achieve site-specific directed base editing.
In the present invention, a site-specific STU nCas9-PmCDA1-sgRNA directed base editing recombinant expression vector for a particular plant genome is constructed from the plant STU nCas9-PmCDA1 single transcription unit directed base editing backbone vector and then used for transformation. In living cells, the Pol II promoter drives transcription of "nCas9 ORF-PmCDA1-Poly A-sgRNA cloning scaffold" as a single transcription unit, producing a single-stranded primary transcript. Under the action of the cell's endogenous tRNA processing factors, this single primary transcript is cleaved at each of the two tRNA sites, yielding the complete mRNA of the nCas9 ORF-PmCDA1 nuclease-cytosine deaminase fusion protein expression cassette (including Poly A) and the sgRNA transcription unit. Within the cell, the mRNA of the nCas9-PmCDA1 fusion protein expression cassette (including Poly A) is then translated into the nCas9-PmCDA1 nuclease-cytosine deaminase fusion protein, which binds the sgRNA unit already present to form a functional nCas9-PmCDA1-sgRNA complex that performs directed editing of specific cytosine bases at the genomic target site.
In the present invention, the complete sgRNA is formed by replacing the BsaI-ccdB-BsaI unit in the sgRNA cloning and transcription unit of the backbone vector with an 18-21 bp fragment capable of complementary binding to the target fragment. The scaffold RNA fragment is a functional RNA with a hairpin-like structure formed by the chimeric fusion, in order, of the sgRNA that binds the protospacer site, the tracrRNA and the crRNA; the scaffold RNA fragment can bind the Cas9 nuclease.
For a specific target gene, an sgRNA site is first determined (5'-N_X-NGG-3', where N is any of A, G, C or T and X is an integer with 14 ≤ X ≤ 30; 18, 19, 20 and 21 are common values). Then, following the STU nCas9-PmCDA1-sgRNA recombinant expression vector construction method provided by the invention, the designed sgRNA-specific target sequence (protospacer) is cloned into the sgRNA cloning and transcription unit in place of the BsaI-ccdB-BsaI unit by a cycled "BsaI digestion / T4 DNA ligase ligation" reaction, giving a specific, functional STU nCas9-PmCDA1-sgRNA recombinant expression vector.
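The site definition 5'-N_X-NGG-3' can be turned into a simple search. The sketch below is illustrative only and is not the inventors' design pipeline: the function names are hypothetical, it reports every candidate on both strands without any specificity or off-target filtering, and the flanking bases and PAM in the usage note are made up for the demonstration.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_protospacers(seq, length=20):
    """Find every 5'-N(length)-NGG-3' site on both strands.

    Returns a list of (strand, start, protospacer, pam) tuples, where start is
    the 0-based index, on the given strand, of the leftmost base covered by the
    site (protospacer plus PAM). Overlapping sites are all reported.
    """
    if not 14 <= length <= 30:
        raise ValueError("length must satisfy 14 <= X <= 30")
    seq = seq.upper()
    site_len = length + 3
    # A lookahead lets overlapping candidate sites be matched independently.
    pattern = re.compile(rf"(?=([ACGT]{{{length}}})([ACGT]GG))")
    hits = []
    for strand, template in (("+", seq), ("-", reverse_complement(seq))):
        for match in pattern.finditer(template):
            start = match.start()
            if strand == "-":
                # Convert back to coordinates on the given strand.
                start = len(seq) - (match.start() + site_len)
            hits.append((strand, start, match.group(1), match.group(2)))
    return hits
```

For example, `find_protospacers("TT" + "GACCAGCCAGCGTCTGGCGC" + "AGG" + "TT")` reports the 20 nt sequence as a plus-strand protospacer starting at index 2. Passing `length=19` would instead search for 19 nt protospacers, one of the other common values of X.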
In the present invention, a BsaI-ccdB-BsaI unit is fused at the end of the sgRNA cloning and transcription framework unit. It serves as a cloning site at which the CRISPR/Cas9 single transcription unit backbone vector is digested so that the target gRNA-specific target sequence (protospacer) can be cloned in. The BsaI-ccdB-BsaI unit may be replaced with sites for other restriction endonucleases capable of cutting the backbone vector of the invention, with the cloning site for the sgRNA-specific target sequence modified accordingly; such variants can still effectively realize the key content of the present invention.
When constructing a plant genome site-specific STU nCas9-PmCDA1-sgRNA directed base editing recombinant expression vector, recombinant clones containing the correct Cas9-gRNA expression vector can be selected by transforming Escherichia coli and applying bacterial selection pressure. The clones can then be verified by colony PCR, plasmid restriction digestion, sequencing and similar methods to confirm that the site-specific STU nCas9-PmCDA1-sgRNA directed base editing recombinant expression vector for the target plant genome has been obtained.
Using a fusion PCR amplification strategy, a tandem amplification product of multiple sgRNA transcription units separated by tRNA sequences can be obtained. By a cycled "BsaI digestion / T4 DNA ligase ligation" reaction, this multi-sgRNA transcription unit can be cloned into the sgRNA cloning and transcription unit in place of the BsaI-ccdB-BsaI unit, giving an STU nCas9-PmCDA1-sgRNA1-sgRNA2-…-sgRNAx recombinant expression vector capable of specifically modifying multiple target sites (see Figure 1). Preferably, the above backbone vector contains 1 to 6 sgRNA cloning and transcription units.
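To make the multiplex arrangement concrete, the sketch below lists the elements of the sgRNA region after cloning. It is an interpretation, not taken from the patent: it assumes each protospacer is followed by its own sgRNA scaffold and that every sgRNA unit is flanked by tRNA-Gly sequences, which is one reading of "sgRNA transcription units separated by tRNA sequences". The function name and labels are hypothetical, and no real sequences are involved.

```python
def stu_sgrna_region_layout(protospacer_names):
    """Return element labels, 5' to 3', for the cloned sgRNA region.

    Assumed layout (not confirmed by the source):
    tRNA-Gly, then (protospacer, sgRNA scaffold, tRNA-Gly) once per sgRNA.
    """
    count = len(protospacer_names)
    if not 1 <= count <= 6:
        # The text states a preferred range of 1 to 6 units.
        raise ValueError("expected 1 to 6 sgRNA units")
    layout = ["tRNA-Gly"]
    for name in protospacer_names:
        layout += [f"protospacer:{name}", "sgRNA scaffold", "tRNA-Gly"]
    return layout


if __name__ == "__main__":
    print(" - ".join(stu_sgrna_region_layout(["sgRNA1", "sgRNA2"])))
```

With a single name, the result reduces to the stated scaffold order (tRNA-Gly, insert, sgRNA scaffold, tRNA-Gly) with the BsaI-ccdB-BsaI unit replaced by the protospacer.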
In the present invention, a site-specific STU nCas9-PmCDA1-sgRNA directed base editing recombinant expression vector constructed according to the invention can be introduced into plant cells by a variety of transformation methods, including protoplast-, gene gun- and Agrobacterium-mediated transformation, so that the transformed cells contain both the nCas9-PmCDA1 nuclease-cytosine deaminase fusion protein and an sgRNA unit directed against a specific genomic target sequence. Acting together, the fusion protein and the sgRNA unit edit specific cytosine bases in the genomic target sequence, replacing them with T (high probability), A (low probability) or G (low probability). When the STU nCas9-PmCDA1 single transcription unit directed base editing backbone vector of the present invention is used in plants, resistance genes such as those for kanamycin, hygromycin or Basta can be used to select plant transformants, and regenerated plants carrying the directed modification at the target site are obtained by differentiation and regeneration of positive transformant cells or tissues (such as protoplasts or callus).
Example 1. Construction of the plant STU nCas9-PmCDA1 single transcription unit directed base editing backbone vector
The present invention designs an STU nCas9-PmCDA1 single transcription unit directed base editing backbone vector for plant genome engineering, whose core unit, from 5' to 3', is pZmUbi1-nCas9 ORF-PmCDA1-Poly A-sgRNA cloning scaffold-HSP T. Here, pZmUbi1 is the maize pZmUbi1 promoter (different Pol II promoters can be substituted by AscI/SbfI double digestion of the basic STU nCas9-PmCDA1 single transcription unit directed base editing backbone vector); nCas9 ORF is the coding frame of the D10A mutant of the Streptococcus pyogenes nuclease protein (including a C-terminal NLS signal peptide); PmCDA1 is the cytosine deaminase coding-region functional unit (comprising, from N-terminus to C-terminus, a GGGS linker, an SH3 linker, the PmCDA1 coding region, an NLS signal peptide, the UGI coding region, an SGGS linker and an NLS signal peptide); Poly A is a poly(A) region; sgRNA cloning scaffold (abbreviated sgRNA CS) is the sgRNA cloning and transcription unit (comprising, from 5' to 3', a tRNA-Gly coding sequence, a BsaI-ccdB-BsaI unit, the sgRNA scaffold coding sequence and a tRNA-Gly coding sequence); and HSP T is the rice HSP terminator (different terminators can be substituted by BamHI/SacI double digestion of the basic STU nCas9-PmCDA1 single transcription unit directed base editing backbone vector). The structure and working principle of the plant STU nCas9-PmCDA1 single transcription unit directed base editing backbone vector are shown in Figure 1.
Optionally, the backbone vector further comprises the left and right border sequences of a T-DNA, with the "pZmUbi1-nCas9 ORF-PmCDA1-Poly A-sgRNA cloning scaffold-HSP T" core unit located between the left and right T-DNA borders. A hygromycin resistance gene expression unit may also be included between the left and right T-DNA borders as a selection marker for plant transformants. Its elements are, in order, 2×CaMV35S promoter-hygromycin phosphotransferase ORF-CaMV poly A, and the ORFs of different resistance genes can be substituted by AvrII/PacI double digestion of the basic CRISPR/Cas9 single transcription unit backbone vector.
To allow fast and efficient construction of STU nCas9-PmCDA1-sgRNA directed base editing recombinant expression vectors for specific genomic targets, a 637 bp BsaI-ccdB-BsaI unit is incorporated at the 5' end of the sgRNA transcription and expression unit of the plant STU nCas9-PmCDA1 single transcription unit directed base editing backbone vector of the present invention. With this design, subsequent construction of a target STU nCas9-PmCDA1-sgRNA directed base editing recombinant expression vector only requires mixing the backbone vector of the invention, the annealed double-stranded fragment of complementary oligonucleotides carrying the specific target sequence, BsaI endonuclease and T4 DNA ligase in one reaction, and running a cycled "37 °C digestion / 16 °C ligation" reaction, to effectively construct the specific Cas9-gRNA expression vector. The vector can be constructed by conventional methods of existing molecular cloning technology. It should be noted that the element sequences above are the distinctive parts of this backbone plasmid vector; the vector may also include general structures found in conventional vectors, which are not described further here.
The Streptococcus pyogenes Cas9 nuclease protein coding gene (SpCas9) was codon-optimized for plants (with an NLS signal added at the 3' end) and the D10A mutation was introduced. The complete ORF of the nCas9 nuclease protein D10A mutant coding frame (including the C-terminal NLS coding sequence) was then synthesized; its nucleotide sequence is shown as bp 2011 to bp 6156 of Seq ID No.1 (the encoded amino acid sequence is shown as amino acids 1 to 1382 of Seq ID No.2).
In parallel, based on the reported sequence of the lamprey (Petromyzon marinus) cytosine deaminase PmCDA1 (Nishida K, Arazoe T, Yachie N, Banno S, Kakimoto M, Tabata M, Mochizuki M, Miyabe A, Araki M, Hara KY, Shimatani Z, Kondo A. 2016. Targeted nucleotide editing using hybrid prokaryotic and vertebrate adaptive immune systems. Science, 353(6305). pii: aaf8729), the coding frame of the PmCDA1 cytosine deaminase expression unit was designed (comprising, from N-terminus to C-terminus, a GGGS linker, an SH3 linker, the PmCDA1 coding region, an NLS signal peptide, the UGI coding region, an SGGS linker and an NLS signal peptide), codon-optimized for plants and synthesized. Its nucleotide sequence is shown as bp 6157 to bp 7374 of Seq ID No.1 (the encoded amino acid sequence is shown as amino acids 1383 to 1788 of Seq ID No.2). Three further basic units were also obtained by synthesis:
a. frag-A: Poly A; the nucleotide sequence is shown as bp 7384 to bp 7431 of Seq ID No.1;
b. frag-B: the sgRNA cloning and transcription unit coding sequence, comprising, from 5' to 3', a tRNA-Gly coding sequence, a BsaI-ccdB-BsaI unit, the sgRNA scaffold coding sequence and a tRNA-Gly coding sequence; the nucleotide sequence is shown as bp 7432 to bp 8300 of Seq ID No.1;
c. frag-C: the rice HSP terminator coding sequence; the nucleotide sequence is shown as bp 8307 to bp 8556 of Seq ID No.1.
By fusion PCR, the coding sequence of the nCas9 nuclease protein D10A mutant coding frame, the coding sequence of the PmCDA1 cytosine deaminase expression unit coding frame, frag-A, frag-B and frag-C were fused in that order, and SbfI and SacI restriction sites were added to the 5' and 3' ends of the fusion PCR product, respectively, giving a 6560 bp assembly unit.
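The stated size of the assembly unit can be compared with the Seq ID No.1 coordinates. The short check below rests on an assumption not stated in the source: that the unit runs from the first nCas9 ORF base (bp 2011) to the last HSP T base (bp 8556) and carries only the bare SbfI (CCTGCAGG) and SacI (GAGCTC) recognition sequences at its ends, with no additional protective bases. The names are hypothetical.

```python
# Coordinates in Seq ID No.1 (1-based, inclusive), taken from the text.
ASSEMBLY_START = 2011  # first base of the nCas9 ORF
ASSEMBLY_END = 8556    # last base of the rice HSP terminator

# Recognition sequences of the two restriction enzymes named in the text.
SBFI_SITE = "CCTGCAGG"
SACI_SITE = "GAGCTC"


def assembly_unit_length(start, end, five_prime_site, three_prime_site):
    """Length of the span start..end plus one added site at each end."""
    return (end - start + 1) + len(five_prime_site) + len(three_prime_site)


if __name__ == "__main__":
    print(assembly_unit_length(ASSEMBLY_START, ASSEMBLY_END, SBFI_SITE, SACI_SITE))
```

Under that assumption the result is 6546 + 8 + 6 = 6560 bp, which matches the size given in the text; the agreement is suggestive but does not prove that this is how the unit was actually composed.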
The vector backbone pGEL026 (Tang X, Zheng X, Qi YP, Zhang D, Cheng Y, Tang A, Voytas DF, Zhang Y. 2016. A single transcript CRISPR-Cas9 system for efficient genome editing in plants. Molecular Plant, 9(7):1088-1091) and the 6560 bp assembly unit were each double-digested with SbfI and SacI, and the target fragments were recovered, ligated and used for transformation. Colony PCR, plasmid restriction digestion and DNA sequencing of the selected positive clones confirmed that the 6560 bp assembly unit had been cloned downstream of the pZmUbi1 promoter already present in pGEL026, completing construction of the plant STU nCas9-PmCDA1 single transcription unit directed base editing backbone vector.
Example 2. High-throughput determination of directed cytosine base editing efficiency at rice endogenous genes using the STU nCas9-PmCDA1 system
1. sgRNA design for rice endogenous genes and construction of STU nCas9-PmCDA1 base editing recombinant expression vectors
For the rice genes OsCDC48 (LOC_Os03g05730) and OsROC5 (LOC_Os02g45250), 5'-NGG-3' PAM sites were searched and the 20 bp sequence upstream of the PAM was selected to design each sgRNA (Table 1).
Table 1. Design, synthesis and detection information for sgRNA/crRNA targeting rice endogenous genes
Based on the nucleotide sequences of the designed sgRNA sites, the corresponding forward and reverse oligonucleotides were synthesized. The sequences are as follows (upper-case bases are the designed site-specific guide sgRNA site; lower-case bases are the sticky ends complementary to the backbone vector):
BE-OsCDC48-sgRNA01-F (Seq ID No.10): tgcaGACCAGCCAGCGTCTGGCGC;
BE-OsCDC48-sgRNA01-R (Seq ID No.11): aaacGCGCCAGACGCTGGCTGGTC;
BE-OsROC5-gRNA04-F (Seq ID No.12): tgcaGCAGCTGGCTGAGGGTGCAT;
BE-OsROC5-gRNA04-R (Seq ID No.13): aaacATGCACCCTCAGCCAGCTGC;
BE-OsROC5-gRNA05-F (Seq ID No.14): tgcaAGCCAGCTGCTTACAAAAC;
BE-OsROC5-gRNA05-R (Seq ID No.15): aaacGTTTTGTAAGCAGCTGGCT.
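The oligonucleotide pairs above follow a regular pattern: the forward oligo is the overhang tgca followed by the protospacer, and the reverse oligo is the overhang aaac followed by the reverse complement of the protospacer. The sketch below reproduces that pattern; the function name `design_oligos` and its parameters are hypothetical, and the two overhangs are simply read off the listed sequences.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def design_oligos(protospacer, fwd_overhang="tgca", rev_overhang="aaac"):
    """Return (forward, reverse) oligos for one protospacer.

    Overhangs are kept lower-case and the target upper-case, mirroring the
    notation used in the oligo list.
    """
    target = protospacer.upper()
    if not target or set(target) - set("ACGT"):
        raise ValueError("protospacer must contain only A, C, G and T")
    reverse_complement = target.translate(_COMPLEMENT)[::-1]
    return fwd_overhang + target, rev_overhang + reverse_complement


if __name__ == "__main__":
    targets = {
        "BE-OsCDC48-sgRNA01": "GACCAGCCAGCGTCTGGCGC",
        "BE-OsROC5-gRNA04": "GCAGCTGGCTGAGGGTGCAT",
        "BE-OsROC5-gRNA05": "AGCCAGCTGCTTACAAAAC",
    }
    for name, target in targets.items():
        forward, reverse = design_oligos(target)
        print(f"{name}-F: {forward}")
        print(f"{name}-R: {reverse}")
```

Run on the three protospacers, this regenerates Seq ID No.10 to No.15 exactly as listed, including the 19 nt OsROC5-gRNA05 target.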
The BE-OsCDC48-sgRNA01-F/R, BE-OsROC5-gRNA04-F/R and BE-OsROC5-gRNA05-F/R oligonucleotide pairs were each mixed in equal proportions, held in a boiling water bath for 10 min and then allowed to cool naturally to anneal, forming double-stranded DNA with sticky ends for use as the insert in recombinant vector construction. The plant STU nCas9-PmCDA1 single transcription unit directed base editing backbone vector, the sticky-end insert, BsaI endonuclease and T4 DNA ligase were added to a 200 µL PCR tube and subjected to 10 cycles of "37 °C digestion / 16 °C ligation". After the endonuclease and ligase were inactivated at 80 °C, the reaction product was used to transform Escherichia coli.
Positive transformants were identified by kanamycin resistance selection, colony PCR and restriction digestion, and sequencing finally confirmed that the STU nCas9-PmCDA1-OsCDC48-sgRNA01, STU nCas9-PmCDA1-OsROC5-gRNA04 and STU nCas9-PmCDA1-OsROC5-gRNA05 recombinant expression vectors had been obtained.
2. Rice protoplast transformation with the STU nCas9-PmCDA1-sgRNA base editing recombinant expression vectors for rice endogenous genes
Protoplasts were isolated from rice cultivar Nipponbare and transformed separately with the STU nCas9-PmCDA1-OsCDC48-sgRNA01, STU nCas9-PmCDA1-OsROC5-gRNA04 and STU nCas9-PmCDA1-OsROC5-gRNA05 recombinant expression vectors by PEG-mediated transformation. For the detailed rice protoplast transformation procedure, see the experimental methods disclosed in Tang X, Zheng X, Qi YP, Zhang D, Cheng Y, Tang A, Voytas DF, Zhang Y. 2016. A single transcript CRISPR-Cas9 system for efficient genome editing in plants. Molecular Plant, 9(7):1088-1091.
3. Detection of directed cytosine base editing at specific sites in the rice endogenous OsCDC48 and OsROC5 genes
After transformation, the rice protoplasts were cultured in the dark at 25 °C for 48 hours and the transformed cells were collected. Protoplast genomic DNA was extracted by the CTAB method and used as the template for PCR amplification and Illumina high-throughput sequencing. For the detailed procedure, see the experimental methods disclosed in Tang X, Lowder LG, Zhang T, Malzahn A, Zheng X, Voytas DF, Zhong Z, Chen Y, Ren Q, Li Q, Kirkland ER, Zhang Y, Qi Y. 2017. A CRISPR-Cpf1 system for efficient genome editing and transcriptional repression in plants. Nature Plants, 3:17018.
Analysis of the Illumina high-throughput sequencing results showed that, at the three cytosine base editing target sequences in the rice endogenous genes OsCDC48 and OsROC5, cytosine-to-thymine editing efficiencies of 28.62% (OsCDC48-sgRNA01), 30.99% (OsROC5-gRNA04) and 49.41% (OsROC5-gRNA05) were achieved (Figure 2: nCas9-PmCDA1). Notably, in the control group, in which the PmCDA1 unit of the plant STU nCas9-PmCDA1 single transcription unit directed base editing backbone vector constructed in the present invention was replaced with the rApobec1 cytosine deaminase as reported in the literature (Komor AC, Kim YB, Packer MS, Zuris JA, Liu DR. 2016. Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage. Nature, 533(7603):420-424), the cytosine editing efficiencies at the same target sequences were 7.65% (OsCDC48-sgRNA01), 4.44% (OsROC5-gRNA04) and 4.16% (OsROC5-gRNA05) (Figure 2: nCas9-rApobec1).
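The size of the difference between the two deaminases can be expressed as a ratio of the reported efficiencies. The sketch below only re-uses the percentages quoted above; the ratio itself is not reported in the source, and the names `EFFICIENCY_PERCENT` and `fold_change` are hypothetical.

```python
# Reported C-to-T editing efficiencies in percent: (nCas9-PmCDA1, nCas9-rApobec1).
EFFICIENCY_PERCENT = {
    "OsCDC48-sgRNA01": (28.62, 7.65),
    "OsROC5-gRNA04": (30.99, 4.44),
    "OsROC5-gRNA05": (49.41, 4.16),
}


def fold_change(pmcda1_percent, rapobec1_percent):
    """Ratio of PmCDA1 efficiency to rApobec1 efficiency."""
    if rapobec1_percent <= 0:
        raise ValueError("control efficiency must be positive")
    return pmcda1_percent / rapobec1_percent


if __name__ == "__main__":
    for target, (pmcda1, rapobec1) in EFFICIENCY_PERCENT.items():
        print(f"{target}: {fold_change(pmcda1, rapobec1):.2f}-fold")
```

On these figures the PmCDA1 construct is roughly 3.7-fold, 7.0-fold and 11.9-fold more efficient than the rApobec1 control at the three targets, respectively.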
For the three rice cytosine base editing target sequences tested, the Illumina high-throughput sequencing results were further analyzed for the cytosine-to-thymine editing efficiency at each individual cytosine position within the editing site. This showed that effective cytosine-to-thymine substitution occurred at six cytosines in OsCDC48-sgRNA01 (C3, C4, C7, C8, C11 and C14; Figure 3: nCas9-PmCDA1), at two cytosines in OsROC5-gRNA04 (C2 and C5; Figure 4: nCas9-PmCDA1) and at three cytosines in OsROC5-gRNA05 (C-1, C3 and C4; Figure 5: nCas9-PmCDA1). Notably, with the plant STU nCas9-PmCDA1 single transcription unit directed base editing backbone vector constructed in the present invention, a cytosine-to-thymine editing efficiency of 37.94% was detected at the C-1 site, located at -1 bp of OsROC5-gRNA05 (Figure 5: nCas9-PmCDA1). Such an editing event at a cytosine lying outside the sgRNA targeting sequence had not been achieved in existing studies. In the control group, by contrast, both the editing window and the editing efficiency for cytosine-to-thymine substitution at individual cytosine positions in the three tested target sequences were significantly smaller than in the example of the present invention (Figure 3: nCas9-rApobec1; Figure 4: nCas9-rApobec1; Figure 5: nCas9-rApobec1).
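The position labels used above (C3, C4 and so on) can be related to the protospacer sequences given for the synthesized oligonucleotides. The sketch below assumes that positions are counted from 1 at the 5' end of the protospacer as synthesized; this convention is not stated explicitly in the source, but it is consistent with every reported in-window position. The function name is hypothetical, and C-1 cannot be derived this way because it lies outside the protospacer.

```python
def cytosine_positions(protospacer):
    """1-based positions of every cytosine, counted from the 5' end."""
    return [i for i, base in enumerate(protospacer.upper(), start=1) if base == "C"]


# Protospacers from the oligo list, with the positions reported as edited.
REPORTED_EDITS = {
    "GACCAGCCAGCGTCTGGCGC": {3, 4, 7, 8, 11, 14},  # OsCDC48-sgRNA01
    "GCAGCTGGCTGAGGGTGCAT": {2, 5},                # OsROC5-gRNA04
    "AGCCAGCTGCTTACAAAAC": {3, 4},                 # OsROC5-gRNA05 (C-1 excluded)
}

if __name__ == "__main__":
    for protospacer, edited in REPORTED_EDITS.items():
        available = cytosine_positions(protospacer)
        unedited = [p for p in available if p not in edited]
        print(protospacer, "cytosines:", available, "not reported as edited:", unedited)
```

Under this numbering, every reported position is indeed a cytosine, and the cytosines not reported as edited (for example positions 18 and 20 of OsCDC48-sgRNA01) all lie toward the 3' end of the protospacer.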
These results show that the plant STU nCas9-PmCDA1 single transcription unit directed base editing backbone vector constructed in the present invention can efficiently perform directed editing of cytosine bases in specific regions of rice endogenous genes and can further extend the range of the editing window.
Example 3. Generation and efficiency analysis of regenerated rice plants with directed cytosine base editing of endogenous genes using the STU nCas9-PmCDA1 system
1. Agrobacterium transformation with the STU nCas9-PmCDA1-sgRNA base editing recombinant expression vectors for rice endogenous genes
The STU nCas9-PmCDA1-OsCDC48-sgRNA01, STU nCas9-PmCDA1-OsROC5-gRNA04 and STU nCas9-PmCDA1-OsROC5-gRNA05 recombinant expression vectors, which were successfully constructed in Example 2 and shown to have directed modification activity in rice protoplasts, were each transformed into competent cells of Agrobacterium strain EHA105 by heat shock. The cells were spread on LB solid medium containing 50 mg/L kanamycin and 50 mg/L rifampicin and cultured in the dark at 28 °C for 2 days to obtain positive clones. The positive clones were activated in LB liquid medium containing 50 mg/L kanamycin and 50 mg/L rifampicin for use in subsequent transformation.
2. Agrobacterium-mediated rice callus transformation with the rice endogenous gene STU nCas9-PmCDA1-sgRNA base editing recombinant expression vectors
The STU nCas9-PmCDA1-OsCDC48-sgRNA01, STU nCas9-PmCDA1-OsROC5-gRNA04, and STU nCas9-PmCDA1-OsROC5-gRNA05 recombinant expression vectors were each used to transform rice callus by Agrobacterium-mediated transformation. The transformation procedure followed the experimental method disclosed in the literature (Tang X, Zheng X, Qi YP, Zhang D, Cheng Y, Tang A, Voytas DF, Zhang Y. 2016. A single transcript CRISPR-Cas9 system for efficient genome editing in plants. Molecular Plant, 9(7): 1088-1091.).
3. Detection of directed base editing and efficiency analysis in regenerated plants stably transformed with the rice endogenous gene STU nCas9-PmCDA1-sgRNA base editing recombinant expression vectors
After the resistant calli obtained from transformation had been induced to form rice seedlings, genomic DNA was extracted from the transformed, regenerated seedlings and used as a template for PCR amplification and Sanger sequencing analysis. Analysis of the regenerated rice plants transformed with the STU nCas9-PmCDA1-OsCDC48-sgRNA01, STU nCas9-PmCDA1-OsROC5-gRNA04, and STU nCas9-PmCDA1-OsROC5-gRNA05 recombinant expression vectors showed that the three cytosine base editing target sequences in the rice endogenous genes OsCDC48 and OsROC5 achieved base editing efficiencies of 44.44% (OsCDC48-sgRNA01: 8/18), 100.00% (OsROC5-gRNA04: 26/26), and 68.75% (OsROC5-gRNA05: 11/16), respectively (Figure 6: nCas9-PmCDA1). These results further show that the plant STU nCas9-PmCDA1 single transcription unit directed base editing backbone vector constructed in the present invention can achieve efficient directed editing of cytosine bases in specific regions of rice endogenous genes and yield base-edited regenerated plants. It is an effective molecular tool for directed base editing of plant genomes and can improve the efficiency of crop improvement by genome engineering.
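The efficiencies reported above are the fraction of regenerated plants carrying an edit at each target, expressed as a percentage. A minimal Python sketch of that arithmetic follows; the function name `editing_efficiency` and the dictionary layout are illustrative assumptions, and only the plant counts come from the text.

```python
def editing_efficiency(edited: int, total: int) -> float:
    """Return the percentage of analysed plants that carry a base edit."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= edited <= total:
        raise ValueError("edited must lie between 0 and total")
    return round(100 * edited / total, 2)


# (edited plants, analysed plants) per target, as reported in Example 3.
counts = {
    "OsCDC48-sgRNA01": (8, 18),
    "OsROC5-gRNA04": (26, 26),
    "OsROC5-gRNA05": (11, 16),
}

efficiencies = {
    target: editing_efficiency(edited, total)
    for target, (edited, total) in counts.items()
}

for target, value in efficiencies.items():
    print(f"{target}: {value:.2f}%")
```

Rounding to two decimals reproduces the reported 44.44%, 100.00%, and 68.75%.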
Nucleotide and Amino Acid Sequences
Seq ID No.1: Nucleotide sequence of pZmUbi1-nCas9 ORF-PmCDA1-Poly A-sgRNA cloning scaffold-HSP T
CGCGCCTGCAGTGCAGCGTGACCCGGTCGTGCCCCTCTCTTGAGATAATGAGCATTGCATGTCTAAGTTATAAAAAATTACCACATATTTTTTTTGTCACACTTGTTTGAAGTGCAGTTTATCTATCTTTATACATATATTTAAACTTTACTCTACGAATAATATAATCTATAGTACTACAATAATATCAGTGTTTTAGAGAATCATATAAATGAACAGTTAGACATGGTCTAAAGGACAATTGAGTATTTTGACAACAGGACTCTACAGTTTTATCTTTTTAGTGTGCATGTGTTCTCCTTTTTTTTTGCAAATAGCTTCACCTATATAATACTTCATCCATTTTATTAGTACATCCATTTAGGGTTTAGGGTTAATGGTTTTTATAGACTAATTTTTTTAGTACATCTATTTTATTCTATTTTAGCCTCTAAATTAAGAAAACTAAAACTCTATTTTAGTTTTTTTATTTAATAATTTAGATATAAAATAGAATAAAATAAAGTGACTAAAAATTAAACAAATACCCTTTAAGAAATTAAAAAAACTAAGGAAACATTTTTCTTGTTTCGAGTAGATAATGCCAGCCTGTTAAACGCCGTCGACGAGTCTAACGGACACCAACCAGCGAACCAGCAGCGTCGCGTCGGGCCAAGCGAAGCAGACGGCACGGCATCTCTGTCGCTGCCTCTGGACCCCTCTCGAGAGTTCCGCTCCACCGTTGGACTTGCTCCGCTGTCGGCATCCAGAAATTGCGTGGCGGAGCGGCAGACGTGAGCCGGCACGGCAGGCGGCCTCCTCCTCCTCTCACGGCACCGGCAGCTACGGGGGATTCCTTTCCCACCGCTCCTTCGCTTTCCCTTCCTCGCCCGCCGTAATAAATAGACACCCCCTCCACACCCTCTTTCCCCAACCTCGTGTTGTTCGGAGCGCACACACACACAACCAGAACTCCCCCAAATCCACCCGTCGGCACCTCCGCTTCAAGGTACGCCGCTCGTCCTCCCCCCCCCCCCCTCTCTACCTTCTCAAGATCGGCGTTCCGGTCCATGGTTAGGGCCCGGTAGTTCTACTTCTGTTCATGTTTGTGTTAGATCCGTGTTTGTGTTAGATCCGTGCTACTAGCGTTCGTACACGGATGCGACCTGTACGTCAGACACGTTCTGATTGCTAACTTGCCAGTGTTTCTCTTTGGGGAATCCTGGGATGGCTCTAGCCGTTCCGCAGACGGGATCGATTTCATGATTTTTTTTGTTTCGTTGCATAGGGTTTGGTTTGCCCTTTTCCTTTATTTCAATATATGCCGTGCACTTGTTTGTCGGGTCATCTTTTCATGCTTTTTTTTGTCTTGGTTGTGATGATGTGGTCTGGTTGGGCGGTCGTTCAAGATCGGAGTAGAATTAATTCTGTTTCAAACTACCTGGTGGATTTATTAATTTTGGATCTGTATGTGTGTGCCATACATATTCATAGTTACGAATTGAAGATGATGGATGGAAATATCGATCTAGGATAGGTATACATGTTGATGCGGGTTTTACTGATGCATATACAGAGATGCTTTTTGTTCGCTTGGTTGTGATGATGTGGTGTGGTTGGGCGGTCGTTCATTCGTTCAAGATCGGAGTAGAATACTGTTTCAAACTACCTGGTGTATTTATTAATTTTGGAACTGTATGTGTGTGTCATACATCTTCATAGTTACGAGTTTAAGATGGATGGAAATATCGATCTAGGATAGGTATACATGTTGATGTGGGTTTTACTGATGCATATACATGATGGCATATGCAGCATCTATTCATATGCTCTAACCTTGAGTACCTATCTATTATAATAAACAAGTATGTTTTATAATTATTTTGATCTTGATATACTTGGATGATGGCATATGCAGCAGCTATATGTGGATTTTTTTAGCCCTGCCTTCATACGCTATTTATTTGCTTGGTACTGTTTCTTTTGTCGATGCTCACCCTGTTGTTTGGTGTTACTTCTGC
AGCCTGCAGGATGGATAAGAAGTACTCTATCGGACTCGCTATCGGAACTAACTCTGTGGGATGGGCTGTGATCACCGATGAGTACAAGGTGCCATCTAAGAAGTTCAAGGTTCTCGGAAACACCGATAGGCACTCTATCAAGAAAAACCTTATCGGTGCTCTCCTCTTCGATTCTGGTGAAACTGCTGAGGCTACCAGACTCAAGAGAACCGCTAGAAGAAGGTACACCAGAAGAAAGAACAGGATCTGCTACCTCCAAGAGATCTTCTCTAACGAGATGGCTAAAGTGGATGATTCATTCTTCCACAGGCTCGAAGAGTCATTCCTCGTGGAAGAAGATAAGAAGCACGAGAGGCACCCTATCTTCGGAAACATCGTTGATGAGGTGGCATACCACGAGAAGTACCCTACTATCTACCACCTCAGAAAGAAGCTCGTTGATTCTACTGATAAGGCTGATCTCAGGCTCATCTACCTCGCTCTCGCTCACATGATCAAGTTCAGAGGACACTTCCTCATCGAGGGTGATCTCAACCCTGATAACTCTGATGTGGATAAGTTGTTCATCCAGCTCGTGCAGACCTACAACCAGCTTTTCGAAGAGAACCCTATCAACGCTTCAGGTGTGGATGCTAAGGCTATCCTCTCTGCTAGGCTCTCTAAGTCAAGAAGGCTTGAGAACCTCATTGCTCAGCTCCCTGGTGAGAAGAAGAACGGACTTTTCGGAAACTTGATCGCTCTCTCTCTCGGACTCACCCCTAACTTCAAGTCTAACTTCGATCTCGCTGAGGATGCAAAGCTCCAGCTCTCAAAGGATACCTACGATGATGATCTCGATAACCTCCTCGCTCAGATCGGAGATCAGTACGCTGATTTGTTCCTCGCTGCTAAGAACCTCTCTGATGCTATCCTCCTCAGTGATATCCTCAGAGTGAACACCGAGATCACCAAGGCTCCACTCTCAGCTTCTATGATCAAGAGATACGATGAGCACCACCAGGATCTCACACTTCTCAAGGCTCTTGTTAGACAGCAGCTCCCAGAGAAGTACAAAGAGATTTTCTTCGATCAGTCTAAGAACGGATACGCTGGTTACATCGATGGTGGTGCATCTCAAGAAGAGTTCTACAAGTTCATCAAGCCTATCCTCGAGAAGATGGATGGAACCGAGGAACTCCTCGTGAAGCTCAATAGAGAGGATCTTCTCAGAAAGCAGAGGACCTTCGATAACGGATCTATCCCTCATCAGATCCACCTCGGAGAGTTGCACGCTATCCTTAGAAGGCAAGAGGATTTCTACCCATTCCTCAAGGATAACAGGGAAAAGATTGAGAAGATTCTCACCTTCAGAATCCCTTACTACGTGGGACCTCTCGCTAGAGGAAACTCAAGATTCGCTTGGATGACCAGAAAGTCTGAGGAAACCATCACCCCTTGGAACTTCGAAGAGGTGGTGGATAAGGGTGCTAGTGCTCAGTCTTTCATCGAGAGGATGACCAACTTCGATAAGAACCTTCCAAACGAGAAGGTGCTCCCTAAGCACTCTTTGCTCTACGAGTACTTCACCGTGTACAACGAGTTGACCAAGGTTAAGTACGTGACCGAGGGAATGAGGAAGCCTGCTTTTTTGTCAGGTGAGCAAAAGAAGGCTATCGTTGATCTCTTGTTCAAGACCAACAGAAAGGTGACCGTGAAGCAGCTCAAAGAGGATTACTTCAAGAAAATCGAGTGCTTCGATTCAGTTGAGATTTCTGGTGTTGAGGATAGGTTCAACGCATCTCTCGGAACCTACCACGATCTCCTCAAGATCATTAAGGATAAGGATTTCTTGGATAACGAGGAAAACGAGGATATCTTGGAGGATATCGTTCTTACCCTCACCCTCTTTGAAGATAGAGAGATGATTGAAGAAAGGCTCAAGACCTACGCTCATCTCTTCGATGATAAGGTGATGAAGCAGTTGAAGAGAAGAAGATACACTGGTTGGGGAAGGCTCTCAA
GAAAGCTCATTAACGGAATCAGGGATAAGCAGTCTGGAAAGACAATCCTTGATTTCCTCAAGTCTGATGGATTCGCTAACAGAAACTTCATGCAGCTCATCCACGATGATTCTCTCACCTTTAAAGAGGATATCCAGAAGGCTCAGGTTTCAGGACAGGGTGATAGTCTCCATGAGCATATCGCTAACCTCGCTGGATCTCCTGCAATCAAGAAGGGAATCCTCCAGACTGTGAAGGTTGTGGATGAGTTGGTGAAGGTGATGGGAAGGCATAAGCCTGAGAACATCGTGATCGAAATGGCTAGAGAGAACCAGACCACTCAGAAGGGACAGAAGAACTCTAGGGAAAGGATGAAGAGGATCGAGGAAGGTATCAAAGAGCTTGGATCTCAGATCCTCAAAGAGCACCCTGTTGAGAACACTCAGCTCCAGAATGAGAAGCTCTACCTCTACTACCTCCAGAACGGAAGGGATATGTATGTGGATCAAGAGTTGGATATCAACAGGCTCTCTGATTACGATGTTGATCATATCGTGCCACAGTCATTCTTGAAGGATGATTCTATCGATAACAAGGTGCTCACCAGGTCTGATAAGAACAGGGGTAAGAGTGATAACGTGCCAAGTGAAGAGGTTGTGAAGAAAATGAAGAACTATTGGAGGCAGCTCCTCAACGCTAAGCTCATCACTCAGAGAAAGTTCGATAACTTGACTAAGGCTGAGAGGGGAGGACTCTCTGAATTGGATAAGGCAGGATTCATCAAGAGGCAGCTTGTGGAAACCAGGCAGATCACTAAGCACGTTGCACAGATCCTCGATTCTAGGATGAACACCAAGTACGATGAGAACGATAAGTTGATCAGGGAAGTGAAGGTTATCACCCTCAAGTCAAAGCTCGTGTCTGATTTCAGAAAGGATTTCCAATTCTACAAGGTGAGGGAAATCAACAACTACCACCACGCTCACGATGCTTACCTTAACGCTGTTGTTGGAACCGCTCTCATCAAGAAGTATCCTAAGCTCGAGTCAGAGTTCGTGTACGGTGATTACAAGGTGTACGATGTGAGGAAGATGATCGCTAAGTCTGAGCAAGAGATCGGAAAGGCTACCGCTAAGTATTTCTTCTACTCTAACATCATGAATTTCTTCAAGACCGAGATTACCCTCGCTAACGGTGAGATCAGAAAGAGGCCACTCATCGAGACAAACGGTGAAACAGGTGAGATCGTGTGGGATAAGGGAAGGGATTTCGCTACCGTTAGAAAGGTGCTCTCTATGCCACAGGTGAACATCGTTAAGAAAACCGAGGTGCAGACCGGTGGATTCTCTAAAGAGTCTATCCTCCCTAAGAGGAACTCTGATAAGCTCATTGCTAGGAAGAAGGATTGGGACCCTAAGAAATACGGTGGTTTCGATTCTCCTACCGTGGCTTACTCTGTTCTCGTTGTGGCTAAGGTTGAGAAGGGAAAGAGTAAGAAGCTCAAGTCTGTTAAGGAACTTCTCGGAATCACTATCATGGAAAGGTCATCTTTCGAGAAGAACCCAATCGATTTCCTCGAGGCTAAGGGATACAAAGAGGTTAAGAAGGATCTCATCATCAAGCTCCCAAAGTACTCACTCTTCGAACTCGAGAACGGTAGAAAGAGGATGCTCGCTTCTGCTGGTGAGCTTCAAAAGGGAAACGAGCTTGCTCTCCCATCTAAGTACGTTAACTTTCTTTACCTCGCTTCTCACTACGAGAAGTTGAAGGGATCTCCAGAAGATAACGAGCAGAAGCAACTTTTCGTTGAGCAGCACAAGCACTACTTGGATGAGATCATCGAGCAGATCTCTGAGTTCTCTAAAAGGGTGATCCTCGCTGATGCAAACCTCGATAAGGTGTTGTCTGCTTACAACAAGCACAGAGATAAGCCTATCAGGGAACAGGCAGAGAACATCATCCATCTCTTCACCCTTACCAACCTCGGTGCTCCTGCTGCTTTCAAGTACTTCGATACAACC
ATCGATAGGAAGAGATACACCTCTACCAAAGAAGTGCTCGATGCTACCCTCATCCATCAGTCTATCACTGGACTCTACGAGACTAGGATCGATCTCTCACAGCTCGGTGGTGATTCAAGGGCTGATCCTAAGAAGAAGAGGAAGGTTGGAGACGACGGAGGTGGCGGTACAGGAGGGGGTGGGTCCGCTGAGTATGTCAGGGCGTTGTTCGACTTCAATGGAAACGACGAGGAAGATCTGCCTTTTAAAAAGGGAGATATTCTCAGGATCAGAGATAAGCCGGAAGAACAATGGTGGAACGCTGAAGACTCTGAAGGTAAGAGAGGTATGATTCTTGTCCCCTACGTCGAGAAGTATTCGGGTGACTATAAAGACCACGATGGAGATTATAAGGACCACGATATAGATTATAAGGATGATGATGATAAGAGCGGAATGACCGATGCAGAGTACGTCAGGATTCATGAGAAACTTGACATCTACACGTTTAAGAAACAGTTTTTCAACAACAAAAAATCTGTTAGTCACCGCTGTTACGTGCTGTTCGAATTGAAACGCAGAGGTGAGAGGAGAGCCTGCTTTTGGGGCTATGCCGTCAACAAGCCGCAAAGCGGCACAGAAAGGGGCATTCACGCGGAGATATTTAGCATTAGAAAGGTCGAGGAATACCTTCGGGATAATCCCGGGCAATTCACTATCAATTGGTACTCTTCATGGTCCCCGTGTGCAGATTGCGCTGAAAAGATACTGGAGTGGTATAATCAAGAACTCAGAGGAAACGGTCACACCCTCAAGATTTGGGCTTGCAAGCTTTACTACGAGAAAAATGCAAGGAACCAGATCGGCCTCTGGAACTTGCGCGACAACGGCGTGGGGTTGAATGTGATGGTGTCGGAGCATTACCAGTGCTGCCGGAAGATATTCATTCAGTCGTCACATAATCAATTGAACGAGAATAGGTGGCTCGAAAAAACCCTGAAGCGGGCCGAGAAGTGGAGGAGTGAACTCTCGATAATGATCCAGGTTAAAATACTGCATACTACCAAATCTCCGGCGGTGGGACCGAAGAAGAAGCGCAAGGTGGGGACCATGACTAATCTCTCAGATATAATCGAGAAGGAAACAGGAAAGCAACTGGTCATCCAAGAATCGATTTTGATGCTTCCCGAAGAAGTCGAAGAAGTTATAGGAAATAAGCCCGAGTCTGACATACTGGTTCACACAGCGTACGATGAAAGTACGGACGAGAATGTCATGTTGCTGACATCGGACGCACCTGAATACAAGCCTTGGGCTCTGGTCATACAAGATAGTAACGGAGAAAATAAGATTAAAATGCTTTCAGGTGGCTCCCCAAAGAAGAAACGCAAGGTTTGAGGATCTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACAAAGCACCAGTGGTCTAGTGGTAGAATAGTACCCTGCCACGGTACAGACCCGGGTTCGATTCCCGGCTGGTGCAGGAGACCTTATATTCCCCAGAACATCAGGTTAATGGCGTTTTTGATGTCATTTTCGCGGTGGCTGAGATCAGCCACTTCTTCCCCGATAACGGAAACCGGCACACTGGCCATATCGGTGGTCATCATGCGCCAGCTTTCATCCCCGATATGCACCACCGGGTAAAGTTCACGGGAGACTTTATCTGACAGCAGACGTGCACTGGCCAGGGGGATCACCATCCGTCGCCCGGGCGTGTCAATAATATCACTCTGTACATCCACAAACAGACGATAACGGCTCTCTCTTTTATAGGTGTAAACCTTAAACTGCATTTCACCAGCCCCTGTTCTCGTCAGCAAAAGAGCCGTTCATTTCAATAAACCGGGCGACCTCAGCCATCCCTTCCTGATTTTCCGCTTTCCAGCGTTCGGCACGCAGACGACGGGCTTCATTCTGCATGGTTGTGCTTACCAGACCGGAGATATTGACATCATATATGCCTTGAGCAACT
GATAGCTGTCGCTGTCAACTGTCACTGTAATACGCTGCTTCATAGCATACCTCTTTTTGACATACTTCGGGTATACATATCAGTATATATTCTTATACCGCAAAAATCAGCGCGCAAATACGCATACTGTTATCTGGCTTGGTCTCAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCAACAAAGCACCAGTGGTCTAGTGGTAGAATAGTACCCTGCCACGGTACAGACCCGGGTTCGATTCCCGGCTGGTGCAGGATCCATATGAAGATGAAGATGAAATATTTGGTGTGTCAAATAAAAAGCTTGTGTGCTTAAGTTTGTGTTTTTTTCTTGGCTTGTTGTGTTATGAATTTGTGGCTTTTTCTAATATTAAATGAATGTAAGATCACATTATAATGAATAAACAAATGTTTCTATAATCCATTGTGAATGTTTTGTTGGATCTCTTCTGCAGCATATAACTACTGTATGTGCTATGGTATGGACTATGGAATATGATTAAAGATAAG
Seq ID No.2: Amino acid sequence of the nCas9-PmCDA1 nuclease-cytosine deaminase fusion protein expression cassette
MDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGDSRADPKKKRKVGDDGGGGTGGGGSAEYVRALFDFNGNDEEDLPFKKGDILRIRDKPEEQWWNAEDSEGKRGMILVPYVEKYSGDYKDHDGDYKDHDIDYKDDDDKSGMTDAEYVRIHEKLDIYTFKKQFFNNKKSVSHRCYVLFELKRRGERRACFWGYAVNKPQSGTERGIHAEIFSIRKVEEYLRDNPGQFTINWYSSWSPCADCAEKILEWYNQELRGNGHTLKIWACKLYYEKNARNQIGLWNLRDNGVGLNVMVSEHYQCCRKIFIQSSHNQLNENRWLEKTLKRAEKWRSELSIMIQVKILHTTKSPAVGPKKKRKVGTMTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTAYDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKMLSGGSPKKKRKV
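As a consistency check between the two sequences, the first 60 nucleotides of the nCas9 open reading frame in Seq ID No.1 (starting at the ATG that follows AGCCTGCAGG) can be translated with the standard genetic code and compared with the start of Seq ID No.2. The Python sketch below is illustrative only: the helper `translate` and the variable names are assumptions, and only the two sequence excerpts are taken from this document.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# First 60 nt of the nCas9 ORF, excerpted from Seq ID No.1.
orf_start = "ATGGATAAGAAGTACTCTATCGGACTCGCTATCGGAACTAACTCTGTGGGATGGGCTGTG"
# First 20 residues of Seq ID No.2.
expected_start = "MDKKYSIGLAIGTNSVGWAV"

protein_start = translate(orf_start)
print(protein_start == expected_start)
print("Residue 10:", protein_start[9])
```

Residue 10 of the translated excerpt is alanine, which is consistent with the D10A substitution commonly used to make a Cas9 nickase, although this section does not state the substitution explicitly.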
Sequence Listing
<110> University of Electronic Science and Technology of China
<120> A plant genome directed base editing backbone vector and its application
<130> 20180001
<141> 2018-11-23
<160> 15
<170> SIPOSequenceListing 1.0
<210> 1
<211> 8556
<212> DNA
<213> Artificial Sequence
<400> 1
cgcgcctgca gtgcagcgtg acccggtcgt gcccctctct tgagataatg agcattgcat 60cgcgcctgca gtgcagcgtg acccggtcgt gcccctctct tgagataatg agcattgcat 60
gtctaagtta taaaaaatta ccacatattt tttttgtcac acttgtttga agtgcagttt 120gtctaagtta taaaaaatta ccacatattt tttttgtcac acttgtttga agtgcagttt 120
atctatcttt atacatatat ttaaacttta ctctacgaat aatataatct atagtactac 180atctatcttt atacatatat ttaaacttta ctctacgaat aatataatct atagtactac 180
aataatatca gtgttttaga gaatcatata aatgaacagt tagacatggt ctaaaggaca 240aataatatca gtgttttaga gaatcatata aatgaacagt tagacatggt ctaaaggaca 240
attgagtatt ttgacaacag gactctacag ttttatcttt ttagtgtgca tgtgttctcc 300attgagtatt ttgacaacag gactctacag ttttatcttt ttagtgtgca tgtgttctcc 300
tttttttttg caaatagctt cacctatata atacttcatc cattttatta gtacatccat 360tttttttttg caaatagctt cacctatata atacttcatc cattttatta gtacatccat 360
ttagggttta gggttaatgg tttttataga ctaatttttt tagtacatct attttattct 420ttagggttta gggttaatgg tttttataga ctaatttttt tagtacatct attttattct 420
attttagcct ctaaattaag aaaactaaaa ctctatttta gtttttttat ttaataattt 480attttagcct ctaaattaag aaaactaaaa ctctatttta gtttttttat ttaataattt 480
agatataaaa tagaataaaa taaagtgact aaaaattaaa caaataccct ttaagaaatt 540agatataaaa tagaataaaa taaagtgact aaaaattaaa caaataccct ttaagaaatt 540
aaaaaaacta aggaaacatt tttcttgttt cgagtagata atgccagcct gttaaacgcc 600aaaaaaacta aggaaacatt tttcttgttt cgagtagata atgccagcct gttaaacgcc 600
gtcgacgagt ctaacggaca ccaaccagcg aaccagcagc gtcgcgtcgg gccaagcgaa 660gtcgacgagt ctaacggaca ccaaccagcg aaccagcagc gtcgcgtcgg gccaagcgaa 660
gcagacggca cggcatctct gtcgctgcct ctggacccct ctcgagagtt ccgctccacc 720gcagacggca cggcatctct gtcgctgcct ctggacccct ctcgagagtt ccgctccacc 720
gttggacttg ctccgctgtc ggcatccaga aattgcgtgg cggagcggca gacgtgagcc 780gttggacttg ctccgctgtc ggcatccaga aattgcgtgg cggagcggca gacgtgagcc 780
ggcacggcag gcggcctcct cctcctctca cggcaccggc agctacgggg gattcctttc 840ggcacggcag gcggcctcct cctcctctca cggcaccggc agctacgggg gattcctttc 840
ccaccgctcc ttcgctttcc cttcctcgcc cgccgtaata aatagacacc ccctccacac 900ccaccgctcc ttcgctttcc cttcctcgcc cgccgtaata aatagacacc ccctccacac 900
cctctttccc caacctcgtg ttgttcggag cgcacacaca cacaaccaga actcccccaa 960cctctttccc caacctcgtg ttgttcggag cgcacacaca cacaaccaga actcccccaa 960
atccacccgt cggcacctcc gcttcaaggt acgccgctcg tcctcccccc ccccccctct 1020atccaccccgt cggcacctcc gcttcaaggt acgccgctcg tcctcccccc ccccccctct 1020
ctaccttctc aagatcggcg ttccggtcca tggttagggc ccggtagttc tacttctgtt 1080ctaccttctc aagatcggcg ttccggtcca tggttagggc ccggtagttc tacttctgtt 1080
catgtttgtg ttagatccgt gtttgtgtta gatccgtgct actagcgttc gtacacggat 1140catgtttgtg ttagatccgt gtttgtgtgtta gatccgtgct actagcgttc gtacacggat 1140
gcgacctgta cgtcagacac gttctgattg ctaacttgcc agtgtttctc tttggggaat 1200gcgacctgta cgtcagacac gttctgattg ctaacttgcc agtgtttctc tttggggaat 1200
cctgggatgg ctctagccgt tccgcagacg ggatcgattt catgattttt tttgtttcgt 1260cctgggatgg ctctagccgt tccgcagacg ggatcgattt catgattttt tttgtttcgt 1260
tgcatagggt ttggtttgcc cttttccttt atttcaatat atgccgtgca cttgtttgtc 1320tgcataggt ttggtttgcc cttttccttt atttcaatat atgccgtgca cttgtttgtc 1320
gggtcatctt ttcatgcttt tttttgtctt ggttgtgatg atgtggtctg gttgggcggt 1380gggtcatctt ttcatgcttt tttttgtctt ggttgtgatg atgtggtctg gttgggcggt 1380
cgttcaagat cggagtagaa ttaattctgt ttcaaactac ctggtggatt tattaatttt 1440cgttcaagat cggagtagaa ttaattctgt ttcaaactac ctggtggatt tattaatttt 1440
ggatctgtat gtgtgtgcca tacatattca tagttacgaa ttgaagatga tggatggaaa 1500ggatctgtat gtgtgtgcca tacatattca tagttacgaa ttgaagatga tggatggaaa 1500
tatcgatcta ggataggtat acatgttgat gcgggtttta ctgatgcata tacagagatg 1560tatcgatcta ggataggtat acatgttgat gcgggtttta ctgatgcata tacagagatg 1560
ctttttgttc gcttggttgt gatgatgtgg tgtggttggg cggtcgttca ttcgttcaag 1620ctttttgttc gcttggttgt gatgatgtgg tgtggttggg cggtcgttca ttcgttcaag 1620
atcggagtag aatactgttt caaactacct ggtgtattta ttaattttgg aactgtatgt 1680atcggagtag aatactgttt caaactacct ggtgtattta ttaattttgg aactgtatgt 1680
gtgtgtcata catcttcata gttacgagtt taagatggat ggaaatatcg atctaggata 1740gtgtgtcata catcttcata gttacgagtt taagatggat ggaaatatcg atctaggata 1740
ggtatacatg ttgatgtggg ttttactgat gcatatacat gatggcatat gcagcatcta 1800ggtatacatg ttgatgtggg ttttactgat gcatatacat gatggcatat gcagcatcta 1800
ttcatatgct ctaaccttga gtacctatct attataataa acaagtatgt tttataatta 1860ttcatatgct ctaaccttga gtacctatct attataataa acaagtatgt tttataatta 1860
ttttgatctt gatatacttg gatgatggca tatgcagcag ctatatgtgg atttttttag 1920ttttgatctt gatatacttg gatgatggca tatgcagcag ctatatgtgg atttttttag 1920
ccctgccttc atacgctatt tatttgcttg gtactgtttc ttttgtcgat gctcaccctg 1980ccctgccttc atacgctatt tatttgcttg gtactgtttc ttttgtcgat gctcaccctg 1980
ttgtttggtg ttacttctgc agcctgcagg atggataaga agtactctat cggactcgct 2040ttgtttggtg ttacttctgc agcctgcagg atggataaga agtactctat cggactcgct 2040
atcggaacta actctgtggg atgggctgtg atcaccgatg agtacaaggt gccatctaag 2100atcggaacta actctgtggg atgggctgtg atcaccgatg agtacaaggt gccatctaag 2100
aagttcaagg ttctcggaaa caccgatagg cactctatca agaaaaacct tatcggtgct 2160aagttcaagg ttctcggaaa caccgatagg cactctatca agaaaaacct tatcggtgct 2160
ctcctcttcg attctggtga aactgctgag gctaccagac tcaagagaac cgctagaaga 2220ctcctcttcg attctggtga aactgctgag gctaccagac tcaagagaac cgctagaaga 2220
aggtacacca gaagaaagaa caggatctgc tacctccaag agatcttctc taacgagatg 2280aggtacacca gaagaaagaa caggatctgc tacctccaag agatcttctc taacgagatg 2280
gctaaagtgg atgattcatt cttccacagg ctcgaagagt cattcctcgt ggaagaagat 2340gctaaagtgg atgattcatt cttccacagg ctcgaagagt cattcctcgt ggaagaagat 2340
aagaagcacg agaggcaccc tatcttcgga aacatcgttg atgaggtggc ataccacgag 2400
aagtacccta ctatctacca cctcagaaag aagctcgttg attctactga taaggctgat 2460
ctcaggctca tctacctcgc tctcgctcac atgatcaagt tcagaggaca cttcctcatc 2520
gagggtgatc tcaaccctga taactctgat gtggataagt tgttcatcca gctcgtgcag 2580
acctacaacc agcttttcga agagaaccct atcaacgctt caggtgtgga tgctaaggct 2640
atcctctctg ctaggctctc taagtcaaga aggcttgaga acctcattgc tcagctccct 2700
ggtgagaaga agaacggact tttcggaaac ttgatcgctc tctctctcgg actcacccct 2760
aacttcaagt ctaacttcga tctcgctgag gatgcaaagc tccagctctc aaaggatacc 2820
tacgatgatg atctcgataa cctcctcgct cagatcggag atcagtacgc tgatttgttc 2880
ctcgctgcta agaacctctc tgatgctatc ctcctcagtg atatcctcag agtgaacacc 2940
gagatcacca aggctccact ctcagcttct atgatcaaga gatacgatga gcaccaccag 3000
gatctcacac ttctcaaggc tcttgttaga cagcagctcc cagagaagta caaagagatt 3060
ttcttcgatc agtctaagaa cggatacgct ggttacatcg atggtggtgc atctcaagaa 3120
gagttctaca agttcatcaa gcctatcctc gagaagatgg atggaaccga ggaactcctc 3180
gtgaagctca atagagagga tcttctcaga aagcagagga ccttcgataa cggatctatc 3240
cctcatcaga tccacctcgg agagttgcac gctatcctta gaaggcaaga ggatttctac 3300
ccattcctca aggataacag ggaaaagatt gagaagattc tcaccttcag aatcccttac 3360
tacgtgggac ctctcgctag aggaaactca agattcgctt ggatgaccag aaagtctgag 3420
gaaaccatca ccccttggaa cttcgaagag gtggtggata agggtgctag tgctcagtct 3480
ttcatcgaga ggatgaccaa cttcgataag aaccttccaa acgagaaggt gctccctaag 3540
cactctttgc tctacgagta cttcaccgtg tacaacgagt tgaccaaggt taagtacgtg 3600
accgagggaa tgaggaagcc tgcttttttg tcaggtgagc aaaagaaggc tatcgttgat 3660
ctcttgttca agaccaacag aaaggtgacc gtgaagcagc tcaaagagga ttacttcaag 3720
aaaatcgagt gcttcgattc agttgagatt tctggtgttg aggataggtt caacgcatct 3780
ctcggaacct accacgatct cctcaagatc attaaggata aggatttctt ggataacgag 3840
gaaaacgagg atatcttgga ggatatcgtt cttaccctca ccctctttga agatagagag 3900
atgattgaag aaaggctcaa gacctacgct catctcttcg atgataaggt gatgaagcag 3960
ttgaagagaa gaagatacac tggttgggga aggctctcaa gaaagctcat taacggaatc 4020
agggataagc agtctggaaa gacaatcctt gatttcctca agtctgatgg attcgctaac 4080
agaaacttca tgcagctcat ccacgatgat tctctcacct ttaaagagga tatccagaag 4140
gctcaggttt caggacaggg tgatagtctc catgagcata tcgctaacct cgctggatct 4200
cctgcaatca agaagggaat cctccagact gtgaaggttg tggatgagtt ggtgaaggtg 4260
atgggaaggc ataagcctga gaacatcgtg atcgaaatgg ctagagagaa ccagaccact 4320
cagaagggac agaagaactc tagggaaagg atgaagagga tcgaggaagg tatcaaagag 4380
cttggatctc agatcctcaa agagcaccct gttgagaaca ctcagctcca gaatgagaag 4440
ctctacctct actacctcca gaacggaagg gatatgtatg tggatcaaga gttggatatc 4500
aacaggctct ctgattacga tgttgatcat atcgtgccac agtcattctt gaaggatgat 4560
tctatcgata acaaggtgct caccaggtct gataagaaca ggggtaagag tgataacgtg 4620
ccaagtgaag aggttgtgaa gaaaatgaag aactattgga ggcagctcct caacgctaag 4680
ctcatcactc agagaaagtt cgataacttg actaaggctg agaggggagg actctctgaa 4740
ttggataagg caggattcat caagaggcag cttgtggaaa ccaggcagat cactaagcac 4800
gttgcacaga tcctcgattc taggatgaac accaagtacg atgagaacga taagttgatc 4860
agggaagtga aggttatcac cctcaagtca aagctcgtgt ctgatttcag aaaggatttc 4920
caattctaca aggtgaggga aatcaacaac taccaccacg ctcacgatgc ttaccttaac 4980
gctgttgttg gaaccgctct catcaagaag tatcctaagc tcgagtcaga gttcgtgtac 5040
ggtgattaca aggtgtacga tgtgaggaag atgatcgcta agtctgagca agagatcgga 5100
aaggctaccg ctaagtattt cttctactct aacatcatga atttcttcaa gaccgagatt 5160
accctcgcta acggtgagat cagaaagagg ccactcatcg agacaaacgg tgaaacaggt 5220
gagatcgtgt gggataaggg aagggatttc gctaccgtta gaaaggtgct ctctatgcca 5280
caggtgaaca tcgttaagaa aaccgaggtg cagaccggtg gattctctaa agagtctatc 5340
ctccctaaga ggaactctga taagctcatt gctaggaaga aggattggga ccctaagaaa 5400
tacggtggtt tcgattctcc taccgtggct tactctgttc tcgttgtggc taaggttgag 5460
aagggaaaga gtaagaagct caagtctgtt aaggaacttc tcggaatcac tatcatggaa 5520
aggtcatctt tcgagaagaa cccaatcgat ttcctcgagg ctaagggata caaagaggtt 5580
aagaaggatc tcatcatcaa gctcccaaag tactcactct tcgaactcga gaacggtaga 5640
aagaggatgc tcgcttctgc tggtgagctt caaaagggaa acgagcttgc tctcccatct 5700
aagtacgtta actttcttta cctcgcttct cactacgaga agttgaaggg atctccagaa 5760
gataacgagc agaagcaact tttcgttgag cagcacaagc actacttgga tgagatcatc 5820
gagcagatct ctgagttctc taaaagggtg atcctcgctg atgcaaacct cgataaggtg 5880
ttgtctgctt acaacaagca cagagataag cctatcaggg aacaggcaga gaacatcatc 5940
catctcttca cccttaccaa cctcggtgct cctgctgctt tcaagtactt cgatacaacc 6000
atcgatagga agagatacac ctctaccaaa gaagtgctcg atgctaccct catccatcag 6060
tctatcactg gactctacga gactaggatc gatctctcac agctcggtgg tgattcaagg 6120
gctgatccta agaagaagag gaaggttgga gacgacggag gtggcggtac aggagggggt 6180
gggtccgctg agtatgtcag ggcgttgttc gacttcaatg gaaacgacga ggaagatctg 6240
ccttttaaaa agggagatat tctcaggatc agagataagc cggaagaaca atggtggaac 6300
gctgaagact ctgaaggtaa gagaggtatg attcttgtcc cctacgtcga gaagtattcg 6360
ggtgactata aagaccacga tggagattat aaggaccacg atatagatta taaggatgat 6420
gatgataaga gcggaatgac cgatgcagag tacgtcagga ttcatgagaa acttgacatc 6480
tacacgttta agaaacagtt tttcaacaac aaaaaatctg ttagtcaccg ctgttacgtg 6540
ctgttcgaat tgaaacgcag aggtgagagg agagcctgct tttggggcta tgccgtcaac 6600
aagccgcaaa gcggcacaga aaggggcatt cacgcggaga tatttagcat tagaaaggtc 6660
gaggaatacc ttcgggataa tcccgggcaa ttcactatca attggtactc ttcatggtcc 6720
ccgtgtgcag attgcgctga aaagatactg gagtggtata atcaagaact cagaggaaac 6780
ggtcacaccc tcaagatttg ggcttgcaag ctttactacg agaaaaatgc aaggaaccag 6840
atcggcctct ggaacttgcg cgacaacggc gtggggttga atgtgatggt gtcggagcat 6900
taccagtgct gccggaagat attcattcag tcgtcacata atcaattgaa cgagaatagg 6960
tggctcgaaa aaaccctgaa gcgggccgag aagtggagga gtgaactctc gataatgatc 7020
caggttaaaa tactgcatac taccaaatct ccggcggtgg gaccgaagaa gaagcgcaag 7080
gtggggacca tgactaatct ctcagatata atcgagaagg aaacaggaaa gcaactggtc 7140
atccaagaat cgattttgat gcttcccgaa gaagtcgaag aagttatagg aaataagccc 7200
gagtctgaca tactggttca cacagcgtac gatgaaagta cggacgagaa tgtcatgttg 7260
ctgacatcgg acgcacctga atacaagcct tgggctctgg tcatacaaga tagtaacgga 7320
gaaaataaga ttaaaatgct ttcaggtggc tccccaaaga agaaacgcaa ggtttgagga 7380
tctaaaaaaa aaaaaaaaaa aaaaaaaaaa aaaaaaaaaa aaaaaaaaaa aaacaaagca 7440
ccagtggtct agtggtagaa tagtaccctg ccacggtaca gacccgggtt cgattcccgg 7500
ctggtgcagg agaccttata ttccccagaa catcaggtta atggcgtttt tgatgtcatt 7560
ttcgcggtgg ctgagatcag ccacttcttc cccgataacg gaaaccggca cactggccat 7620
atcggtggtc atcatgcgcc agctttcatc cccgatatgc accaccgggt aaagttcacg 7680
ggagacttta tctgacagca gacgtgcact ggccaggggg atcaccatcc gtcgcccggg 7740
cgtgtcaata atatcactct gtacatccac aaacagacga taacggctct ctcttttata 7800
ggtgtaaacc ttaaactgca tttcaccagc ccctgttctc gtcagcaaaa gagccgttca 7860
tttcaataaa ccgggcgacc tcagccatcc cttcctgatt ttccgctttc cagcgttcgg 7920
cacgcagacg acgggcttca ttctgcatgg ttgtgcttac cagaccggag atattgacat 7980
catatatgcc ttgagcaact gatagctgtc gctgtcaact gtcactgtaa tacgctgctt 8040
catagcatac ctctttttga catacttcgg gtatacatat cagtatatat tcttataccg 8100
caaaaatcag cgcgcaaata cgcatactgt tatctggctt ggtctcagtt ttagagctag 8160
aaatagcaag ttaaaataag gctagtccgt tatcaacttg aaaaagtggc accgagtcgg 8220
tgcaacaaag caccagtggt ctagtggtag aatagtaccc tgccacggta cagacccggg 8280
ttcgattccc ggctggtgca ggatccatat gaagatgaag atgaaatatt tggtgtgtca 8340
aataaaaagc ttgtgtgctt aagtttgtgt ttttttcttg gcttgttgtg ttatgaattt 8400
gtggcttttt ctaatattaa atgaatgtaa gatcacatta taatgaataa acaaatgttt 8460
ctataatcca ttgtgaatgt tttgttggat ctcttctgca gcatataact actgtatgtg 8520
ctatggtatg gactatggaa tatgattaaa gataag 8556
<210> 2
<211> 1788
<212> PRT
<213> Artificial Sequence
<400> 2
Met Asp Lys Lys Tyr Ser Ile Gly Leu Ala Ile Gly Thr Asn Ser Val
1 5 10 15
Gly Trp Ala Val Ile Thr Asp Glu Tyr Lys Val Pro Ser Lys Lys Phe
20 25 30
Lys Val Leu Gly Asn Thr Asp Arg His Ser Ile Lys Lys Asn Leu Ile
35 40 45
Gly Ala Leu Leu Phe Asp Ser Gly Glu Thr Ala Glu Ala Thr Arg Leu
50 55 60
Lys Arg Thr Ala Arg Arg Arg Tyr Thr Arg Arg Lys Asn Arg Ile Cys
65 70 75 80
Tyr Leu Gln Glu Ile Phe Ser Asn Glu Met Ala Lys Val Asp Asp Ser
85 90 95
Phe Phe His Arg Leu Glu Glu Ser Phe Leu Val Glu Glu Asp Lys Lys
100 105 110
His Glu Arg His Pro Ile Phe Gly Asn Ile Val Asp Glu Val Ala Tyr
115 120 125
His Glu Lys Tyr Pro Thr Ile Tyr His Leu Arg Lys Lys Leu Val Asp
130 135 140
Ser Thr Asp Lys Ala Asp Leu Arg Leu Ile Tyr Leu Ala Leu Ala His
145 150 155 160
Met Ile Lys Phe Arg Gly His Phe Leu Ile Glu Gly Asp Leu Asn Pro
165 170 175
Asp Asn Ser Asp Val Asp Lys Leu Phe Ile Gln Leu Val Gln Thr Tyr
180 185 190
Asn Gln Leu Phe Glu Glu Asn Pro Ile Asn Ala Ser Gly Val Asp Ala
195 200 205
Lys Ala Ile Leu Ser Ala Arg Leu Ser Lys Ser Arg Arg Leu Glu Asn
210 215 220
Leu Ile Ala Gln Leu Pro Gly Glu Lys Lys Asn Gly Leu Phe Gly Asn
225 230 235 240
Leu Ile Ala Leu Ser Leu Gly Leu Thr Pro Asn Phe Lys Ser Asn Phe
245 250 255
Asp Leu Ala Glu Asp Ala Lys Leu Gln Leu Ser Lys Asp Thr Tyr Asp
260 265 270
Asp Asp Leu Asp Asn Leu Leu Ala Gln Ile Gly Asp Gln Tyr Ala Asp
275 280 285
Leu Phe Leu Ala Ala Lys Asn Leu Ser Asp Ala Ile Leu Leu Ser Asp
290 295 300
Ile Leu Arg Val Asn Thr Glu Ile Thr Lys Ala Pro Leu Ser Ala Ser
305 310 315 320
Met Ile Lys Arg Tyr Asp Glu His His Gln Asp Leu Thr Leu Leu Lys
325 330 335
Ala Leu Val Arg Gln Gln Leu Pro Glu Lys Tyr Lys Glu Ile Phe Phe
340 345 350
Asp Gln Ser Lys Asn Gly Tyr Ala Gly Tyr Ile Asp Gly Gly Ala Ser
355 360 365
Gln Glu Glu Phe Tyr Lys Phe Ile Lys Pro Ile Leu Glu Lys Met Asp
370 375 380
Gly Thr Glu Glu Leu Leu Val Lys Leu Asn Arg Glu Asp Leu Leu Arg
385 390 395 400
Lys Gln Arg Thr Phe Asp Asn Gly Ser Ile Pro His Gln Ile His Leu
405 410 415
Gly Glu Leu His Ala Ile Leu Arg Arg Gln Glu Asp Phe Tyr Pro Phe
420 425 430
Leu Lys Asp Asn Arg Glu Lys Ile Glu Lys Ile Leu Thr Phe Arg Ile
435 440 445
Pro Tyr Tyr Val Gly Pro Leu Ala Arg Gly Asn Ser Arg Phe Ala Trp
450 455 460
Met Thr Arg Lys Ser Glu Glu Thr Ile Thr Pro Trp Asn Phe Glu Glu
465 470 475 480
Val Val Asp Lys Gly Ala Ser Ala Gln Ser Phe Ile Glu Arg Met Thr
485 490 495
Asn Phe Asp Lys Asn Leu Pro Asn Glu Lys Val Leu Pro Lys His Ser
500 505 510
Leu Leu Tyr Glu Tyr Phe Thr Val Tyr Asn Glu Leu Thr Lys Val Lys
515 520 525
Tyr Val Thr Glu Gly Met Arg Lys Pro Ala Phe Leu Ser Gly Glu Gln
530 535 540
Lys Lys Ala Ile Val Asp Leu Leu Phe Lys Thr Asn Arg Lys Val Thr
545 550 555 560
Val Lys Gln Leu Lys Glu Asp Tyr Phe Lys Lys Ile Glu Cys Phe Asp
565 570 575
Ser Val Glu Ile Ser Gly Val Glu Asp Arg Phe Asn Ala Ser Leu Gly
580 585 590
Thr Tyr His Asp Leu Leu Lys Ile Ile Lys Asp Lys Asp Phe Leu Asp
595 600 605
Asn Glu Glu Asn Glu Asp Ile Leu Glu Asp Ile Val Leu Thr Leu Thr
610 615 620
Leu Phe Glu Asp Arg Glu Met Ile Glu Glu Arg Leu Lys Thr Tyr Ala
625 630 635 640
His Leu Phe Asp Asp Lys Val Met Lys Gln Leu Lys Arg Arg Arg Tyr
645 650 655
Thr Gly Trp Gly Arg Leu Ser Arg Lys Leu Ile Asn Gly Ile Arg Asp
660 665 670
Lys Gln Ser Gly Lys Thr Ile Leu Asp Phe Leu Lys Ser Asp Gly Phe
675 680 685
Ala Asn Arg Asn Phe Met Gln Leu Ile His Asp Asp Ser Leu Thr Phe
690 695 700
Lys Glu Asp Ile Gln Lys Ala Gln Val Ser Gly Gln Gly Asp Ser Leu
705 710 715 720
His Glu His Ile Ala Asn Leu Ala Gly Ser Pro Ala Ile Lys Lys Gly
725 730 735
Ile Leu Gln Thr Val Lys Val Val Asp Glu Leu Val Lys Val Met Gly
740 745 750
Arg His Lys Pro Glu Asn Ile Val Ile Glu Met Ala Arg Glu Asn Gln
755 760 765
Thr Thr Gln Lys Gly Gln Lys Asn Ser Arg Glu Arg Met Lys Arg Ile
770 775 780
Glu Glu Gly Ile Lys Glu Leu Gly Ser Gln Ile Leu Lys Glu His Pro
785 790 795 800
Val Glu Asn Thr Gln Leu Gln Asn Glu Lys Leu Tyr Leu Tyr Tyr Leu
805 810 815
Gln Asn Gly Arg Asp Met Tyr Val Asp Gln Glu Leu Asp Ile Asn Arg
820 825 830
Leu Ser Asp Tyr Asp Val Asp His Ile Val Pro Gln Ser Phe Leu Lys
835 840 845
Asp Asp Ser Ile Asp Asn Lys Val Leu Thr Arg Ser Asp Lys Asn Arg
850 855 860
Gly Lys Ser Asp Asn Val Pro Ser Glu Glu Val Val Lys Lys Met Lys
865 870 875 880
Asn Tyr Trp Arg Gln Leu Leu Asn Ala Lys Leu Ile Thr Gln Arg Lys
885 890 895
Phe Asp Asn Leu Thr Lys Ala Glu Arg Gly Gly Leu Ser Glu Leu Asp
900 905 910
Lys Ala Gly Phe Ile Lys Arg Gln Leu Val Glu Thr Arg Gln Ile Thr
915 920 925
Lys His Val Ala Gln Ile Leu Asp Ser Arg Met Asn Thr Lys Tyr Asp
930 935 940
Glu Asn Asp Lys Leu Ile Arg Glu Val Lys Val Ile Thr Leu Lys Ser
945 950 955 960
Lys Leu Val Ser Asp Phe Arg Lys Asp Phe Gln Phe Tyr Lys Val Arg
965 970 975
Glu Ile Asn Asn Tyr His His Ala His Asp Ala Tyr Leu Asn Ala Val
980 985 990
Val Gly Thr Ala Leu Ile Lys Lys Tyr Pro Lys Leu Glu Ser Glu Phe
995 1000 1005
Val Tyr Gly Asp Tyr Lys Val Tyr Asp Val Arg Lys Met Ile Ala Lys
1010 1015 1020
Ser Glu Gln Glu Ile Gly Lys Ala Thr Ala Lys Tyr Phe Phe Tyr Ser
1025 1030 1035 1040
Asn Ile Met Asn Phe Phe Lys Thr Glu Ile Thr Leu Ala Asn Gly Glu
1045 1050 1055
Ile Arg Lys Arg Pro Leu Ile Glu Thr Asn Gly Glu Thr Gly Glu Ile
1060 1065 1070
Val Trp Asp Lys Gly Arg Asp Phe Ala Thr Val Arg Lys Val Leu Ser
1075 1080 1085
Met Pro Gln Val Asn Ile Val Lys Lys Thr Glu Val Gln Thr Gly Gly
1090 1095 1100
Phe Ser Lys Glu Ser Ile Leu Pro Lys Arg Asn Ser Asp Lys Leu Ile
1105 1110 1115 1120
Ala Arg Lys Lys Asp Trp Asp Pro Lys Lys Tyr Gly Gly Phe Asp Ser
1125 1130 1135
Pro Thr Val Ala Tyr Ser Val Leu Val Val Ala Lys Val Glu Lys Gly
1140 1145 1150
Lys Ser Lys Lys Leu Lys Ser Val Lys Glu Leu Leu Gly Ile Thr Ile
1155 1160 1165
Met Glu Arg Ser Ser Phe Glu Lys Asn Pro Ile Asp Phe Leu Glu Ala
1170 1175 1180
Lys Gly Tyr Lys Glu Val Lys Lys Asp Leu Ile Ile Lys Leu Pro Lys
1185 1190 1195 1200
Tyr Ser Leu Phe Glu Leu Glu Asn Gly Arg Lys Arg Met Leu Ala Ser
1205 1210 1215
Ala Gly Glu Leu Gln Lys Gly Asn Glu Leu Ala Leu Pro Ser Lys Tyr
1220 1225 1230
Val Asn Phe Leu Tyr Leu Ala Ser His Tyr Glu Lys Leu Lys Gly Ser
1235 1240 1245
Pro Glu Asp Asn Glu Gln Lys Gln Leu Phe Val Glu Gln His Lys His
1250 1255 1260
Tyr Leu Asp Glu Ile Ile Glu Gln Ile Ser Glu Phe Ser Lys Arg Val
1265 1270 1275 1280
Ile Leu Ala Asp Ala Asn Leu Asp Lys Val Leu Ser Ala Tyr Asn Lys
1285 1290 1295
His Arg Asp Lys Pro Ile Arg Glu Gln Ala Glu Asn Ile Ile His Leu
1300 1305 1310
Phe Thr Leu Thr Asn Leu Gly Ala Pro Ala Ala Phe Lys Tyr Phe Asp
1315 1320 1325
Thr Thr Ile Asp Arg Lys Arg Tyr Thr Ser Thr Lys Glu Val Leu Asp
1330 1335 1340
Ala Thr Leu Ile His Gln Ser Ile Thr Gly Leu Tyr Glu Thr Arg Ile
1345 1350 1355 1360
Asp Leu Ser Gln Leu Gly Gly Asp Ser Arg Ala Asp Pro Lys Lys Lys
1365 1370 1375
Arg Lys Val Gly Asp Asp Gly Gly Gly Gly Thr Gly Gly Gly Gly Ser
1380 1385 1390
Ala Glu Tyr Val Arg Ala Leu Phe Asp Phe Asn Gly Asn Asp Glu Glu
1395 1400 1405
Asp Leu Pro Phe Lys Lys Gly Asp Ile Leu Arg Ile Arg Asp Lys Pro
1410 1415 1420
Glu Glu Gln Trp Trp Asn Ala Glu Asp Ser Glu Gly Lys Arg Gly Met
1425 1430 1435 1440
Ile Leu Val Pro Tyr Val Glu Lys Tyr Ser Gly Asp Tyr Lys Asp His
1445 1450 1455
Asp Gly Asp Tyr Lys Asp His Asp Ile Asp Tyr Lys Asp Asp Asp Asp
1460 1465 1470
Lys Ser Gly Met Thr Asp Ala Glu Tyr Val Arg Ile His Glu Lys Leu
1475 1480 1485
Asp Ile Tyr Thr Phe Lys Lys Gln Phe Phe Asn Asn Lys Lys Ser Val
1490 1495 1500
Ser His Arg Cys Tyr Val Leu Phe Glu Leu Lys Arg Arg Gly Glu Arg
1505 1510 1515 1520
Arg Ala Cys Phe Trp Gly Tyr Ala Val Asn Lys Pro Gln Ser Gly Thr
1525 1530 1535
Glu Arg Gly Ile His Ala Glu Ile Phe Ser Ile Arg Lys Val Glu Glu
1540 1545 1550
Tyr Leu Arg Asp Asn Pro Gly Gln Phe Thr Ile Asn Trp Tyr Ser Ser
1555 1560 1565
Trp Ser Pro Cys Ala Asp Cys Ala Glu Lys Ile Leu Glu Trp Tyr Asn
1570 1575 1580
Gln Glu Leu Arg Gly Asn Gly His Thr Leu Lys Ile Trp Ala Cys Lys
1585 1590 1595 1600
Leu Tyr Tyr Glu Lys Asn Ala Arg Asn Gln Ile Gly Leu Trp Asn Leu
1605 1610 1615
Arg Asp Asn Gly Val Gly Leu Asn Val Met Val Ser Glu His Tyr Gln
1620 1625 1630
Cys Cys Arg Lys Ile Phe Ile Gln Ser Ser His Asn Gln Leu Asn Glu
1635 1640 1645
Asn Arg Trp Leu Glu Lys Thr Leu Lys Arg Ala Glu Lys Trp Arg Ser
1650 1655 1660
Glu Leu Ser Ile Met Ile Gln Val Lys Ile Leu His Thr Thr Lys Ser
1665 1670 1675 1680
Pro Ala Val Gly Pro Lys Lys Lys Arg Lys Val Gly Thr Met Thr Asn
1685 1690 1695
Leu Ser Asp Ile Ile Glu Lys Glu Thr Gly Lys Gln Leu Val Ile Gln
1700 1705 1710
Glu Ser Ile Leu Met Leu Pro Glu Glu Val Glu Glu Val Ile Gly Asn
1715 1720 1725
Lys Pro Glu Ser Asp Ile Leu Val His Thr Ala Tyr Asp Glu Ser Thr
1730 1735 1740
Asp Glu Asn Val Met Leu Leu Thr Ser Asp Ala Pro Glu Tyr Lys Pro
1745 1750 1755 1760
Trp Ala Leu Val Ile Gln Asp Ser Asn Gly Glu Asn Lys Ile Lys Met
1765 1770 1775
Leu Ser Gly Gly Ser Pro Lys Lys Lys Arg Lys Val
1780 1785
<210> 3
<211> 20
<212> DNA
<213> Artificial Sequence
<400> 3
cgacatccgc aagtaccagg 20
<210> 4
<211> 22
<212> DNA
<213> Artificial Sequence
<400> 4
agtacactgt ttccccgtat gt 22
<210> 5
<211> 19
<212> DNA
<213> Artificial Sequence
<400> 5
gctgctggtg agtgctgat 19
<210> 6
<211> 20
<212> DNA
<213> Artificial Sequence
<400> 6
acccattggg agtgtcttgc 20
<210> 7
<211> 20
<212> DNA
<213> Artificial Sequence
<400> 7
gaccagccag cgtctggcgc 20
<210> 8
<211> 20
<212> DNA
<213> Artificial Sequence
<400> 8
gcagctggct gagggtgcat 20
<210> 9
<211> 19
<212> DNA
<213> Artificial Sequence
<400> 9
agccagctgc ttacaaaac 19
<210> 10
<211> 24
<212> DNA
<213> Artificial Sequence
<400> 10
tgcagaccag ccagcgtctg gcgc 24
<210> 11
<211> 24
<212> DNA
<213> Artificial Sequence
<400> 11
aaacgcgcca gacgctggct ggtc 24
<210> 12
<211> 24
<212> DNA
<213> Artificial Sequence
<400> 12
tgcagcagct ggctgagggt gcat 24
<210> 13
<211> 24
<212> DNA
<213> Artificial Sequence
<400> 13
aaacatgcac cctcagccag ctgc 24
<210> 14
<211> 23
<212> DNA
<213> Artificial Sequence
<400> 14
tgcaagccag ctgcttacaa aac 23
<210> 15
<211> 23
<212> DNA
<213> Artificial Sequence
<400> 15
aaacgttttg taagcagctg gct 23
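The last six oligonucleotides in the listing appear to be derived from the three sequences given as SEQ ID NO: 7 to 9. Each pair looks like a `tgca`-prefixed copy of one of those sequences plus an `aaac`-prefixed copy of its reverse complement. The sketch below checks that relationship directly on the listed sequences. The function names and the grouping into pairs are illustrative assumptions, and this part of the listing does not say what the four-base prefixes are for.

```python
# Sketch: check that each listed oligo pair is built from one of the
# 19-20 nt sequences (SEQ ID NO: 7-9) by adding a 4-base prefix.
# The pairing of oligos to sequences is inferred from the listing itself.

COMPLEMENT = {"a": "t", "c": "g", "g": "c", "t": "a"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a lowercase DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


# Sequences copied from the listing with the spacing removed.
TARGETS = [
    "gaccagccagcgtctggcgc",   # SEQ ID NO: 7
    "gcagctggctgagggtgcat",   # SEQ ID NO: 8
    "agccagctgcttacaaaac",    # SEQ ID NO: 9
]

# The six oligos that close the listing, grouped in order of appearance.
OLIGO_PAIRS = [
    ("tgcagaccagccagcgtctggcgc", "aaacgcgccagacgctggctggtc"),
    ("tgcagcagctggctgagggtgcat", "aaacatgcaccctcagccagctgc"),
    ("tgcaagccagctgcttacaaaac", "aaacgttttgtaagcagctggct"),
]


def pair_matches(target: str, forward: str, reverse: str) -> bool:
    """True if forward = 'tgca' + target and reverse = 'aaac' + revcomp(target)."""
    return (forward == "tgca" + target
            and reverse == "aaac" + reverse_complement(target))


results = [pair_matches(t, f, r) for t, (f, r) in zip(TARGETS, OLIGO_PAIRS)]
print(results)
```

A `False` entry would point to a transcription error in one of the three sequences involved.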
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811403794.0A CN110607320B (en) | 2018-11-23 | 2018-11-23 | Plant genome directional base editing framework vector and application thereof |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110607320A (en) | 2019-12-24 |
CN110607320B (en) | 2023-05-12 |
Family
ID=68888837
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811403794.0A Active CN110607320B (en) | 2018-11-23 | 2018-11-23 | Plant genome directional base editing framework vector and application thereof |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110607320B (en) |
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105132451A (en) * | 2015-07-08 | 2015-12-09 | 电子科技大学 | CRISPR/Cas9 single transcription unit directionally modified backbone vector and application thereof |
WO2017090761A1 (en) * | 2015-11-27 | 2017-06-01 | 国立大学法人神戸大学 | Method for converting monocot plant genome sequence in which nucleic acid base in targeted dna sequence is specifically converted, and molecular complex used therein |
US20200377910A1 (en) * | 2016-04-21 | 2020-12-03 | National University Corporation Kobe University | Method for increasing mutation introduction efficiency in genome sequence modification technique, and molecular complex to be used therefor |
CN107012164A (en) * | 2017-01-11 | 2017-08-04 | 电子科技大学 | CRISPR/Cpf1 Plant Genome directed modifications functional unit, the carrier comprising the functional unit and its application |
WO2018143477A1 (en) * | 2017-02-06 | 2018-08-09 | 国立大学法人 筑波大学 | Method of modifying genome of dicotyledonous plant |
WO2022060185A1 (en) * | 2020-09-18 | 2022-03-24 | 기초과학연구원 | Targeted deaminase and base editing using same |
CN114317590A (en) * | 2020-09-30 | 2022-04-12 | 北京市农林科学院 | Method for mutating base C in plant genome into base T |
CN113717960A (en) * | 2021-08-27 | 2021-11-30 | 电子科技大学 | Novel Cas9 protein, CRISPR-Cas9 genome directed editing vector and genome editing method |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111850034A (en) * | 2020-06-24 | 2020-10-30 | 中国农业大学 | A gene editing vector and method |
CN111850034B (en) * | 2020-06-24 | 2023-01-10 | 中国农业大学 | A carrier and method for gene editing |
CN112080517A (en) * | 2020-09-08 | 2020-12-15 | 南京农业大学 | Screening system for improving probability of obtaining gene editing plants, construction method and application thereof |
CN112852791B (en) * | 2020-11-20 | 2022-05-24 | 中国农业科学院植物保护研究所 | Adenine base editor and related biological material and application thereof |
CN112852791A (en) * | 2020-11-20 | 2021-05-28 | 中国农业科学院植物保护研究所 | Adenine base editor and related biological material and application thereof |
CN114540406A (en) * | 2020-11-26 | 2022-05-27 | 电子科技大学 | Genome editing expression box, vector and application thereof |
CN114540406B (en) * | 2020-11-26 | 2023-09-29 | 电子科技大学 | Genome editing expression cassettes, vectors and their applications |
CN112575014B (en) * | 2020-12-11 | 2022-04-01 | 安徽省农业科学院水稻研究所 | Base editor SpCas9-LjCDAL1 and construction and application thereof |
CN112575014A (en) * | 2020-12-11 | 2021-03-30 | 安徽省农业科学院水稻研究所 | Novel base editor SpCas9-LjCDAL1 and construction and application thereof |
CN116135974A (en) * | 2021-11-17 | 2023-05-19 | 中国科学院天津工业生物技术研究所 | A kind of recombinant glycosylase base editing system and its application |
CN114507683A (en) * | 2021-11-19 | 2022-05-17 | 杭州嘉因生物科技有限公司 | SURE strain with Kan resistance gene knocked out and construction method and application thereof |
CN116064643A (en) * | 2022-10-08 | 2023-05-05 | 福建农林大学 | A method for improving the editing efficiency of single base editors in dicotyledonous plants and its application |
CN116064643B (en) * | 2022-10-08 | 2025-01-28 | 福建农林大学 | A method for improving the editing efficiency of a single-base editor in dicotyledonous plants and its application |
Also Published As
Publication number | Publication date |
---|---|
CN110607320B (en) | 2023-05-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110607320B (en) | Plant genome directional base editing framework vector and application thereof | |
CN107012164B (en) | CRISPR/Cpf1 plant genome directed modification functional unit, vector containing functional unit and application of functional unit | |
CN105132451B (en) | A kind of single transcriptional units directed modification skeleton carrier of CRISPR/Cas9 and its application | |
CN103382468B (en) | Site-directed modification method of rice genome | |
CN110157726B (en) | Method for site-directed substitution of plant genome | |
CN103981215B (en) | A kind of for engineered key plasmid vector and application | |
CN113717960B (en) | A new Cas9 protein, CRISPR-Cas9 genome-directed editing vector and genome editing method | |
CN114075559A (en) | Type 2 CRISPR/Cas9 gene editing system and application thereof | |
CN111662367A (en) | Rice bacterial leaf blight-resistant protein and coding gene and application thereof | |
CN113846075A (en) | MAD7-NLS fusion protein, nucleic acid construct for site-directed editing of plant genome and application thereof | |
CN116426565A (en) | A vector suitable for melon CRISPR/Cas9 gene editing system and its application | |
CN116179513B (en) | Cpf1 protein and application thereof in gene editing | |
CN116463365A (en) | CasФ coding gene, compact plant genome directional modification system and plant genome epigenetic editing system | |
WO2018082611A1 (en) | Nucleic acid construct expressing exogenous gene in plant cells and use thereof | |
CN119351374B (en) | Engineering and transformation of ultra-compact CRISPR/Cas12j-8 and its applications | |
CN119162152B (en) | A highly accurate base editor that expands editing range and improves editing efficiency | |
CN117402855B (en) | Cas protein, gene editing system and application | |
CN116083432B (en) | Mulberry U6 promoter and application thereof | |
CN115851784B (en) | Plant cytosine base editing system constructed by Lbcpf1 variant and application thereof | |
CN117327742B (en) | Technical method for promoting efficient replacement and homogenization of chlamydomonas chloroplast genome | |
CN113388635B (en) | Plant double-target-point CRISPR/Cas9 vector and construction method and application thereof | |
CN117777253A (en) | Type V-A anti-CRISPR proteins suitable for plant genome editing and their applications | |
CN117384942A (en) | Split-Cas9 system suitable for plant genome editing and its applications | |
CN116064537A (en) | Plant circRNA expression frame and application thereof | |
CN119709697A (en) | Fusion protein and application thereof in single gene and multi-gene editing of rice |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||