CN118185908A - A cytosine base editing system derived from adenosine deaminase - Google Patents
- Publication number
- CN118185908A (application CN202410237383.8A)
- Authority
- CN
- China
- Prior art keywords
- tadcbea
- tada
- deaminase
- cda
- editing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N9/00—Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
- C12N9/14—Hydrolases (3)
- C12N9/78—Hydrolases (3) acting on carbon to nitrogen bonds other than peptide bonds (3.5)
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/63—Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
- C12N15/79—Vectors or expression systems specially adapted for eukaryotic hosts
- C12N15/82—Vectors or expression systems specially adapted for eukaryotic hosts for plant cells, e.g. plant artificial chromosomes (PACs)
- C12N15/8201—Methods for introducing genetic material into plant cells, e.g. DNA, RNA, stable or transient incorporation, tissue culture methods adapted for transformation
- C12N15/8202—Methods for introducing genetic material into plant cells, e.g. DNA, RNA, stable or transient incorporation, tissue culture methods adapted for transformation by biological means, e.g. cell mediated or natural vector
- C12N15/8205—Agrobacterium mediated transformation
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/63—Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
- C12N15/79—Vectors or expression systems specially adapted for eukaryotic hosts
- C12N15/82—Vectors or expression systems specially adapted for eukaryotic hosts for plant cells, e.g. plant artificial chromosomes (PACs)
- C12N15/8216—Methods for controlling, regulating or enhancing expression of transgenes in plant cells
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N9/00—Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
- C12N9/14—Hydrolases (3)
- C12N9/16—Hydrolases (3) acting on ester bonds (3.1)
- C12N9/22—Ribonucleases [RNase]; Deoxyribonucleases [DNase]
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Y—ENZYMES
- C12Y305/00—Hydrolases acting on carbon-nitrogen bonds, other than peptide bonds (3.5)
- C12Y305/04—Hydrolases acting on carbon-nitrogen bonds, other than peptide bonds (3.5) in cyclic amidines (3.5.4)
- C12Y305/04004—Adenosine deaminase (3.5.4.4)
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Y—ENZYMES
- C12Y305/00—Hydrolases acting on carbon-nitrogen bonds, other than peptide bonds (3.5)
- C12Y305/04—Hydrolases acting on carbon-nitrogen bonds, other than peptide bonds (3.5) in cyclic amidines (3.5.4)
- C12Y305/04005—Cytidine deaminase (3.5.4.5)
-
- C—CHEMISTRY; METALLURGY
- C07—ORGANIC CHEMISTRY
- C07K—PEPTIDES
- C07K2319/00—Fusion polypeptide
Abstract
Description
Technical Field
The present invention belongs to the technical field of genetic engineering and specifically relates to a cytosine base editing system derived from adenosine deaminase.
Background
CRISPR-Cas is a widely used gene editing technology. Guided by RNA, it binds specifically to a target sequence in the genome and cleaves the DNA to form a double-strand break; site-specific gene editing is then achieved through the organism's non-homologous end joining or homologous recombination pathways.
The development of single-base editing tools makes it possible to modify a specific genomic site precisely without causing a DNA double-strand break. The basic principle is to fuse a cytidine deaminase or an adenosine deaminase to a Cas protein and to rely on CRISPR targeting to change a single base at the target site. A base editor consists mainly of two parts: a Cas protein and a DNA-modifying enzyme. There are currently two main classes of base editor: cytosine base editors (CBEs), based on cytidine deaminases, which convert C to T; and adenine base editors (ABEs), based on adenosine deaminases, which convert A to G.
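The two conversions described above can be illustrated with a small Python sketch. It is only a toy model under stated assumptions: the function name `simulate_base_edit`, the 20-nt protospacer, and the window bounds (positions 4 to 8) are hypothetical and are not taken from this document, and real editors convert only a fraction of the bases in the window.

```python
# Toy model of base editing on the protospacer strand.
# The window bounds and the sequence are illustrative assumptions only.

def simulate_base_edit(protospacer, target_base, product_base, window=(4, 8)):
    """Replace every target_base inside the 1-based inclusive window
    with product_base and return the edited sequence."""
    start, end = window
    edited = []
    for pos, base in enumerate(protospacer.upper(), start=1):
        if start <= pos <= end and base == target_base:
            edited.append(product_base)
        else:
            edited.append(base)
    return "".join(edited)


protospacer = "GACCTCAGGATCCAGTCAGT"  # hypothetical 20-nt target

# CBE-like outcome: C to T inside the window
print(simulate_base_edit(protospacer, "C", "T"))  # GACTTTAGGATCCAGTCAGT
# ABE-like outcome: A to G inside the window
print(simulate_base_edit(protospacer, "A", "G"))  # GACCTCGGGATCCAGTCAGT
```

Bases outside the assumed window are left untouched, which is the sense in which an editing window limits where conversions occur.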
Reported CBE systems include those based on rAPOBEC1, hAID, PmCDA1, hAPOBEC3A and hAPOBEC3B. Combining a cytidine deaminase with a Cas protein enables C-to-T base editing, and adding one or more uracil DNA glycosylase inhibitors (UGI) further improves the efficiency and product purity of C-to-T editing. Existing cytosine base editing systems can already edit specific sites effectively and precisely, but studies have detected genome-wide off-target effects of rAPOBEC1-CBE in mouse embryos and rice plants (Zuo et al., Cytosine base editor generates substantial off-target single-nucleotide variants in mouse embryos. 2019, Science) (Jin et al., Cytosine, but not adenine, base editors induce genome-wide off-target mutations in rice. 2019, Science). Researchers are therefore seeking cytosine base editing systems with higher editing efficiency, better editing specificity and broader applicability.
The adenosine deaminases used in ABE systems include TadA-7.10, TadA-8e and TadA-9; these ABEs perform efficient A-to-G base editing in human cells and plants.
Summary of the Invention
The purpose of the present invention is to expand the CBE toolbox for plants and to identify CBE editors with better editing efficiency and editing specificity.
The present invention provides a TadA-CDa deaminase having cytidine deaminase activity; the coding sequence of the TadA-CDa deaminase is shown in SEQ ID No.1.
The technical solution of the present invention is a fusion protein, TadCBEa, with cytosine base editing function, comprising a TadA-CDa deaminase and a Cas protein arranged as TadA-CDa deaminase-Cas protein or Cas protein-TadA-CDa deaminase; the TadA-CDa deaminase is obtained by engineering TadA-8e, which has adenosine deaminase activity, to act as a cytidine deaminase.
Further, the coding sequence of the TadA-CDa deaminase is shown in SEQ ID No.1.
In particular, one or two UGIs are additionally fused to either end of the fusion protein TadCBEa.
Specifically, the coding sequence of the fusion protein TadCBEa is shown at positions 2065 to 6810 of SEQ ID No.2.
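As a quick check on the stated coordinates, the sketch below extracts a region by 1-based inclusive positions and confirms that positions 2065 to 6810 span a whole number of codons. It assumes the usual sequence-listing convention of 1-based inclusive numbering; the helper name `extract_region` and the placeholder sequence are hypothetical, since SEQ ID No.2 itself is not reproduced here.

```python
# Coordinate arithmetic for a region given as 1-based inclusive positions.
# The placeholder sequence is NOT SEQ ID No.2; it only stands in for it.

def extract_region(sequence, start, end):
    """Return the 1-based, inclusive region start..end of sequence."""
    if not (1 <= start <= end <= len(sequence)):
        raise ValueError("region lies outside the sequence")
    return sequence[start - 1:end]


START, END = 2065, 6810               # positions stated in the text
region_length = END - START + 1
print(region_length)                  # 4746 nucleotides
print(region_length % 3 == 0)         # True: a whole number of codons
print(region_length // 3)             # 1582 codons

placeholder = "N" * 7000              # hypothetical stand-in, assumed length
cds = extract_region(placeholder, START, END)
print(len(cds) == region_length)      # True
```

Whether the 1582 codons include a stop codon cannot be determined from the coordinates alone.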
The present invention also provides a cytosine base editing system derived from adenosine deaminase, comprising the fusion protein TadCBEa constructed by fusing a TadA-CDa deaminase to a Cas protein, wherein the fusion protein TadCBEa is arranged as TadA-CDa deaminase-Cas protein or Cas protein-TadA-CDa deaminase; the TadA-CDa deaminase is obtained by engineering TadA-8e, which has adenosine deaminase activity, to act as a cytidine deaminase.
Further, the coding sequence of the TadA-CDa deaminase is shown in SEQ ID No.1.
In particular, the structure of TadCBEa is TadA-CDa-Cas-2×UGI.
Further, the structure of TadCBEa is promoter-TadA-CDa-Cas-2×UGI-terminator.
Further, an NLS is also provided between the promoter, TadA-CDa, Cas, 2×UGI and terminator.
The Cas is Cas9, Cas12a or Cas12b.
Specifically, the Cas9 is nCas9.
The cytosine base editing system further comprises a crRNA transcription expression unit.
Specifically, the structure of the cytosine base editing system is fusion protein TadCBEa-crRNA transcription expression unit or crRNA transcription expression unit-fusion protein TadCBEa.
Further, the structure of the crRNA transcription expression unit is promoter-crRNA scaffold-terminator.
In particular, the promoter is OsUbi1, ZmUbi1 or P35S.
The terminator is pinII, AtHSP, NOS or T35S.
Preferably, the nucleotide sequence of the fusion protein TadCBEa-crRNA transcription expression unit is shown in SEQ ID No.2.
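The element order described above can be written out as a short Python sketch. This is a bookkeeping aid under stated assumptions, not a construct design: the function names `editor_cassette` and `crrna_unit` are hypothetical, and the NLS placement follows a literal reading of "an NLS between" each pair of adjacent elements, which may differ from the actual constructs.

```python
# Sketch of the element order of the two expression units.
# Function names and the literal NLS interleaving are assumptions.

ALLOWED_CAS = {"Cas9", "nCas9", "Cas12a", "Cas12b"}
ALLOWED_PROMOTERS = {"OsUbi1", "ZmUbi1", "P35S"}
ALLOWED_TERMINATORS = {"pinII", "AtHSP", "NOS", "T35S"}


def _check(value, allowed, kind):
    if value not in allowed:
        raise ValueError(f"unsupported {kind}: {value}")


def editor_cassette(promoter, cas, terminator, n_ugi=2, with_nls=True):
    """Return promoter-TadA-CDa-Cas-UGI-terminator as a string,
    optionally with an NLS between adjacent elements."""
    _check(promoter, ALLOWED_PROMOTERS, "promoter")
    _check(cas, ALLOWED_CAS, "Cas protein")
    _check(terminator, ALLOWED_TERMINATORS, "terminator")
    if n_ugi not in (1, 2):
        raise ValueError("the text describes one or two UGIs")
    elements = [promoter, "TadA-CDa", cas, f"{n_ugi}×UGI", terminator]
    if with_nls:
        interleaved = []
        for element in elements:
            if interleaved:
                interleaved.append("NLS")
            interleaved.append(element)
        elements = interleaved
    return "-".join(elements)


def crrna_unit(promoter, terminator):
    """Return promoter-crRNA scaffold-terminator as a string."""
    _check(promoter, ALLOWED_PROMOTERS, "promoter")
    _check(terminator, ALLOWED_TERMINATORS, "terminator")
    return "-".join([promoter, "crRNA scaffold", terminator])


print(editor_cassette("ZmUbi1", "nCas9", "AtHSP"))
print(crrna_unit("OsUbi1", "pinII"))
```

The example calls use the promoter and terminator pairings named in the Figure 1 legend (ZmUbi1 with the HSP terminator for the editor, OsUbi1 with the pinII terminator for the guide RNA unit).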
The present invention also provides a vector, cell or host expressing the TadA-CDa deaminase or the fusion protein TadCBEa.
Further, the vector is a plant expression vector.
The present invention also provides a vector, cell or host containing the cytosine base editing system.
Further, the vector is a plant expression vector.
Beneficial effects of the present invention: The present invention engineers the adenosine deaminase TadA-8e, converting it from an enzyme with adenosine deaminase activity into TadA-CDa, which acts as a cytidine deaminase. The TadA-CDa deaminase is fused to a Cas protein to construct the cytosine base editor TadCBEa. Unlike the adenine base editor ABE8e, which carries the adenosine deaminase TadA-8e and performs A-to-G base editing, TadCBEa performs C-to-T base editing. The substrate nucleotide of the deaminase changes from adenosine (A) to cytidine (C), but the editing window and editing specificity are unchanged; that is, TadCBEa has the same editing window and the same high editing specificity as ABE8e. Compared with A3A_Y130F-CBE, currently the most widely used cytosine base editor, TadCBEa shows comparable C-to-T editing efficiency in rice cells, while no off-target effects caused by TadCBEa were detected in the rice genome or transcriptome. It therefore has higher editing specificity than A3A_Y130F-CBE and is a highly active, highly specific cytosine base editing tool. TadA-CDa is 166 amino acids long, smaller than A3A_Y130F at 199 amino acids, which facilitates delivery of the expression vector into the target species and the subsequent editing events. TadCBEa achieves effective C-to-T base editing, with almost no off-target editing detectable at the whole-genome DNA level or the transcriptome RNA level. The TadCBEa provided by the present invention expands the base editing toolbox for plants and has good application prospects in plant germplasm resource development and gene function research. The application of TadCBEa will provide more options for basic biological and medical research.
附图说明BRIEF DESCRIPTION OF THE DRAWINGS
图1、TadCBEa的载体示意图,A3A_Y130F-CBE和ABE8e作为对照组。A3A_Y130F-CBE:含胞苷脱氨酶APOBEC3A_Y130F的胞嘧啶碱基编辑器;ABE8e:含腺苷脱氨酶TadA-8e的腺嘌呤碱基编辑器;TadCBEa:含TadA-CDa的胞嘌呤碱基编辑器;A3A_Y130F-CBE、ABE8e和TadCBEa碱基编辑载体:针对20个水稻内源位点设计了两个表达载体,每个表达载体各靶向10个sgRNA位点;pZmUbi1:玉米Ubiquitin 1启动子;NLS:核定位信号;nCas9:单链失活的化脓链球菌来源的Cas9蛋白基因;UGI:尿嘧啶DNA糖基化酶抑制剂;tHSP:拟南芥HSP终止子;pOsUbi1:水稻Ubiquitin 1启动子;tRNA:转运RNA基因;sgRNA:单向导RNA;tpinⅡ:pinⅡ终止子。Figure 1. Schematic diagram of the vector of TadCBEa, with A3A_Y130F-CBE and ABE8e as control groups. A3A_Y130F-CBE: cytosine base editor containing cytidine deaminase APOBEC3A_Y130F; ABE8e: adenine base editor containing adenosine deaminase TadA-8e; TadCBEa: cytosine base editor containing TadA-CDa; A3A_Y130F-CBE, ABE8e and TadCBEa base editing vectors: two expression vectors were designed for 20 endogenous rice sites, each targeting 10 sgRNA sites; pZmUbi1: maize Ubiquitin 1 promoter; NLS: nuclear localization signal; nCas9: single-stranded inactivated Streptococcus pyogenes-derived Cas9 protein gene; UGI: uracil DNA glycosylase inhibitor; tHSP: Arabidopsis HSP terminator; pOsUbi1: rice Ubiquitin 1 promoter; tRNA: transfer RNA gene; sgRNA: single guide RNA; tpinⅡ: pinⅡ terminator.
图2、基于水稻原生质体瞬时转化体系的TadCBEa在20个内源位点处的碱基编辑活性检测结果。A:20个靶位点处不同碱基编辑系统的从C到T编辑效率、从A到G编辑效率和Indel效率;WT:未转入表达载体;A3A_Y130F:转入A3A_Y130F-CBE表达载体;ABE8e:转入ABE8e表达载体;TadCBEa:转入TadCBEa表达载体。B:汇总20个内源位点的编辑效率,WT、A3A_Y130F-CBE、ABE8e和TadCBEa四种不同处理的水稻细胞中的从C到T编辑效率和Indel效率。Figure 2. Results of base editing activity detection of TadCBEa at 20 endogenous sites based on rice protoplast transient transformation system. A: C to T editing efficiency, A to G editing efficiency and Indel efficiency of different base editing systems at 20 target sites; WT: not transferred into expression vector; A3A_Y130F: transferred into A3A_Y130F-CBE expression vector; ABE8e: transferred into ABE8e expression vector; TadCBEa: transferred into TadCBEa expression vector. B: Summary of editing efficiency of 20 endogenous sites, C to T editing efficiency and Indel efficiency in rice cells treated with four different methods: WT, A3A_Y130F-CBE, ABE8e and TadCBEa.
图3、基于水稻原生质体转化的不同碱基编辑系统在sgRNA区域内发生从C到T和从A到G碱基编辑的编辑窗口。A3A_Y130F:转入A3A_Y130F-CBE表达载体,经过水稻原生质体瞬时转化的水稻细胞中的编辑窗口;ABE8e:转入ABE8e表达载体,经过水稻原生质体瞬时转化的水稻细胞中的编辑窗口;TadCBEa:转入TadCBEa表达载体,经过水稻原生质体瞬时转化的水稻细胞中的编辑窗口。Figure 3. Editing windows of base editing from C to T and from A to G in the sgRNA region in different base editing systems based on rice protoplast transformation. A3A_Y130F: Editing window in rice cells transiently transformed by rice protoplasts after being transferred into the A3A_Y130F-CBE expression vector; ABE8e: Editing window in rice cells transiently transformed by rice protoplasts after being transferred into the ABE8e expression vector; TadCBEa: Editing window in rice cells transiently transformed by rice protoplasts after being transferred into the TadCBEa expression vector.
Figure 4. Evaluation of the editing efficiency of different base editing systems in stably transformed rice plants, based on stable rice transformation and rice whole-genome sequencing. A: Schematic of the TadCBEa expression vector. pZmUbi1: maize Ubiquitin 1 promoter; TadA-CDa: variant of the TadA-8e adenosine deaminase; NLS: nuclear localization signal; nCas9: gene encoding the Streptococcus pyogenes Cas9 nickase (cleavage of one strand inactivated); UGI: uracil glycosylase inhibitor; tHSP: Arabidopsis HSP terminator; pOsUbi1: rice Ubiquitin 1 promoter; tRNA: transfer RNA gene; sgRNA: single guide RNA; tpinⅡ: pinⅡ terminator. B: Experimental workflow for Sanger sequencing and whole-genome sequencing of regenerated plants, and the analysis workflow for the sequencing data. CK: control group; Editor: base editor; Tissue (with Agro): transgenic rice plants carrying no deaminase, Cas9 protein or sgRNA expression cassette, used as the tissue culture control; A3A_Y130F-CBE: rice plants stably transformed with the A3A_Y130F-CBE base editing system; TadCBEa: rice plants stably transformed with the TadCBEa base editing system; Raw Reads: raw whole-genome sequencing data; BWA: software used to analyze the sequencing results. C: C-to-T editing efficiency and homozygous C-to-T editing efficiency in Tissue, ABE8e and TadCBEa stably transformed rice plants. D: Sequences of different target sites in TadCBEa stably transformed rice plants after C-to-T base editing.
Figure 5. Analysis of TadCBEa editing specificity based on stable rice transformation, whole-genome sequencing and transcriptome sequencing. A: Workflow for stable rice transformation, together with the samples and data analysis workflow for whole-genome sequencing and transcriptome sequencing. B: Number of indels in Tissue, A3A_Y130F and TadCBEa stably transformed rice plants, obtained from whole-genome sequencing data by comparison against the wild-type rice genome as the control. C: Number of SNVs in Tissue, A3A_Y130F and TadCBEa stably transformed rice plants. D: Number of each type of SNV in Tissue, A3A_Y130F and TadCBEa stably transformed rice plants. E: Number of C-to-T SNVs in Tissue, A3A_Y130F and TadCBEa stably transformed rice plants. F: Proportion of each type of SNV among all SNVs in Tissue, A3A_Y130F and TadCBEa stably transformed rice plants. G: Editing motif preference of the TadCBEa base editing system in TadCBEa stably transformed rice plants.
Figure 6. Detection of sgRNA-dependent off-target effects in TadCBEa stably transformed rice plants, based on stable rice transformation and rice whole-genome sequencing. A: TadCBEa_OsALS: off-target analysis of TadCBEa stably transformed rice plants for the target site OsALS; target gRNA: sequence of the targeting gRNA; allele1: allele 1; allele2: allele 2. B: TadCBEa_OsSPL14: off-target analysis of TadCBEa stably transformed rice plants for the target site OsSPL14.
Figure 7. Analysis of TadCBEa editing specificity based on stable rice transformation and transcriptome sequencing. A: Number of SNVs at the RNA level in Tissue, A3A_Y130F and TadCBEa stably transformed rice plants. B: Number of C-to-U SNVs at the RNA level in Tissue, A3A_Y130F and TadCBEa stably transformed rice plants. C: Proportion of each type of SNV among all RNA-level SNVs in Tissue, A3A_Y130F and TadCBEa stably transformed rice plants.
Detailed Description of Embodiments
To expand the CBE toolbox for plants and to identify CBEs with better editing efficiency and editing specificity, the present invention codon-optimizes the evolved cytidine deaminase TadA-CDa for rice, characterizes the base editing properties of TadCBEa in rice cells, and performs whole-genome sequencing and transcriptome sequencing on regenerated seedlings obtained by stable rice transformation, in order to evaluate the editing efficiency and editing specificity of TadCBEa in stably transformed regenerated plants. The application of TadCBEa will provide more options for basic biological and medical research.
As described in the background section above, several plant cytosine base editors are currently available, including rAPOBEC1, PmCDA1 and hAPOBEC3A, which enable C-to-T base editing at specific sites in plant genomes. However, current cytosine base editors such as rAPOBEC1-CBE produce off-target mutations at non-target sites (Jin et al., Cytosine, but not adenine, base editors induce genome-wide off-target mutations in rice. 2019, Science), which raises genome safety concerns. The adenine base editor mainly used at present is based on TadA-8e, which achieves A-to-G base editing at the target site with high editing specificity.
Since cytidine deaminases and adenosine deaminases both catalyze nucleoside deamination, the question arises whether a highly specific adenosine deaminase can be evolved by directed evolution into a cytidine deaminase with high editing specificity, thereby yielding a cytosine base editor with better editing performance. Studies have already shown that directed evolution of the adenosine deaminase TadA-8e yields the variant TadA-CDa, which has good editing performance in human cells. Because this directed evolution product has acquired cytidine deaminase activity, it can be classified as a cytidine deaminase. To expand the plant base editing toolbox and obtain a cytosine base editor with higher editing activity and higher editing specificity, the cytidine deaminase TadA-CDa, which performs well in human cells, was fused with the Cas9 protein to obtain TadA-derived Cytosine Base Editor a (TadCBEa). The editing activity and editing specificity of TadCBEa were then systematically evaluated in rice cells and in stably transformed rice plants, providing a powerful tool with high editing activity and higher editing specificity for plant base editing.
Building on the above, and taking into account cross-species differences, namely that the transcription and translation machinery of plants differs from that of human cells, the inventors first codon-optimized the coding sequence of the cytidine deaminase TadA-CDa to obtain a TadA-CDa coding sequence (Seq ID No. 1) better suited to plant species (especially Gramineae, such as rice, maize, sorghum, barley, wheat and sugarcane). The TadA-CDa coding sequence was then combined with the Cas9 protein sequence, the UGI sequence and the sgRNA expression cassette in a single expression vector, and the editing performance of TadCBEa was tested in rice protoplasts and in rice plants. Its editing efficiency is comparable to that of A3A_Y130F-CBE, the best existing cytosine base editor, and its editing specificity is clearly better than that of A3A_Y130F-CBE.
The present invention is further described below with reference to examples. The following examples are intended to illustrate the present invention rather than to limit it further, and should not be used to limit its scope of protection.
Example 1: Construction of the TadCBEa cytosine base editing system vector
1. Optimization and synthesis of the TadA-CDa coding sequence
Starting from the TadA-CDa coding sequence published by Neugebauer et al. (Neugebauer et al., Evolution of an adenine base editor into a small, efficient cytosine base editor with low off-target activity. 2022, Nature Biotechnology), we optimized the sequence according to rice codon preference (Seq ID No. 1) to make it better suited to expression in rice. Synthesis of the codon-optimized TadA-CDa coding sequence was commissioned from Tsingke (擎科).
Seq ID No. 1: nucleotide sequence of TadA-CDa after optimization for rice codon preference:
TCTGAGGTGGAGTTTTCCCACGAGTACTGGATGAGACATGCCCTGACCCTGGCCAAGAGGGCACGGGATGAAGGCGCGGGGCCTGTGGGAGCCGTGCTGGTGCTGAACAATAGAGTGATCGGCGAGGGCTGGAACAGAGCCATCGGCCTGCACGACCCAACAGCCCATGCCGAAATTATGGCCCTGAGACAGGGCGGCCTGGTCATGCAGAACTACAGACTCTTTGACGCCACCCTGTACGTGACATTCGAGCCTTGCGTGATGTGCGCCGGCGCCATGATCAACTCTAGGATCGGCCGCGTGGTGTTTGGCGTGAGGAACTCAAAAAGAGGCGCCGCAGGCTCCCTGATGAACGTGCTGAACTACCCCGGCATGAATCACCGCGTCGAAATTACCGAGGGAATCCTGGCAGATGAATGTGCCGCCCTGCTGTGCGATTTCTATCGGATGCCTAGACAGGTGTTCAACAGCCAGAAGAAGGCCCAGAGCTCCATCAAC
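A coding sequence such as Seq ID No. 1 can be sanity-checked after copying it out of a document. The following is a minimal sketch, not part of the disclosed method: the helper name `cds_summary` is hypothetical, and the toy input is only the first 30 nucleotides of Seq ID No. 1. It strips stray whitespace (a common extraction artifact), rejects non-ACGT characters, and reports the length, whether the length is a whole number of codons, and the GC fraction.

```python
def cds_summary(seq: str) -> dict:
    """Basic checks for a coding sequence given as a DNA string.

    Hypothetical helper for illustration only.
    """
    # Remove spaces/newlines introduced by copy-paste or text extraction.
    cleaned = "".join(seq.split()).upper()
    if not cleaned:
        raise ValueError("empty sequence")
    unexpected = set(cleaned) - set("ACGT")
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    gc = (cleaned.count("G") + cleaned.count("C")) / len(cleaned)
    return {
        "length_nt": len(cleaned),
        "whole_codons": len(cleaned) % 3 == 0,
        "codons": len(cleaned) // 3,
        "gc_fraction": gc,
    }


# Toy input: the first 30 nt of Seq ID No. 1, with a stray space added.
example = cds_summary("TCTGAGGTGGAGTTT TCCCACGAGTACTGG")
print(example)
```

Running the same function on the full sequence gives a quick check that nothing was lost or inserted during transcription.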
2. Construction of the TadA-CDa-Cas9-UGI expression cassette
pTSWA is the backbone vector of the TadA-Dual-Cas9-UGI expression cassette. Its function is to provide the overhang sequences that join the TadA-Dual-Cas9-UGI expression cassette to the T-DNA backbone vector and to the sgRNA expression cassette: the overhang at its 5' end, which joins the T-DNA backbone vector pTX1500, is GATC, and the overhang at its 3' end, which joins the sgRNA expression cassette, is GGAC. With pTSWA as the entry vector, the ZmUbi1 promoter, the TadA-CDa coding sequence, the Cas9 protein gene, the UGI coding sequence and the AtHsp terminator were cloned in order into pTSWA by Golden Gate assembly mediated by BsaI and T4 DNA ligase, completing assembly of the TadA-CDa-Cas9-UGI expression cassette. Golden Gate reaction mix: T4 DNA ligase buffer, 2 μL; T4 DNA ligase, 1 μL; BsaI, 1 μL; pTSWA, 0.5 μL; ZmUbi1 promoter, 1 μL; TadA-CDa expression element, 3 μL; nCas9 expression element, 2.5 μL; UGI expression element, 2 μL; AtHsp terminator, 1 μL; ddH2O, 6 μL. Golden Gate reaction program: (37°C, 5 min; 16°C, 10 min) × 15 cycles; 37°C, 10 min; 65°C, 10 min; 12°C, 10 min. After the ligation reaction, 10 μL of the Golden Gate product was transformed into Escherichia coli DH5α competent cells, which were spread on LB solid medium containing 50 mg/L spectinomycin (Spec) and cultured at 37°C for 16 hours. Single colonies were picked from the plate into 50 μL of sterile deionized water and mixed, and 5 μL of the suspension was used as the template for colony PCR with primers oCS433 (5'-GCTCACATGTTCTTTCCTGCG-3') and TX412 (5'-CCCTAAACCCTAAATGGATGTACTA-3'); a positive amplification product is 533 bp long. The vector sequence was further verified by restriction digestion and sequencing, completing construction of the TadA-CDa-nCas9-UGI expression cassette. For the controls A3A_Y130F-CBE and ABE8e, the TadA-CDa coding sequence was replaced by the A3A_Y130F deaminase sequence and the TadA-8e deaminase sequence, respectively; the subsequent vector construction steps were the same as for TadCBEa.
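The reaction mix and cycling program above can be written down as data and checked arithmetically. The sketch below is illustrative only: the dictionary keys and function names are assumptions introduced here, while the volumes, temperatures and times are the ones stated in the text. It confirms that the listed components add up to a 20 μL reaction and estimates the programmed hold time, ignoring ramp times between temperatures.

```python
# Volumes in microlitres, as listed for the TadA-CDa-Cas9-UGI cassette assembly.
REACTION_UL = {
    "T4 DNA ligase buffer": 2.0,
    "T4 DNA ligase": 1.0,
    "BsaI": 1.0,
    "pTSWA": 0.5,
    "ZmUbi1 promoter": 1.0,
    "TadA-CDa expression element": 3.0,
    "nCas9 expression element": 2.5,
    "UGI expression element": 2.0,
    "AtHsp terminator": 1.0,
    "ddH2O": 6.0,
}

# (temperature in deg C, minutes) pairs.
CYCLED_STEPS = [(37, 5), (16, 10)]             # repeated block
CYCLES = 15
FINAL_STEPS = [(37, 10), (65, 10), (12, 10)]   # run once after cycling


def total_volume(mix):
    """Sum of all component volumes in microlitres."""
    return sum(mix.values())


def program_minutes(cycled, cycles, final):
    """Total programmed hold time in minutes (ramp times not included)."""
    return cycles * sum(m for _, m in cycled) + sum(m for _, m in final)


volume = total_volume(REACTION_UL)
minutes = program_minutes(CYCLED_STEPS, CYCLES, FINAL_STEPS)
print(f"{volume} uL total, {minutes} min programmed")
```

The same two functions apply unchanged to the other Golden Gate reactions in this example, since they share the cycling program and differ only in their component lists.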
3. Construction of the sgRNA expression cassette
To comprehensively assess the editing activity of the TadCBEa base editor in rice, 20 sgRNA target sites were designed against endogenous rice genes, with every 10 sgRNAs forming one gRNA array. The site sequences are shown in Table 1, and the primers used to construct the sgRNA expression cassettes are shown in Table 2. Construction of the sgRNA expression cassette is described below using gRNA array 01 as an example.
Table 1. Targeting sgRNA sequences in the sgRNA expression cassettes and their uses
Table 2. Primers used to construct the sgRNA expression cassettes and their uses
pTX1290 contains, in order, the overhang sequence GGAC at its 5' end, which joins the TadA-Dual-Cas9-UGI expression cassette, the OsUbi1 promoter sequence, a tRNA sequence, the lacZα sequence, the sgRNA scaffold sequence, a tRNA sequence, the pinII terminator sequence, and the overhang sequence CCGG at its 3' end, which joins the pMOD_C0000a vector. pMOD_C0000a serves as the linker between the sgRNA expression cassette and the T-DNA backbone vector pTX1500: its 5' end carries the overhang sequence CCGG, which joins the sgRNA expression cassette, and its 3' end carries the overhang sequence AGTG, which joins the T-DNA backbone vector pTX1500. When a gRNA array is assembled into the sgRNA expression cassette, PCR products containing sgRNA-sgRNA scaffold-tRNA sequences replace the lacZα sequence through a Golden Gate ligation reaction, and multiple sgRNA-sgRNA scaffold-tRNA units are joined in tandem to form an sgRNA expression cassette that targets multiple sites.
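The overhangs quoted for pTSWA, pTX1290, pMOD_C0000a and pTX1500 define the order in which the modules join. A minimal sketch of that bookkeeping follows. It assumes that all four-nucleotide overhangs are reported on the same strand, and the function name `check_chain` and the tuple layout are hypothetical, introduced only for illustration. It checks that each module's 3' overhang matches the next module's 5' overhang, that the chain closes on the backbone, and that no junction is used twice.

```python
# (module, 5' overhang, 3' overhang), in assembly order, as stated in the text.
PARTS = [
    ("deaminase-Cas9-UGI cassette (from pTSWA)", "GATC", "GGAC"),
    ("sgRNA cassette (from pTX1290)", "GGAC", "CCGG"),
    ("linker (pMOD_C0000a)", "CCGG", "AGTG"),
]
# Backbone: overhang that receives the first module, overhang that receives the last.
BACKBONE = ("pTX1500", "GATC", "AGTG")


def check_chain(parts, backbone):
    """Return a list of problems; an empty list means the chain is consistent."""
    problems = []
    bb_name, bb_left, bb_right = backbone
    if parts[0][1] != bb_left:
        problems.append(f"{parts[0][0]} does not match {bb_name} at the 5' side")
    for (name_a, _, right_a), (name_b, left_b, _) in zip(parts, parts[1:]):
        if right_a != left_b:
            problems.append(f"{name_a} -> {name_b}: {right_a} != {left_b}")
    if parts[-1][2] != bb_right:
        problems.append(f"{parts[-1][0]} does not match {bb_name} at the 3' side")
    # Each junction should be unique so that modules can only join in one order.
    junctions = [bb_left] + [p[2] for p in parts]
    if len(set(junctions)) != len(junctions):
        problems.append("a junction overhang is used more than once")
    return problems


issues = check_chain(PARTS, BACKBONE)
print("consistent" if not issues else issues)
```

Keeping the junctions in one table like this makes it easy to see why the assembly is directional: GATC, GGAC, CCGG and AGTG each occur at exactly one junction.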
First, the intermediate vectors for the sgRNA expression cassette were constructed. With pTX1290 as the PCR template, primers FT188-F1 and FT188-R1 were used to amplify FT252-frag01, primers FT188-F2 and FT188-R2 to amplify FT252-frag02, primers FT188-F3 and FT188-R3 to amplify FT252-frag03, and primers FT188-F4 and FT188-R4 to amplify FT252-frag04. The amplification products were purified by agarose gel electrophoresis and recovered, and the recovered products were joined to the TA Blunt Vector in a Golden Gate ligation reaction. Golden Gate reaction mix: T4 DNA ligase buffer, 2 μL; T4 DNA ligase, 1 μL; BpiI, 1 μL; Bsp1407I, 1 μL; TA Blunt Vector, 1 μL; FT252-frag, 1 μL each, 4 μL in total; ddH2O, 10 μL. Golden Gate reaction program: (37°C, 5 min; 16°C, 10 min) × 15 cycles; 37°C, 10 min; 65°C, 10 min; 12°C, 10 min. After the ligation reaction, 10 μL of the Golden Gate product was transformed into Escherichia coli DH5α competent cells, which were spread on LB solid medium containing 100 mg/L ampicillin (Amp) and cultured at 37°C for 16 hours. Single colonies were picked from the plate into 50 μL of sterile deionized water and mixed, and 5 μL of the suspension was used as the template for colony PCR with primers M13-F (5'-TGTAAAACGACGGCCAGT-3') and M13-R (5'-CAGGAAACAGCTATGAC-3'); a positive amplification product is 1515 bp long. The vector sequence was further verified by restriction digestion and sequencing, completing construction of intermediate vector 01 of sgRNA array 01. In the same way, the FT189 primers were used to construct intermediate vector 02 and the FT190 primers to construct intermediate vector 03.
Next, the sgRNA expression cassette was constructed. With pTX1290 as the entry vector, intermediate vectors 01, 02 and 03 were joined in order by Golden Gate assembly. Golden Gate reaction mix: T4 DNA ligase buffer, 2 μL; T4 DNA ligase, 1 μL; BsaI, 1 μL; pTX1290, 1 μL; intermediate vectors, 2 μL each, 6 μL in total; ddH2O, 9 μL. Golden Gate reaction program: (37°C, 5 min; 16°C, 10 min) × 15 cycles; 37°C, 10 min; 65°C, 10 min; 12°C, 10 min. After the ligation reaction, 10 μL of the Golden Gate product was transformed into Escherichia coli DH5α competent cells, which were spread on LB solid medium containing 50 mg/L spectinomycin (Spec) and cultured at 37°C for 16 hours. Single colonies were picked from the plate into 50 μL of sterile deionized water and mixed, and 5 μL of the suspension was used as the template for colony PCR with primers TX421 (5'-TAATTTGATTGACTGATTTCTGCTGTA-3') and oCS1185 (5'-TGCATTACAGCTTACGAACCGAAC-3'); a positive amplification product is 2966 bp long. The vector sequence was further verified by restriction digestion and sequencing, completing construction of the sgRNA expression cassette of sgRNA array 01. sgRNA array 01 is pTX1290 with 10 (sgRNA sequence-sgRNA scaffold sequence-tRNA sequence) units inserted; it comprises, in order, the OsUbi1 promoter sequence, a tRNA sequence, (sgRNA sequence-sgRNA scaffold sequence-tRNA sequence) × 10 and the pinII terminator sequence.
Finally, the vector for sgRNA array 02 was constructed by the same method as for sgRNA array 01.
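The layout described for an assembled array (OsUbi1 promoter, a leading tRNA, ten sgRNA-scaffold-tRNA units, pinII terminator) can be laid out programmatically. The sketch below is illustrative: the function name `array_layout` and the element labels are assumptions, and it produces only an ordered list of element names, not actual sequences, since the spacer sequences belong to Table 1.

```python
def array_layout(n_guides: int) -> list[str]:
    """Ordered element labels for a tRNA-sgRNA array with n_guides units."""
    elements = ["OsUbi1 promoter", "tRNA"]
    for i in range(1, n_guides + 1):
        # One repeat unit: spacer (target-specific), scaffold, trailing tRNA.
        elements += [f"sgRNA spacer {i:02d}", "sgRNA scaffold", "tRNA"]
    elements.append("pinII terminator")
    return elements


layout = array_layout(10)
print(len(layout), "elements;", layout.count("tRNA"), "tRNA copies")
for label in layout[:6]:
    print(label)
```

With ten guides the list has 33 elements and 11 tRNA copies, so every sgRNA is flanked by a tRNA on both sides, which is the arrangement the text describes.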
4. Construction of the expression vector for the TadCBEa base editing system
The T-DNA backbone vector pTX1500 contains, in order, a poly(A) termination signal, a hygromycin resistance gene, a 35S promoter, the overhang sequence GATC, which joins the TadA-Dual-Cas9-UGI expression cassette, the CcdB lethal gene, and the overhang sequence AGTG, which joins pMOD_C0000a. With pTX1500 as the entry vector, the TadA-CDa-Cas9-UGI expression cassette and the sgRNA expression cassette were cloned in order into pTX1500 by a Golden Gate reaction mediated by AarI and T4 DNA ligase. Golden Gate reaction mix: T4 DNA ligase buffer, 2 μL; T4 DNA ligase, 1 μL; AarI, 0.5 μL; 50×Oligo, 0.4 μL; pTX1500, 0.5 μL; TadA-CDa-nCas9-UGI expression cassette, 1 μL; sgRNA expression cassette, 1 μL; pMOD_C0000a, 0.5 μL; ddH2O, 13.1 μL. Golden Gate reaction program: (37°C, 5 min; 16°C, 10 min) × 15 cycles; 37°C, 10 min; 65°C, 10 min; 12°C, 10 min. After the ligation reaction, 10 μL of the Golden Gate product was transformed into Escherichia coli DH5α competent cells, which were spread on LB solid medium containing 50 mg/L kanamycin (Kan) and cultured at 37°C for 16 hours. Single colonies were picked from the plate into 50 μL of sterile deionized water and mixed, and 5 μL of the suspension was used as the template for colony PCR with primers TX421 (5'-TAATTTGATTGACTGATTTCTGCTGTA-3') and ZY065-RB (5'-TTCTAATAAACGCTCTTTTCTCT-3'); a positive amplification product is 1549 bp long. The accuracy of the finished expression vector was verified by ScaI restriction digestion and Sanger sequencing, completing construction of the TadCBEa expression vector (Figure 1 and Seq ID No. 2).
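The four colony PCR screens described in this example differ only in selection antibiotic, primer pair and expected product length, so they can be summarized in one lookup table. The sketch below is illustrative: the construct labels, the function name `is_positive` and especially the 50 bp tolerance are assumptions introduced here, not values from the text; the antibiotics, primer names and product lengths are those stated above.

```python
SCREENS = {
    "TadA-CDa-nCas9-UGI cassette": {
        "selection": "Spec, 50 mg/L",
        "primers": ("oCS433", "TX412"),
        "expected_bp": 533,
    },
    "intermediate vector 01": {
        "selection": "Amp, 100 mg/L",
        "primers": ("M13-F", "M13-R"),
        "expected_bp": 1515,
    },
    "sgRNA array 01 cassette": {
        "selection": "Spec, 50 mg/L",
        "primers": ("TX421", "oCS1185"),
        "expected_bp": 2966,
    },
    "TadCBEa expression vector": {
        "selection": "Kan, 50 mg/L",
        "primers": ("TX421", "ZY065-RB"),
        "expected_bp": 1549,
    },
}


def is_positive(construct: str, observed_bp: int, tolerance_bp: int = 50) -> bool:
    """True if an estimated band size is within tolerance of the expected size.

    tolerance_bp is a hypothetical allowance for reading a band off a gel.
    """
    expected = SCREENS[construct]["expected_bp"]
    return abs(observed_bp - expected) <= tolerance_bp


for name, info in SCREENS.items():
    fwd, rev = info["primers"]
    print(f"{name}: {fwd}/{rev}, {info['expected_bp']} bp, {info['selection']}")
```

A positive band only indicates the expected junction; as in the text, restriction digestion and sequencing remain the confirming steps.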
Seq ID No. 2: nucleotide sequence of the TadA-CDa-nCas9-2×UGI backbone vector. Positions 1-1998: maize Ubi1 promoter; 2044-2064: NLS sequence; 2065-2562: TadA-CDa deaminase sequence; 2563-2654: 32-amino-acid linker sequence; 2665-2685: NLS sequence; 2710-6810: nCas9 coding sequence; 6811-6858: NLS sequence; 6895-7422: 2×UGI sequence; 7500-7749: Arabidopsis HSP terminator; 7776-9306: rice Ubi1 promoter; 9315-9391: Gly-tRNA sequence; 9399-9806: LacZα coding sequence; 9958-10053: sgRNA scaffold sequence; 10054-10130: Gly-tRNA sequence; 10141-10449: pinII terminator:
CTGCAGTGCAGCGTGACCCGGTCGTGCCCCTCTCTTGAGATAAGGTGAGCATTGCATGTCTAAGTTATAAAAAATTACCACATATTTTTTTTGTCACACTTGTTTGAAGTGCAGTTTATCTATCTTTATACATATATTTAAACTTTACTCTACGAATAATATAATCTATAGTACTACAATAATATCAGTGTTTTAGAGAATCATATAAATGAACAGTTAGACATGGTCTAAAGGACAATTGAGTATTTTGACAACAGGACTCTACAGTTTTATCTTTTTAGTGTGCATGTGTTCTCCTTTTTTTTTGCAAATAGCTTCACCTATATAATACTTCATCCATTTTATTAGTACATCCATTTAGGGTTTAGGGTTAATGGTTTTTATAGACTAATTTTTTTAGTACATCTATTTTATTCTATTTTAGCCTCTAAATTAAGAAAACTAAAACTCTATTTTAGTTTTTTTATTTAATAATTTAGATATAAAATAGAATAAAATAAAGTGACTAAAAATTAAACAAATACCCTTTAAGAAATTAAAAAAACTAAGGAAACATTTTTCTTGTTTCGAGTAGATAATGCCAGCCTGTTAAACGCCGTCGACGAGTCTAACGGACACCAACCAGCGAACCAGCAGCGTCGCGTCGGGCCAAGCGAAGCAGACGGCACGGCATCTCTGTCGCTGCCTCTGGACCCCTCTCGAGAGTTCCGCTCCACCGTTGGACTTGCTCCGCTGTCGGCATCCAGAAATTGCGTGGCGGAGCGGCAGACGTGAGCCGGCACGGCAGGCGGCCTCCTCCTCCTCTCACGGCACCGGCAGCTACGGGGGATTCCTTTCCCACCGCTCCTTCGCTTTCCCTTCCTCGCCCGCCGTAATAAATAGACACCCCCTCCACACCCTCTTTCCCCAACCTCGTGTTGTTCGGAGCGCACACACACACAACCAGAACTCCCCCAAATCCACCCGTCGGCACCTCCGCTTCAAGGTACGCCGCTCGTCCTCCCCCCCCCCCCTCTCTACCTTCTCAAGATCGGCGTTCCGGTCCATGGTTAGGGCCCGGTAGTTCTACTTCTGTTCATGTTTGTGTTAGATCCGTGTTTGTGTTAGATCCGTGCTACTAGCGTTCGTACACGGATGCGACCTGTACGTCAGACACGTTCTGATTGCTAACTTGCCAGTGTTTCTCTTTGGGGAATCCTGGGATGGCTCTAGCCGTTCCGCAGACGGGATCGATTTCATGATTTTTTTTGTTTCGTTGCATAGGGTTTGGTTTGCCCTTTTCCTTTATTTCAATATATGCCGTGCACTTGTTTGTCGGGTCATCTTTTCATGCTTTTTTTTGTCTTGGTTGTGATGATGTGGTCTGGTTGGGCGGTCGTTCAAGATCGGAGTAGAATTAATTCTGTTTCAAACTACCTGGTGGATTTATTAATTTTGGATCTGTATGTGTGTGCCATACATATTCATAGTTACGAATTGAAGATGATGGATGGAAATATCGATCTAGGATAGGTATACATGTTGATGCGGGTTTTACTGATGCATATACAGAGATGCTTTTTGTTCGCTTGGTTGTGATGATGTGGTGTGGTTGGGCGGTCGTTCATTCGTTCAAGATCGGAGTAGAATACTGTTTCAAACTACCTGGTGTATTTATTAATTTTGGAACTGTATGTGTGTGTCATACATCTTCATAGTTACGAGTTTAAGATGGATGGAAATATCGATCTAGGATAGGTATACATGTTGATGTGGGTTTTACTGATGCATATACATGATGGCATATGCAGCATCTATTCATATGCTCTAACCTTGAGTACCTATCTATTATAATAAACAAGTATGTTTTATAATTATTTTGATCTTGATATACTTGGATGATGGCATATGCAGCAGCTATATGTGGATTTTTTTAGCCCTGCCTTCATACGCTATTTATTTGCTTGGTACTGTTTCTTTTGTCGATGCTCACCCTGTTGTTTGGTGTTACTTCTGCAGCC
TGCAGGAATGAAACGGACAGCCGACGGAAGCGAGTTCGAGTCACCAAAGAAGAAGCGGAAAGTCTCTGAGGTGGAGTTTTCCCACGAGTACTGGATGAGACATGCCCTGACCCTGGCCAAGAGGGCACGGGATGAAGGCGCGGGGCCTGTGGGAGCCGTGCTGGTGCTGAACAATAGAGTGATCGGCGAGGGCTGGAACAGAGCCATCGGCCTGCACGACCCAACAGCCCATGCCGAAATTATGGCCCTGAGACAGGGCGGCCTGGTCATGCAGAACTACAGACTCTTTGACGCCACCCTGTACGTGACATTCGAGCCTTGCGTGATGTGCGCCGGCGCCATGATCAACTCTAGGATCGGCCGCGTGGTGTTTGGCGTGAGGAACTCAAAAAGAGGCGCCGCAGGCTCCCTGATGAACGTGCTGAACTACCCCGGCATGAATCACCGCGTCGAAATTACCGAGGGAATCCTGGCAGATGAATGTGCCGCCCTGCTGTGCGATTTCTATCGGATGCCTAGACAGGTGTTCAACAGCCAGAAGAAGGCCCAGAGCTCCATCAACTCCGGAGGATCTAGCGGAGGCTCCTCTGGCTCTGAGACACCTGGCACAAGCGAGAGCGCAACACCTGAAAGCAGCGGGGGCAGCAGCGGGGGTTCGATGGCTCCGAAGAAGAAGAGGAAGGTTGGCATCCACGGGGTGCCAGCTGCTGACAAGAAGTACTCGATCGGCCTCGCTATTGGGACTAACTCTGTTGGCTGGGCCGTGATCACCGACGAGTACAAGGTGCCCTCAAAGAAGTTCAAGGTCCTGGGCAACACCGATCGGCATTCCATCAAGAAGAATCTCATTGGCGCTCTCCTGTTCGACAGCGGCGAGACGGCTGAGGCTACGCGGCTCAAGCGCACCGCCCGCAGGCGGTACACGCGCAGGAAGAATCGCATCTGCTACCTGCAGGAGATTTTCTCCAACGAGATGGCGAAGGTTGACGATTCTTTCTTCCACAGGCTGGAGGAGTCATTCCTCGTGGAGGAGGATAAGAAGCACGAGCGGCATCCAATCTTCGGCAACATTGTCGACGAGGTTGCCTACCACGAGAAGTACCCTACGATCTACCATCTGCGGAAGAAGCTCGTGGACTCCACAGATAAGGCGGACCTCCGCCTGATCTACCTCGCTCTGGCCCACATGATTAAGTTCAGGGGCCATTTCCTGATCGAGGGGGATCTCAACCCGGACAATAGCGATGTTGACAAGCTGTTCATCCAGCTCGTGCAGACGTACAACCAGCTCTTCGAGGAGAACCCCATTAATGCGTCAGGCGTCGACGCGAAGGCTATCCTGTCCGCTAGGCTCTCGAAGTCTCGGCGCCTCGAGAACCTGATCGCCCAGCTGCCGGGCGAGAAGAAGAACGGCCTGTTCGGGAATCTCATTGCGCTCAGCCTGGGGCTCACGCCCAACTTCAAGTCGAATTTCGATCTCGCTGAGGACGCCAAGCTGCAGCTCTCCAAGGACACATACGACGATGACCTGGATAACCTCCTGGCCCAGATCGGCGATCAGTACGCGGACCTGTTCCTCGCTGCCAAGAATCTGTCGGACGCCATCCTCCTGTCTGATATTCTCAGGGTGAACACCGAGATTACGAAGGCTCCGCTCTCAGCCTCCATGATCAAGCGCTACGACGAGCACCATCAGGATCTGACCCTCCTGAAGGCGCTGGTCAGGCAGCAGCTCCCCGAGAAGTACAAGGAGATCTTCTTCGATCAGTCGAAGAACGGCTACGCTGGGTACATTGACGGCGGGGCCTCTCAGGAGGAGTTCTACAAGTTCATCAAGCCGATTCTGGAGAAGATGGACGGCACGGAGGAGCTGCTGGTGAAGCTCAATCGCGAGGACCTCCTGAGGAAGCAGCGGACATTCGATAACGGCAGCATCCCACACCAGATTCATCTCGGGGAGCTGCACGCTATCCTGAGGAGGCAGGAGGACTTCTACCCTT
TCCTCAAGGATAACCGCGAGAAGATCGAGAAGATTCTGACTTTCAGGATCCCGTACTACGTCGGCCCACTCGCTAGGGGCAACTCCCGCTTCGCTTGGATGACCCGCAAGTCAGAGGAGACGATCACGCCGTGGAACTTCGAGGAGGTGGTCGACAAGGGCGCTAGCGCTCAGTCGTTCATCGAGAGGATGACGAATTTCGACAAGAACCTGCCAAATGAGAAGGTGCTCCCTAAGCACTCGCTCCTGTACGAGTACTTCACAGTCTACAACGAGCTGACTAAGGTGAAGTATGTGACCGAGGGCATGAGGAAGCCGGCTTTCCTGTCTGGGGAGCAGAAGAAGGCCATCGTGGACCTCCTGTTCAAGACCAACCGGAAGGTCACGGTTAAGCAGCTCAAGGAGGACTACTTCAAGAAGATTGAGTGCTTCGATTCGGTCGAGATCTCTGGCGTTGAGGACCGCTTCAACGCCTCCCTGGGGACCTACCACGATCTCCTGAAGATCATTAAGGATAAGGACTTCCTGGACAACGAGGAGAATGAGGATATCCTCGAGGACATTGTGCTGACACTCACTCTGTTCGAGGACCGGGAGATGATCGAGGAGCGCCTGAAGACTTACGCCCATCTCTTCGATGACAAGGTCATGAAGCAGCTCAAGAGGAGGAGGTACACCGGCTGGGGGAGGCTGAGCAGGAAGCTCATCAACGGCATTCGGGACAAGCAGTCCGGGAAGACGATCCTCGACTTCCTGAAGAGCGATGGCTTCGCGAACCGCAATTTCATGCAGCTGATTCACGATGACAGCCTCACATTCAAGGAGGATATCCAGAAGGCTCAGGTGAGCGGCCAGGGGGACTCGCTGCACGAGCATATCGCGAACCTCGCTGGCTCGCCAGCTATCAAGAAGGGGATTCTGCAGACCGTGAAGGTTGTGGACGAGCTGGTGAAGGTCATGGGCAGGCACAAGCCTGAGAACATCGTCATTGAGATGGCCCGGGAGAATCAGACCACGCAGAAGGGCCAGAAGAACTCACGCGAGAGGATGAAGAGGATCGAGGAGGGCATTAAGGAGCTGGGGTCCCAGATCCTCAAGGAGCACCCGGTGGAGAACACGCAGCTGCAGAATGAGAAGCTCTACCTGTACTACCTCCAGAATGGCCGCGATATGTATGTGGACCAGGAGCTGGATATTAACAGGCTCAGCGATTACGACGTCGATCATATCGTTCCACAGTCATTCCTGAAGGATGACTCCATTGACAACAAGGTCCTCACCAGGTCGGACAAGAACCGGGGCAAGTCTGATAATGTTCCTTCAGAGGAGGTCGTTAAGAAGATGAAGAACTACTGGCGCCAGCTCCTGAATGCCAAGCTGATCACGCAGCGGAAGTTCGATAACCTCACAAAGGCTGAGAGGGGCGGGCTCTCTGAGCTGGACAAGGCGGGCTTCATCAAGAGGCAGCTGGTCGAGACACGGCAGATCACTAAGCACGTTGCGCAGATTCTCGACTCACGGATGAACACTAAGTACGATGAGAATGACAAGCTGATCCGCGAGGTGAAGGTCATCACCCTGAAGTCAAAGCTCGTCTCCGACTTCAGGAAGGATTTCCAGTTCTACAAGGTTCGGGAGATCAACAATTACCACCATGCCCATGACGCGTACCTGAACGCGGTGGTCGGCACAGCTCTGATCAAGAAGTACCCAAAGCTCGAGAGCGAGTTCGTGTACGGGGACTACAAGGTTTACGATGTGAGGAAGATGATCGCCAAGTCGGAGCAGGAGATTGGCAAGGCTACCGCCAAGTACTTCTTCTACTCTAACATTATGAATTTCTTCAAGACAGAGATCACTCTGGCCAATGGCGAGATCCGGAAGCGCCCCCTCATCGAGACGAACGGCGAGACGGGGGAGATCGTGTGGGACAAGGGCAGGGATTTCGCGACCGTCAGGAAGGTTCTCTCCATGCCACAAGTGAATATCGTCAAGAAGACA
GAGGTCCAGACTGGCGGGTTCTCTAAGGAGTCAATTCTGCCTAAGCGGAACAGCGACAAGCTCATCGCCCGCAAGAAGGACTGGGATCCGAAGAAGTACGGCGGGTTCGACAGCCCCACTGTGGCCTACTCGGTCCTGGTTGTGGCGAAGGTTGAGAAGGGCAAGTCCAAGAAGCTCAAGAGCGTGAAGGAGCTGCTGGGGATCACGATTATGGAGCGCTCCAGCTTCGAGAAGAACCCGATCGATTTCCTGGAGGCGAAGGGCTACAAGGAGGTGAAGAAGGACCTGATCATTAAGCTCCCCAAGTACTCACTCTTCGAGCTGGAGAACGGCAGGAAGCGGATGCTGGCTTCCGCTGGCGAGCTGCAGAAGGGGAACGAGCTGGCTCTGCCGTCCAAGTATGTGAACTTCCTCTACCTGGCCTCCCACTACGAGAAGCTCAAGGGCAGCCCCGAGGACAACGAGCAGAAGCAGCTGTTCGTCGAGCAGCACAAGCATTACCTCGACGAGATCATTGAGCAGATTTCCGAGTTCTCCAAGCGCGTGATCCTGGCCGACGCGAATCTGGATAAGGTCCTCTCCGCGTACAACAAGCACCGCGACAAGCCAATCAGGGAGCAGGCTGAGAATATCATTCATCTCTTCACCCTGACGAACCTCGGCGCCCCTGCTGCTTTCAAGTACTTCGACACAACTATCGATCGCAAGAGGTACACAAGCACTAAGGAGGTCCTGGACGCGACCCTCATCCACCAGTCGATTACCGGCCTCTACGAGACGCGCATCGACCTGTCTCAGCTCGGGGGCGACAAGCGGCCAGCGGCGACGAAGAAGGCGGGGCAGGCGAAGAAGAAGAAGGCAGGATCTGGTGGCTCTGGCGGCTCCGGAGGAAGCACTAATCTCTCAGATATAATCGAGAAGGAAACTGGGAAACAGCTGGTTATACAAGAGAGCATTCTCATGCTTCCGGAGGAGGTGGAGGAGGTCATCGGCAACAAGCCTGAATCTGACATTCTTGTTCATACCGCGTACGACGAAAGCACGGATGAGAATGTCATGCTTTTGACTTCTGATGCACCTGAGTACAAGCCGTGGGCGTTGGTCATCCAGGACTCTAACGGTGAGAACAAAATCAAAATGCTATCCGGCGGGTCGGGGGGGTCAGGCGGTTCGACGAACCTGAGTGATATTATTGAGAAAGAGACTGGGAAGCAGTTGGTCATCCAAGAGAGCATCCTGATGCTCCCTGAAGAAGTGGAGGAGGTGATCGGCAACAAGCCCGAATCGGACATTCTAGTGCATACTGCTTATGACGAGTCTACCGACGAGAATGTGATGTTATTGACCTCCGATGCTCCAGAATACAAGCCATGGGCGTTAGTAATACAAGATTCAAACGGCGAGAATAAAATCAAGATGCTGTCGGGAGGTTCAAAGCGTACTGCTGATGGTTCAGAGTTTGAGCCCAAAAAGAAAAGAAAAGTGTGAGCTTCTCGAGTATATGAAGATGAAGATGAAATATTTGGTGTGTCAAATAAAAAGCTTGTGTGCTTAAGTTTGTGTTTTTTTCTTGGCTTGTTGTGTTATGAATTTGTGGCTTTTTCTAATATTAAATGAATGTAAGATCACATTATAATGAATAAACAAATGTTTCTATAATCCATTGTGAATGTTTTGTTGGATCTCTTCTGCAGCATATAACTACTGTATGTGCTATGGTATGGACTATGGAATATGATTAAAGATAAGCGCTGAGCTCGGACGGCGCGCCGGAGCACATCAGTCTCTGCACAAAGTGCATCCTGGGCTGCTTCAATTATAAAGCCCCATTCACCACATTTGCTAGATAGTCGAAAAGCACCATCAATATTGAGCTTCAGGTATTTTTGGTTGTGTTGTGGTTGGATTGATTCTAATATATACCAAATCAATATAATTCACTACCAAAATATACCATAGCCATCACAACTTTATTAATTTTGGTAGCTTAAGATGGTATA
TATAATAACCAATTAACAACTGATTCTAATTTTACTACGGCCCAGTATCTACCAATACAAAACAACGAGTATGTTTTCTTCCGTCGTAATCGTACACAGTACAAAAAAACCTGGCCAGCCTTTCTTGGGCTGGGGCTCTCTTTCGAAAGGTCACAAAACGTACACGGCAGTAACGCCGCTTCGCTGCGTGTTAACGGCCACCAACCCCGCCGTGAGCAAACGGCATCAGCTTTCCACCTCCTCGATATCTCCGCGGCGCCGTCTGGACCCGCCCCCTTTCCGTTCCTTTCTTTCCTTCTCGCGTTTGCGTGGTGGGGACGGACTCCCCAAACCGCCTCTCCCTCTCTTTATTTGTCTATATTCTCACTGGGCCCCACCCACCGCACCCCTGGGCCCACTCACGAGTCCCCCCCTCCCCACCTATAAATACCCCACCCCCTCCTCGCCTCTTCCTCCATCAATCGAATCCCCAAAATCGCAGAGAAAAAAAAATCTCCCCTCGAAGCGAAGCGTCGAATCGCCTTCTCAAGGTATGCGATTTTCTGATCCTCTCCGTTCCTCGCGTTTGATTTGATTTCCCGGCCTGTTCGTGATTGTGAGATGTTGTTGTTAGTCTCCGTTTTGCGATCTGTGGTAGATTTGAACAGGGTTAGATGGGGTTCGCGTGGTATGCTGGATCTGTGATTATGAGCGATGCTGTTCGTGGTCCAAGTATTTATTGGTTCGGATCTAGAAGTAGAACTGTGCTAGGGTTGTGATTTGTTCCGATCTGTTCAATCAGTAGGATTTAGTCTCTGTTTTTCTCGTTGATCCAAGTAGCAGCTTCAGGTATATTTTGCTTAGGTTGTTTTTGATTCAGTCCCTCTAGTTGCATAGATTCTACTCTGTTCATGTTTAATCTAAGGGCTGCGTCTTGTTGATTAGTGATTACATAGCATAGCTTTCAGGATATTTTACTTGCTTATGCCTATCTTATCAACTGTTGCACCTGTAAATTCTAGCCTATGTTAATTAACCTGCCTTATGTGCTCTCGGGATAGTGCTAGTAGTTATTGAATCAGTTTGCCGATGGAATTCTAGTAGTTCATAGACCTGCAGATTATTTTTGTGAACTCGAGCACGGTGCGACTCTCTATTTTGTTAGGTCACTGTTGGTGTTGATAGGTACACTGATGTTATTGTGGTTTAGATCGTGTATCTAACATATTGGAATAATTTGATTGACTGATTTCTGCTGTACTTGCTTGGTATTGTTATAATTTCATGTTCATAGTTGCTGACCATGCTTCGGTAATTGTGTGTGCAGCCTGCAGGAACAAAGCACCAGTGGTCTAGTGGTAGAATAGTACCCTGCCACGGTACAGACCCGGGTTCGATTCCCGGCTGGTGCAGGAGACCTTACAGTTGGACACAGGCCAACTTGTGAAGATTGCAAAACGTGGCGGCGTGACCGCAATGGAGGCAGTGCATGCATCGCGCAATGCACTGACGGGTGCCCCCCTGGAGACGGGCGCCGCTACAGGGCGCGTCCCATTCGCCATTCAGGCTGCGCAACTGTTGGGAAGGGCGATCGGTGCGGGCCTCTTCGCTATTACGCCAGCTGGCGAAAGGGGGATGTGCTGCAAGGCGATTAAGTTGGGTAACGCCAGGGTTTTCCCAGTCACGACGTTGTAAAACGACGGCCAGTGAGCGCGCGTAATACGACTCACTATAGGGCGAATTGGGTACCGGGCCCCCCCTCGAGGTCCTCCAGCTTTTGTTCCCTTTAGTGAGGGTTAATTGCGCGCTTGGCGTAATCATGGTCATAGCTGTTTCCTGTGTGAAATTGTTATCCGCTCACAATTCCACACAACATACGAGCCGGAAGCATAAAGTGTAAAGCCTGGGGTGCCTAATGAGTGAGCTAACTCACATTAATTGCGTTGCGCTCACTGCCCGCTTTCCACCGGTGGTCTCAGTTTAAGAGCTATGCTGGAAACAGCATAGCAAGTTTAAATAAG
GCTAGTCCGTTATCAACTTTGCTGGAAACAGCAAAGTGGCACCGAGTCGGTGCAACAAAGCACCAGTGGTCTAGTGGTAGAATAGTACCCTGCCACGGTACAGACCCGGGTTCGATTCCCGGCTGGTGCAGGTACTCGAGCCTAGACTTGTCCATCTTCTGGATTGGCCAAGTTAATTAATGTATGAAATAAAAGGATGCACACATAGTGACATGCTAATCACTATAATGTGGGCATCAAAGTTGTGTGTTATGTGTAATTACTAATTATCTGAATAAGAGAAAGAGATCATCCATATTTCTTATCCTAAATGAATGTCACGTGTCTTTATAATTCTTTGATGAACCAGATGCATTTTATTAACCAATTCCATATACATATAAATATTAATCATATATAATTAATATCAATTGGGTTAGCAAAACAAATCTAGTCTAGGTGTGTTTTGC。CTGCAGTGCAGCGTGACCCGGTCGTGCCCCTCTCTTGAGATAAGGTGAGCATTGCATGTCTAAGTTATAAAAAATTACCACATATTTTTTTTGTCACACTTGTTTGAAGTGCAGTTTATCTATCTTTATACATATATTTAAACTTTACTCTACGAATAATATAATCTATAGTACTACAATAATATCAGTGTTTTAGAGAATCATATAAATGAACAGTTAGACATGGTCTAAAGGACAATTGAGTATTTTGACAACAGGACTCTACAGTTTTATCTTTTTAGTGTGCATGTGTTCTCCTTTTTTTTTGCAAATAGCTTCACCTATAT AATACTTCATCCATTTTATTAGTACATCCATTTAGGGTTTAGGGTTAATGGTTTTTATAGACTAATTTTTTTAGTACATCTATTTTATTCTATTTTAGCCTCTAAATTAAGAAAACTAAAACTCTATTTTAGTTTTTTTATTTAATAATTTAGATATAAAATAGAATAAAATAAAGTGACTAAAAATTAAACAAATACCCTTTAAGAAATTAAAAAAACTAAGGAAACATTTTTCTTGTTTCGAGTAGATAATGCCAGCCTGTTAAACGCCGTCGACGAGTCTAACGGACACCAACCAGCGAACCAGCAGCGTCGCGTCGGGCCAAG CGAAGCAGACGGCACGGCATCTCTGTCGCTGCCTCTGGACCCCTCTCGAGAGTTCCGCTCCACCGTTGGACTTGCTCCGCTGTCGGCATCCAGAAATTGCGTGGCGGAGCGGCAGACGTGAGCCGGCACGGCAGGCGGCCTCCTCCTCCTCTCACGGCACCGGCAGCTACGGGGGATTCCTTTCCCACCGCTCCTTCGCTTTCCCTTCCTCGCCCGCCGTAATAAATAGACACCCCCTCCACACCCTCTTTCCCCAACCTCGTGTTGTTCGGAGCGCACACACACACAACCAGAACTCCCCCAAATCCACCCGTCGGCACCTCCGC TTCAAGGTACGCCGCTCGTCCTCCCCCCCCCCCCTCTCTACCTTCTCAAGATCGGCGTTCCGGTCCATGGTTAGGGCCCGGTAGTTCTACTTCTGTTCATGTTTGTGTTAGATCCGTGTTTGTGTTAGATCCGTGCTACTAGCGTTCGTACACGGATGCGACCTGTACGTCAGACACGTTCTGATTGCTAACTTGCCAGTGTTTCTCTTTGGGGAATCCTGGGATGGCTCTAGCCGTTCCGCAGACGGGATCGATTTCATGATTTTTTTTGTTTCGTTGCATAGGGTTTGGTTTGCCCTTTTCCTTTATTTCAATATATGCCGTGCA 
CTTGTTTGTCGGGTCATCTTTTCATGCTTTTTTTTGTCTTGGTTGTGATGATGTGGTCTGGTTGGGCGGTCGTTCAAGATCGGAGTAGAATTAATTCTGTTTCAAACTACCTGGTGGATTTATTAATTTTGGATCTGTATGTGTGTGCCATACATATTCATAGTTACGAATTGAAGATGATGGATGGAAATATCGATCTAGGATAGGTATACATGTTGATGCGGGTTTTACTGATGCATATACAGAGATGCTTTTTGTTCGCTTGGTTGTGATGATGTGGTGTGGTTGGGCGGTCGTTCATTCGTTCAAGATCGGAGTAGAATACT GTTTCAAACTACCTGGTGTATTTATTAATTTTGGAACTGTATGTGTGTGTCATACATCTTCATAGTTACGAGTTTAAGATGGATGGAAATATCGATCTAGGATAGGTATACATGTTGATGTGGGTTTTACTGATGCATATACATGATGGCATATGCAGCATCTATTCATATGCTCTAACCTTGAGTACCTATCTATTAATAAACAAGTATGTTTTATAATTATTTTGATCTTGATATACTTGGATGATGGCATATGCAGCAGCTATATGTGGATTTTTTTAGCCCTGCCTTCATACGCTATTTATTTGCTTGGTACTGTTTCTTT TGTCGATGCTCACCCTGTTGTTTGGTGTTACTTCTGCAGCCTGCAGGAATGAAACGGACAGCCGACGGAAGCGAGTTCGAGTCACCAAAGAAGAAGCGGAAAGTCTCTGAGGTGGAGTTTTCCCACGAGTACTGGATGAGACATGCCCTGACCCTGGCCAAGAGGGCACGGGATGAAGGCGCGGGGCCTGTGGGAGCCGTGCTGGTGCTGAACAATAGAGTGATCGGCGAGGGCTGGAACAGAGCCATCGGCCTGCACGACCCAACAGCCCATGCCGAAATTATGGCCCTGAGACAGGGCGGCCTGGTCATGCAGAACTACAGACT CTTTGACGCCACCCTGTACGTGACATTCGAGCCTTGCGTGATGTGCGCCGGCGCCATGATCAACTCTAGGATCGGCCGCGTGGTGTTTGGCGTGAGGAACTCAAAAAGAGGCGCCGCAGGCTCCCTGATGAACGTGCTGAACTACCCCGGCATGAATCACCGCGTCGAAATTACCGAGGGAATCCTGGCAGATGAATGTGCCGCCCTGCTGTGCGATTTCTATCGGATGCCTAGACAGGTGTTCAACAGCCAGAAGAAGGCCCAGAGCTCCATCAACTCCGGAGGATCTAGCGGAGGCTCCTCTGGCTCTGAGACACCTGGCACAAG CGAGAGCGCAACACCTGAAAGCAGCGGGGGCAGCAGCGGGGGTTCGATGGCTCCGAAGAAGAAGAGGAAGGTTGGCATCCACGGGGTGCCAGCTGCTGACAAGAAGTACTCGATCGGCCTCGCTATTGGGACTAACTCTGTTGGCTGGGCCGTGATCACCGACGAGTACAAGGTGCCCTCAAAGAAGTTCAAGGTCCTGGGCAACACCGATCGGCATTCCATCAAGAAGAATCTCATTGGCGCTCTCCTGTTCGACAGCGGCGAGACGGCTGAGGCTACGCGGCTCAAGCGCACCGCCCGCAGGCGGTACACGCGCAGGAAGAATC GCATCTGCTACCTGCAGGAGATTTTCTCCAACGAGATGGCGAAGGTTGACGATTCTTTCTTCCACAGGCTGGAGGAGTCATTCCTCGTGGAGGAGGATAAGAAGCACGAGCGGCATCCAATCTTCGGCAACATTGTCGACGAGGTTGCCTACCACGAGAAGTACCCTACGATCTACCATCTGCGGAAGAAGCTCGTGGACTCCACAGATAAGGCGGACCTCCGCCTGATCTACCTCGCTCTGGCCCACATGATTAAGTTCAGGGGCCATTTCCTGATCGAGGGGGATCTCAACCCGGACAATAGCGATGTTGACAAGCTGTTCATCC 
AGCTCGTGCAGACGTACAACCAGCTCTTCGAGGAGAACCCCATTAATGCGTCAGGCGTCGACGCGAAGGCTATCCTGTCCGCTAGGCTCTCGAAGTCTCGGCGCCTCGAGAACCTGATCGCCCAGCTGCCGGGCGAGAAGAAGAACGGCCTGTTCGGGAATCTCATTGCGCTCAGCCTGGGGCTCACGCCCAACTTCAAGTCGAATTTCGATCTCGCTGAGGACGCCAAGCTGCAGCTCTCCAAGGACACATACGACGATGACCTGGATAACCTCCTGGCCCAGATCGGCGATCAGTACGCGGACCTGTTCCTCGCTGCCAAGAAT CTGTCGGACGCCATCCTCCTGTCTGATATTCTCAGGGTGAACACCGAGATTACGAAGGCTCCGCTCTCAGCCTCCATGATCAAGCGCTACGACGAGCACCATCAGGATCTGACCCTCCTGAAGGCGCTGGTCAGGCAGCAGCTCCCCGAGAAGTACAAGGAGATCTTCTTCGATCAGTCGAAGAACGGCTACGCTGGGTACATTGACGGCGGGGCCTCTCAGGAGGAGTTCTACAAGTTCATCAAGCCGATTCTGGAGAAGATGGACGGCACGGAGGAGCTGCTGGTGAAGCTCAATCGCGAGGACCTCCTGAGGAAGCAGCGGACA TTCGATAACGGCAGCATCCCACACCAGATTCATCTCGGGGAGCTGCACGCTATCCTGAGGAGGCAGGAGGACTTCTACCCTTTCCTCAAGGATAACCGCGAGAAGATCGAGAAGATTCTGACTTTCAGGATCCCGTACTACGTCGGCCCACTCGCTAGGGGCAACTCCCGCTTCGCTTGGATGACCCGCAAGTCAGAGGAGACGATCACGCCGTGGAACTTCGAGGAGGTGGTCGACAAGGGCGCTAGCGCTCAGTCGTTCATCGAGAGGATGACGAATTTCGACAAGAACCTGCCAAATGAGAAGGTGCTCCCTAAGCACTCGCT CCTGTACGAGTACTTCACAGTCTACAACGAGCTGACTAAGGTGAAGTATGTGACCGAGGGCATGAGGAAGCCGGCTTTCCTGTCTGGGGAGCAGAAGAAGGCCATCGTGGACCTCCTGTTCAAGACCAACCGGAAGGTCACGGTTAAGCAGCTCAAGGAGGACTACTTCAAGAAGATTGAGTGCTTCGATTCGGTCGAGATCTCTGGCGTTGAGGACCGCTTCAACGCCTCCCTGGGGACCTACCACGATCTCCTGAAGATCATTAAGGATAAGGACTTCCTGGACAACGAGGAGAATGAGGATATCCTCGAGGACATTGTGCTGAC ACTCACTCTGTTCGAGGACCGGGAGATGATCGAGGAGCGCCTGAAGACTTACGCCCATCTCTTCGATGACAAGGTCATGAAGCAGCTCAAGAGGAGGAGGTACACCGGCTGGGGGAGGCTGAGCAGGAAGCTCATCAACGGCATTCGGGACAAGCAGTCCGGGAAGACGATCCTCGACTTCCTGAAGAGCGATGGCTTCGCGAACCGCAATTTCATGCAGCTGATTCACGATGACAGCCTCACATTCAAGGAGGATATCCAGAAGGCTCAGGTGAGCGGCCAGGGGGACTCGCTGCACGAGCATATCGCGAACCTCGCTGGCTCGCC AGCTATCAAGAAGGGGATTCTGCAGACCGTGAAGGTTGTGGACGAGCTGGTGAAGGTCATGGGCAGGCACAAGCCTGAGAACATCGTCATTGAGATGGCCCGGGAGAATCAGACCACGCAGAAGGGCCAGAAGAACTCACGCGAGAGGATGAAGAGGATCGAGGAGGGCATTAAGGAGCTGGGGTCCCAGATCCTCAAGGAGCACCCGGTGGAGAACACGCAGCTGCAGAATGAGAAGCTCTACCTGTACTACCTCCAGAATGGCCGCGATATGTATGTGGACCAGGAGCTGGATATTAACAGGCTCAGCGATTACGACGTCGATCA 
TATCGTTCCACAGTCATTCCTGAAGGATGACTCCATTGACAACAAGGTCCTCACCAGGTCGGACAAGAACCGGGGCAAGTCTGATAATGTTCCTTCAGAGGAGGTCGTTAAGAAGATGAAGAACTACTGGCGCCAGCTCCTGAATGCCAAGCTGATCACGCAGCGGAAGTTCGATAACCTCACAAAGGCTGAGAGGGGCGGGCTCTCTGAGCTGGACAAGGCGGGCTTCATCAAGAGGCAGCTGGTCGAGACACGGCAGATCACTAAGCACGTTGCGCAGATTCTCGACTCACGGATGAACACTAAGTACGATGAGAATGACAAGC TGATCCGCGAGGTGAAGGTCATCACCCTGAAGTCAAAGCTCGTCTCCGACTTCAGGAAGGATTTCCAGTTCTACAAGGTTCGGGAGATCAACAATTACCACCATGCCCATGACGCGTACCTGAACGCGGTGGTCGGCACAGCTCTGATCAAGAAGTACCCAAAGCTCGAGAGCGAGTTCGTGTACGGGGACTACAAGGTTTACGATGTGAGGAAGATGATCGCCAAGTCGGAGCAGGAGATTGGCAAGGCTACCGCCAAGTACTTCTTCTACTCTAACATTATGAATTTCTTCAAGACAGAGATCACTCTGGCCAATGGCGAGATCC GGAAGCGCCCCCTCATCGAGACGAACGGCGAGACGGGGGAGATCGTGTGGGACAAGGGCAGGGATTTCGCGACCGTCAGGAAGGTTCTCTCCATGCCACAAGTGAATATCGTCAAGAAGACAGAGGTCCAGACTGGCGGGTTCTCTAAGGAGTCAATTCTGCCTAAGCGGAACAGCGACAAGCTCATCGCCCGCAAGAAGGACTGGGATCCGAAGAAGTACGGCGGGTTCGACAGCCCCACTGTGGCCTACTCGGTCCTGGTTGTGGCGAAGGTTGAGAAGGGCAAGTCCAAGAAGCTCAAGAGCGTGAAGGAGCTGCTGGGGATC ACGATTATGGAGCGCTCCAGCTTCGAGAAGAACCCGATCGATTTCCTGGAGGCGAAGGGCTACAAGGAGGTGAAGAAGGACCTGATCATTAAGCTCCCCAAGTACTCACTCTTCGAGCTGGAGAACGGCAGGAAGCGGATGCTGGCTTCCGCTGGCGAGCTGCAGAAGGGGAACGAGCTGGCTCTGCCGTCCAAGTATGTGAACTTCCTCTACCTGGCCTCCCACTACGAGAAGCTCAAGGGCAGCCCCGAGGACAACGAGCAGAAGCAGCTGTTCGTCGAGCAGCACAAGCATTACCTCGACGAGATCATTGAGCAGATTTCCGAG TTCTCCAAGCGCGTGATCCTGGCCGACGCGAATCTGGATAAGGTCCTCTCCGCGTACAACAAGCACCGCGACAAGCCAATCAGGGAGCAGGCTGAGAATATCATTCATCTCTTCACCCTGACGAACCTCGGCGCCCCTGCTGCTTTCAAGTACTTCGACACAACTATCGATCGCAAGAGGTACACAAGCACTAAGGAGGTCCTGGACGCGACCCTCATCCACCAGTCGATTACCGGCCTCTACGAGACGCGCATCGACCTGTCTCAGCTCGGGGGCGACAAGCGGCCAGCGGCGACGAAGAAGGCGGGGCAGGCGAAGAAGAAGAA GGCAGGATCTGGTGGCTCTGGCGGCTCCGGAGGAAGCACTAATCTCTCAGATATAATCGAGAAGGAAACTGGGAAACAGCTGGTTATACAAGAGAGCATTCTCATGCTTCCGGAGGAGGTGGAGGAGGTCATCGGCAACAAGCCTGAATCTGACATTCTTGTTCATACCGCGTACGACGAAAGCACGGATGAGAATGTCATGCTTTTGACTTCTGATGCACCTGAGTACAAGCCGTGGGCGTTGGTCATCCAGGACTCTAACGGTGAGAACAAAATCAAAATGCTATCCGGCGGGTCGGGGGGGTCAGGCGGTTCGACGAACCTGAG 
TGATATTATTGAGAAAGAGACTGGGAAGCAGTTGGTCATCCAAGAGAGCATCCTGATGCTCCCTGAAGAAGTGGAGGAGGTGATCGGCAACAAGCCCGAATCGGACATTCTAGTGCATACTGCTTATGACGAGTCTACCGACGAGAATGTGATGTTATTGACCTCCGATGCTCCAGAATACAAGCCATGGGCGTTAGTAATACAAGATTCAAACGGCGAGAATAAAATCAAGATGCTGTCGGGAGGTTCAAAGCGTACTGCTGATGGTTCAGAGTTTGAGCCCAAAAAGAAAAGAAAAGTGTGAGCTTCTCGAGTATATGAAGATG AAGATGAAATATTTGGTGTGTCAAATAAAAAGCTTGTGTGCTTAAGTTTGTGTTTTTTTCTTGGCTTGTTGTGTTATGAATTTGTGGCTTTTTCTAATATTAAATGAATGTAAGATCACATTATAATGAATAAACAAATGTTTCTATAATCCATTGTGAATGTTTTGTTGGATCTCTTCTGCAGCATATAACTACTGTATGTGCTATGGTATGGACTATGGAATATGATTAAAGATAAGCGCTGAGCTCGGACGGCGCCGGAGCACATCAGTCTCTGCACAAAGTGCATCCTGGGCTGCTTCAATTATAAAGCCCCATTCACCAC ATTTTGCTAGATAGTCGAAAAGCACCATCAATATTGAGCTTCAGGTATTTTTGGTTGTGTTGTGGTTGGATTGATTCTAATATATACCAAATCAATATAATTCACTACCAAAATATACCATAGCCATCACAACTTTATTAATTTTGGTAGCTTAAGATGGTATATATAATAACCAATTAACAACTGATTCTAATTTTACTACGGCCCAGTATCTACCAATACAAAACAACGAGTATGTTTTCTTCCGTCGTAATCGTACACAGTACAAAAAAACCTGGCCAGCCTTTCTTGGGCTGGGGCTCTCTTTCGAAAGGTCACAAAACGTAC ACGGCAGTAACGCCGCTTCGCTGCGTGTTAACGGCCACCAACCCCGCCGTGAGCAAACGGCATCAGCTTTCCACCTCCTCGATATCTCCGCGGCGCCGTCTGGACCCGCCCCCTTTCCGTTCCTTTCTTTCCTTCTCGCGTTTGCGTGGTGGGGACGGACTCCCCAAACCGCCTCTCCCTCTCTTTATTTGTCTATATTCTCACTGGGCCCCACCCACCGCACCCCTGGGCCCACTCACGAGTCCCCCCCTCCCCACCTATAAATACCCCACCCCCTCCTCGCCTCTTCCTCCATCAATCGAATCCCCAAAATCGCAGAGAAAAAAA AATCTCCCCTCGAAGCGAAGCGTCGAATCGCCTTCTCAAGGTATGCGATTTTCTGATCCTCTCCGTTCCTCGCGTTTGATTTGATTTCCCGGCCTGTTCGTGATTGTGAGATGTTGTTGTTAGTCTCCGTTTTGCGATCTGTGGTAGATTTGAACAGGGTTAGATGGGGTTCGCGTGGTATGCTGGATCTGTGATTATGAGCGATGCTGTTCGTGGTCCAAGTATTTATTGGTTCGGATCTAGAAGTAGAACTGTGCTAGGGTTGTGATTTGTTCCGATCTGTTCAATCAGTAGGATTTAGTCTCTGTTTTTCTCGTTGATCCAAG TAGCAGCTTCAGGTATATTTTGCTTAGGTTGTTTTTGATTCAGTCCCTCTAGTTGCATAGATTCTACTCTGTTCATGTTTAATCTAAGGGCTGCGTCTTGTTGATTAGTGATTACATAGCATAGCTTTCAGGATATTTTACTTGCTTATGCCTATCTTATCAACTGTTGCACCTGTAAATTCTAGCCTATGTTAATTAACCTGCCTTATGTGCTCTCGGGATAGTGCTAGTAGTTATTGAATCAGTTTGCCGATGGAATTCTAGTAGTTCATAGACCTGCAGATTATTTTTGTGAACTCGAGCACGGTGCGACTCTCTATTTTGTTA 
GGTCACTGTTGGTGTTGATAGGTACACTGATGTTATTGTGGTTTAGATCGTGTATCTAACATATTGGAATAATTTGATTGACTGATTTCTGCTGTACTTGCTTGGTATTGTTATAATTTCATGTTCATAGTTGCTGACCATGCTTCGGTAATTGTGTGTGCAGCCTGCAGGAACAAAGCACCAGTGGTCTAGTGGTAGAATAGTACCCTGCCACGGTACAGACCCGGGTTCGATTCCCGGCTGGTGCAGGAGACCTTACAGTTGGACACAGGCCAACTTGTGAAGATTGCAAAACGTGGCGGCGTGACCGCAATGGAGGCAGTGCA TGCATCGCGCAATGCACTGACGGGTGCCCCCCTGGAGACGGGCGCCGCTACAGGGCGCGTCCCATTCGCCATTCAGGCTGCGCAACTGTTGGGAAGGGCGATCGGTGCGGGCCTCTTCGCTATTACGCCAGCTGGCGAAAGGGGGATGTGCTGCAAGGCGATTAAGTTGGGTAACGCCAGGGTTTTCCCAGTCACGACGTTGTAAAACGACGGCCAGTGAGCGCGCGTAATACGACTCACTATAGGGCGAATTGGGTACCGGGCCCCCCCTCGAGGTCCTCCAGCTTTTGTTCCCTTTAGTGAGGGTTAATTGCGCGCTTGGCGTAA TCATGGTCATAGCTGTTTCCTGTGTGAAATTGTTATCCGCTCACAATTCCACACAACATACGAGCCGGAAGCATAAAGTGTAAAGCCTGGGGTGCCTAATGAGTGAGCTAACTCACATTAATTGCGTTGCGCTCACTGCCCGCTTTCCACCGGTGGTCTCAGTTTAAGAGCTATGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTTTGCTGGAAACAGCAAAGTGGCACCGAGTCGGTGCAACAAAGCACCAGTGGTCTAGTGGTAGAATAGTACCCTGCCACGGTACAGACCCGGGTTCGATTCCCGGC TGGTGCAGGTACTCGAGCCTAGACTTGTCCATCTTCTGGATTGGCCAAGTTAATTAATGTATGAAATAAAAGGATGCACACATAGTGACATGCTAATCACTATAATGTGGGCATCAAAGTTGTGTGTTATGTGTAATTACTAATTATCTGAATAAGAGAAAGAGATCATCCATATTTCTTATCCTAAATGAATGTCACGTGTCTTTATAATTCTTTGATGAACCAGATGCATTTTATTAACCAATTCCATATACATATAAATATTAATCATATATAATTAATATCAATTGGGTTAGCAAAACAAATCTAGTCTAGGTGTGTTTTGC.
Example 2: Detection of TadCBEa activity by transient expression in protoplasts
1. Preliminary assessment of TadCBEa editing activity in rice cells
A3A_Y130F-CBE served as the control for C-to-T editing efficiency, and ABE8e served as the control for A-to-G editing efficiency. Twenty endogenous rice sites were used to evaluate the base editing activity of TadCBEa. For each base editor, two multi-target T-DNA vectors were constructed, each targeting 10 sites in the rice genome (Table 1). Rice protoplasts were transiently transformed, and the editing performance of each base editing system was evaluated by next-generation sequencing (NGS) of PCR amplicons.
Analysis of the NGS data showed that TadCBEa achieved efficient C-to-T base editing at all target sites in rice cells (Figure 2A). Across all 20 target sites, the C-to-T editing efficiency of TadCBEa ranged from 4.5% to 55.4%, comparable to that of A3A_Y130F (9.0% to 53.0%) (Figure 2B). TadCBEa showed detectable A-to-G editing at some target sites (Figure 2A). In addition, the insertion and deletion (indel) frequencies of the TadCBEa base editor were very low, within the range of sequencing error (Figure 2B). Taken together, these data indicate that TadCBEa is a promising route to higher cytosine base editing efficiency and purity.
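The per-site efficiencies quoted above are percentages of amplicon reads that carry a C-to-T change. The patent does not describe its analysis pipeline, so the following is only a minimal sketch of one common way to compute such a percentage. The function name `c_to_t_efficiency`, the assumption that reads are already aligned and trimmed to the reference window, and the example sequences are all hypothetical.

```python
from typing import Iterable


def c_to_t_efficiency(reference: str, reads: Iterable[str]) -> float:
    """Percent of reads with at least one C-to-T change relative to the reference.

    Assumes (hypothetically) that each read is already aligned and trimmed to
    the same window as `reference`; reads of a different length are skipped,
    which also excludes reads carrying indels.
    """
    reference = reference.upper()
    c_positions = [i for i, base in enumerate(reference) if base == "C"]
    total = 0
    edited = 0
    for read in reads:
        read = read.upper()
        if len(read) != len(reference):
            continue  # not comparable position by position
        total += 1
        if any(read[i] == "T" for i in c_positions):
            edited += 1
    return 100.0 * edited / total if total else 0.0


# Illustrative data only (not from the patent): 2 of 4 usable reads are edited.
example_reads = ["GACCTA", "GATCTA", "GATTTA", "GACCTA", "GACCT"]
print(c_to_t_efficiency("GACCTA", example_reads))
```

A real pipeline would also handle read quality filtering, alignment, and the opposite strand (G-to-A), none of which is shown here.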
2. Base editing characteristics of TadCBEa in rice cells
The NGS data were used to further analyze the editing profile at the 20 target sites. The C-to-T editing window of A3A_Y130F spans positions 3 to 16 of the protospacer, whereas the C-to-T editing window of TadCBEa is narrower and concentrated at positions 4 to 8, consistent with the A-to-G editing window of ABE8e (Figure 3). These data show that the engineered TadCBEa retains the editing window of TadA-8e even though its base preference has changed.
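An editing window of this kind is usually derived from per-position conversion frequencies along the protospacer. The sketch below illustrates that idea under stated assumptions: positions are numbered from 1 at the PAM-distal end, reads are pre-aligned to the protospacer, and the 10% cutoff is an arbitrary illustrative threshold. The function `editing_window` and the example sequences are hypothetical and are not taken from the patent.

```python
from typing import Dict, List, Optional, Tuple


def editing_window(
    protospacer: str, reads: List[str], threshold: float = 0.1
) -> Tuple[Dict[int, float], Optional[Tuple[int, int]]]:
    """Return per-position C-to-T frequencies and the span of active positions.

    Positions are 1-based along the protospacer. A position counts as active
    when its C-to-T frequency is at least `threshold` (an assumed cutoff).
    """
    protospacer = protospacer.upper()
    usable = [r.upper() for r in reads if len(r) == len(protospacer)]
    freqs: Dict[int, float] = {}
    for idx, base in enumerate(protospacer):
        if base != "C":
            continue  # only cytosines can show C-to-T conversion
        converted = sum(1 for r in usable if r[idx] == "T")
        freqs[idx + 1] = converted / len(usable) if usable else 0.0
    active = [pos for pos, freq in freqs.items() if freq >= threshold]
    window = (min(active), max(active)) if active else None
    return freqs, window


# Illustrative 10-nt example with cytosines at positions 3, 5, 6 and 9.
freqs, window = editing_window(
    "GACACCGTCA",
    ["GACATCGTCA", "GACATTGTCA", "GACACCGTCA", "GACACTGTCA"],
)
print(freqs, window)
```

Pooling such frequencies across many target sites, as in the 20-site comparison above, would smooth out the effect of where cytosines happen to fall in any single protospacer.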
Example 3: Targeted modification of plant endogenous genes with TadCBEa
1. Stable transformation of rice with multi-site-targeting TadCBEa
The TadCBEa-gRNA array 02 vector (Figure 4A) was introduced into Agrobacterium tumefaciens EHA105 by the freeze-thaw method. In parallel, a T-DNA backbone vector lacking the deaminase and Cas protein and the A3A_Y130F-CBE-gRNA array 02 vector were introduced as the tissue culture control and the efficiency control, respectively, and positive clones were selected. The vectors were then each transformed into rice callus by Agrobacterium-mediated genetic transformation, and stably transformed transgenic plants were obtained after about 8 weeks (Hiei et al., Efficient transformation of rice (Oryza sativa L.) mediated by Agrobacterium and sequence analysis of the boundaries of the T-DNA. 1994, Plant Journal).
2. Multi-site-targeting TadCBEa achieves stable editing in rice plants
The transgenic plants were first analyzed by Sanger sequencing to confirm that base editing had occurred at the target sites, and then by whole genome sequencing (WGS) (Figure 4B). After the WGS data were aligned to the rice reference genome, analysis of editing efficiency in plants showed that the C-to-T editing efficiency of TadCBEa in stable plants was comparable to that of A3A_Y130F-CBE (Figure 4C), consistent with the protoplast data (Figure 2B). Notably, efficient biallelic C-to-T editing was observed with TadCBEa at most target sites (Figure 4C). Further analysis of the target site sequences in edited plants showed that all base editing events in TadCBEa plants occurred within the TadCBEa editing window (Figure 4D).
Example 4: Specificity analysis of TadCBEa base editing of plant endogenous genes
1. Experimental design for the TadCBEa editing specificity analysis
Off-target effects in rice plants edited by TadCBEa at multiple target sites were analyzed by whole genome sequencing and transcriptome sequencing. Off-target effects produced by gene editing tools are generally divided into sgRNA-independent and sgRNA-dependent off-target effects. In the off-target analysis based on WGS and transcriptome sequencing, transgenic plants carrying a T-DNA vector without the deaminase and Cas protein were used as the tissue culture control, and A3A_Y130F-CBE was used as the editing specificity control (Figure 5A).
2. WGS-based analysis of sgRNA-independent off-target effects of TadCBEa
Analysis of the WGS data showed that TadCBEa plants carried numbers of insertions and deletions (indels, about 60) and single nucleotide variations (SNVs, about 200) similar to those of the tissue culture control (Figures 5B and 5C). Further analysis of the six SNV types showed similar trends in TadCBEa base-edited plants and tissue culture control plants (Figure 5D). Compared with the tissue culture control, C-to-T SNVs did not increase significantly in TadCBEa-edited plants, and compared with A3A_Y130F-CBE, the number of C-to-T SNVs in TadCBEa-edited plants was significantly lower, indicating that TadCBEa has better editing specificity (Figure 5E). Overall, TadCBEa plants and the tissue culture control shared a similar SNV composition (Figure 5F). Evaluation of the sequences adjacent to mutated cytosines did not reveal any sequence preference (Figure 5G), such as the TA motif known for TadA-8e. These analyses indicate that TadCBEa has no significant genome-wide sgRNA-independent off-target effect in rice, and that the mutations found by WGS are mainly somatic variation caused by tissue culture.
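The "six SNV types" mentioned above usually arise from folding the twelve possible single-base substitutions onto one strand, so that, for example, G-to-A is counted together with C-to-T. The patent does not state which convention it used. The sketch below assumes the common convention of expressing each substitution with an A or C reference base; `snv_class` and `count_snv_classes` are hypothetical helper names.

```python
from collections import Counter
from typing import Iterable, Tuple

_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def snv_class(ref: str, alt: str) -> str:
    """Fold a substitution onto an A or C reference base, e.g. G>A becomes C>T."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in _COMPLEMENT or alt not in _COMPLEMENT or ref == alt:
        raise ValueError(f"not a single-nucleotide substitution: {ref}>{alt}")
    if ref in ("G", "T"):
        # Report the change as seen on the opposite strand.
        ref, alt = _COMPLEMENT[ref], _COMPLEMENT[alt]
    return f"{ref}>{alt}"


def count_snv_classes(snvs: Iterable[Tuple[str, str]]) -> Counter:
    """Count SNVs per folded class: A>C, A>G, A>T, C>A, C>G, C>T."""
    return Counter(snv_class(ref, alt) for ref, alt in snvs)


# Illustrative calls only (not data from the patent).
counts = count_snv_classes([("C", "T"), ("G", "A"), ("T", "C"), ("A", "G"), ("G", "T")])
print(counts)
```

Comparing the resulting class counts between edited plants and tissue culture controls is the kind of comparison summarized in Figures 5D to 5F.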
3. WGS-based analysis of sgRNA-dependent off-target effects of TadCBEa
Because 10 endogenous sites in the rice genome were edited simultaneously, it was important to determine whether any of the sgRNAs caused sgRNA-dependent off-target mutations, that is, whether any of the SNVs resulted from off-target activity of the base editor. To this end, Cas-OFFinder (Bae et al., Cas-OFFinder: a fast and versatile algorithm that searches for potential off-target sites of Cas9 RNA-guided endonucleases. 2014, Bioinformatics) was used to identify 20 potential off-target sites with ≤3 mismatched bases for the 10 target sites. All C-to-T SNVs found in the edited plants were then matched against these potential off-target sites. The analysis showed that 16 of the 20 potential off-target sites had no detectable editing events in TadCBEa-edited plants (Table 3). Of the remaining 4 off-target sites, 3 had editing events at sites that differ from the target site by a single mismatched base (Figure 6 and Table 3). This indicates that the gRNA-dependent off-target mutations of these novel base editors result simply from high sequence similarity between the target and off-target sites, which is also a feature seen in earlier WGS-based analyses of Cas9 off-target effects in plants.
Table 3. Potential sgRNA-dependent off-target sites and the proportion of plants with off-target edits
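The matching step described above has two parts: candidate sites are selected by their number of mismatches to the guide, and called SNVs are then checked against the coordinates of those candidates. The sketch below illustrates both parts in plain Python. It does not call or reproduce Cas-OFFinder, it ignores the PAM, and the site names, coordinates, and sequences are hypothetical.

```python
from typing import Dict, List, Tuple


def mismatches(guide: str, site: str) -> int:
    """Number of mismatched bases between a guide and an equal-length site."""
    if len(guide) != len(site):
        raise ValueError("guide and site must have the same length")
    return sum(1 for g, s in zip(guide.upper(), site.upper()) if g != s)


def snvs_in_sites(
    snvs: List[Tuple[str, int]],
    sites: Dict[str, Tuple[str, int, int]],
) -> Dict[str, List[Tuple[str, int]]]:
    """Map each candidate site to the SNVs falling inside it.

    `snvs` holds (chromosome, 1-based position) pairs; `sites` maps a site
    name to (chromosome, start, end) with inclusive 1-based coordinates.
    """
    hits: Dict[str, List[Tuple[str, int]]] = {name: [] for name in sites}
    for chrom, pos in snvs:
        for name, (site_chrom, start, end) in sites.items():
            if chrom == site_chrom and start <= pos <= end:
                hits[name].append((chrom, pos))
    return hits


# Hypothetical candidates: keep only sites with at most 3 mismatches.
guide = "GACCTGAACT"
candidates = {"OT1": "GACTTGAACA", "OT2": "TTCCTGTTCT"}
kept = [name for name, seq in candidates.items() if mismatches(guide, seq) <= 3]

# Hypothetical coordinates and SNV calls.
sites = {"OT1": ("Chr1", 100, 119), "OT2": ("Chr2", 500, 519)}
print(kept, snvs_in_sites([("Chr1", 105), ("Chr3", 10)], sites))
```

A site with no entries in the returned mapping corresponds to "no detectable editing events" in the sense used above.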
4. Transcriptome-sequencing-based analysis of TadCBEa off-target effects at the RNA level
Because TadA variants can cause transcriptome-wide off-target mutations in plants when overexpressed (Li et al., A large-scale genome and transcriptome sequencing analysis reveals the mutation landscapes induced by high-activity adenine base editors in plants. 2022, Genome Biology), the off-target effects of TadCBEa were also analyzed at the transcriptome level (Figure 5A). Compared with the tissue culture control, TadCBEa-edited plants showed no significant change in the total number of SNVs or in the number of C-to-U SNVs (Figures 7A and 7B), and the proportions of the different SNV types were unchanged in TadCBEa-edited plants (Figure 7C). This indicates that the base editor did not introduce additional C-to-U mutations into the rice transcriptome.
Overall, TadCBEa caused no significant off-target effects at either the DNA or the RNA level, indicating that TadCBEa is a highly specific genomic base editing tool in plants.
Taken together, the results of the above examples show the following:
TadCBEa is an efficient cytosine base editor in rice cells, with high editing purity and the relatively narrow editing window of ABE8e. In transgenic rice plants, TadCBEa can edit multiple target genes simultaneously. In addition, whole genome and transcriptome sequencing of transgenic plants detected no off-target effects of TadCBEa, indicating high editing specificity. In summary, this study reports TadCBEa, a cytosine base editor derived from TadA-8e with high editing activity, purity, and specificity, which is a complementary addition to the plant gene editing toolbox.
Claims (12)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202410237383.8A CN118185908A (en) | 2024-03-01 | 2024-03-01 | A cytosine base editing system derived from adenosine deaminase |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN118185908A true CN118185908A (en) | 2024-06-14 |
Family
ID=91403381
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202410237383.8A Pending CN118185908A (en) | 2024-03-01 | 2024-03-01 | A cytosine base editing system derived from adenosine deaminase |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN118185908A (en) |
Cited By (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN118995713A (en) * | 2024-09-14 | 2024-11-22 | 中山大学 | Promoter for improving dicotyledon gene editing efficiency and application thereof |
| CN119639719A (en) * | 2024-10-29 | 2025-03-18 | 北京林业大学 | A plant cytosine single-base editor and its construction method and application |
| CN119913132A (en) * | 2024-06-17 | 2025-05-02 | 中国农业大学 | Cytosine deaminase and its related biomaterials and applications |
| CN119912582A (en) * | 2024-06-17 | 2025-05-02 | 中国农业大学 | Single-base editors and their deaminases and applications |
| CN120290624A (en) * | 2025-04-17 | 2025-07-11 | 中国中医科学院中药研究所 | An efficient single-base editing system of Danshen and its application |
Cited By (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN119913132A (en) * | 2024-06-17 | 2025-05-02 | 中国农业大学 | Cytosine deaminase and its related biomaterials and applications |
| CN119912582A (en) * | 2024-06-17 | 2025-05-02 | 中国农业大学 | Single-base editors and their deaminases and applications |
| CN119912582B (en) * | 2024-06-17 | 2025-11-04 | 中国农业大学 | Single base editor, deaminase used by same and application of deaminase |
| CN118995713A (en) * | 2024-09-14 | 2024-11-22 | 中山大学 | Promoter for improving dicotyledon gene editing efficiency and application thereof |
| CN119639719A (en) * | 2024-10-29 | 2025-03-18 | 北京林业大学 | A plant cytosine single-base editor and its construction method and application |
| CN120290624A (en) * | 2025-04-17 | 2025-07-11 | 中国中医科学院中药研究所 | An efficient single-base editing system of Danshen and its application |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||