CN114317518B

CN114317518B - Application of SpRYn-CBE base editing system in base replacement in plant genomes

Info

Publication number: CN114317518B
Application number: CN202011055742.6A
Authority: CN
Inventors: 杨进孝; 赵久然; 王飞鹏; 王瑶; 张成伟
Original assignee: Beijing Academy of Agriculture and Forestry Sciences
Current assignee: Beijing Academy of Agriculture and Forestry Sciences
Priority date: 2020-09-30
Filing date: 2020-09-30
Publication date: 2024-01-12
Anticipated expiration: 2040-09-30
Also published as: CN114317518A

Abstract

The invention discloses the application of a SpRYn-CBE base editing system in base substitution in plant genomes. The SpRYn-CBE base editing system comprises SpRYn, PmCDA1, UGI and an sgRNA; the sgRNA targets a target sequence. Experiments show that the SpRYn-CBE base editing system can edit base C in a target sequence on a plant genome whose PAM sequence is NAN, NCN or NTN, achieving replacement of base C with base T and greatly expanding the range of editable C.

Description

Application of SpRYn-CBE base editing system in base substitution in plant genomes

Technical Field

The present invention relates to the field of biotechnology, and in particular to the application of the SpRYn-CBE base editing system in base substitution in plant genomes.

Background Art

CRISPR-Cas9 technology has become a powerful genome editing tool and has been widely applied in many tissues and cells. The CRISPR/Cas9 protein-RNA complex is directed to the target site by a guide RNA and cleaves it to produce a DNA double-strand break (DSB), after which the organism initiates its DNA repair mechanisms to repair the DSB. There are generally two repair mechanisms: non-homologous end joining (NHEJ) and homology-directed repair (HDR). NHEJ usually predominates, so random indels (insertions or deletions) arise far more often than precise repair. For precise base substitution, the use of HDR is greatly limited by its low efficiency and its need for a DNA template.

In 2016, the laboratories of David Liu and Akihiko Kondo independently reported two different types of cytosine base editor (CBE), using two different cytidine deaminases, rAPOBEC1 (rat APOBEC1) and PmCDA1 (an activation-induced cytidine deaminase (AID) ortholog from sea lamprey), respectively. Both work by using a cytidine deaminase to edit a single cytosine (C) base directly, without generating a DSB or invoking HDR repair, which greatly improves the efficiency of replacing C with thymine (T). Specifically, dead Cas9 (dCas9) or Cas9 nickase (Cas9n), linked to rAPOBEC1 or PmCDA1, is directed to the target site by a guide RNA. rAPOBEC1 or PmCDA1 catalyzes deamination of C on the unpaired single-stranded DNA, converting it to uracil (U). DNA repair then pairs U with adenine (A), and DNA replication finally pairs T with A, thereby achieving the C-to-T conversion. Among the editors tested, the SpCas9n(D10A)&rAPOBEC1/PmCDA1&UGI base editing system, which contains a uracil DNA glycosylase inhibitor (UGI), has a higher average mutation rate for two reasons: first, UGI inhibits uracil DNA glycosylase (UDG) from catalyzing the removal of U from DNA; second, SpCas9n(D10A) nicks the non-edited strand, inducing the eukaryotic mismatch repair mechanism or the long-patch base-excision repair (BER) mechanism, which biases repair of the U:G mismatch toward U:A.

At present, the SpCas9n(D10A)&rAPOBEC1/PmCDA1&UGI base editing system has been widely used in rice to achieve C-to-T conversion, but the editable targets are mainly limited to sequences whose PAM (Protospacer Adjacent Motif) is NGG, which greatly restricts the range of editable C.
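The two-step chemistry described in the background (deamination of C to U, followed by repair and replication that leave T in place of U) can be sketched as a simple string model. This is only an illustration: the function names and the example sequence are hypothetical and do not come from the patent, and the model ignores the complementary strand and the editing window of any real deaminase.

```python
def deaminate(strand: str, positions: set) -> str:
    """Convert C to U at the given 0-based positions (models cytidine deamination).

    Positions that do not hold a C are left unchanged.
    """
    return "".join(
        "U" if base == "C" and i in positions else base
        for i, base in enumerate(strand)
    )


def resolve_after_replication(strand: str) -> str:
    """Model repair plus replication: U pairs with A, so U ends up read as T."""
    return strand.replace("U", "T")


# Hypothetical example strand, not taken from the patent.
target = "GACCTGAAC"
intermediate = deaminate(target, {2, 3})
edited = resolve_after_replication(intermediate)
print(target, "->", intermediate, "->", edited)
```

Separating the two steps mirrors the text: UGI acts on the intermediate (it protects U from removal), while the nick on the non-edited strand favors the final resolution toward T:A.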

Summary of the Invention

The first object of the present invention is to provide a method for mutating C to T in a target sequence of a plant genome.

The method provided by the present invention for mutating C to T in a target sequence of a plant genome is any one of the following 1), 2), 3) or 4):

Method 1) comprises the following step: introducing SpRYn, a cytosine deaminase, an sgRNA and UGI into a plant, thereby mutating C in the target sequence of the plant genome to T;

Method 2) comprises the following step: introducing SpRYn, a cytosine deaminase and an sgRNA into a plant, thereby mutating C in the target sequence of the plant genome to T;

Method 3) comprises the following step: introducing the coding gene of SpRYn, the coding gene of a cytosine deaminase, a DNA molecule that transcribes an sgRNA, and the coding gene of UGI into a plant, so that the SpRYn, the cytosine deaminase, the sgRNA and the UGI are all expressed, thereby mutating C in the target sequence of the plant genome to T;

Method 4) comprises the following step: introducing the coding gene of SpRYn, the coding gene of a cytosine deaminase and a DNA molecule that transcribes an sgRNA into a plant, so that the SpRYn, the cytosine deaminase and the sgRNA are all expressed, thereby mutating C in the target sequence of the plant genome to T;

The sgRNA targets the target sequence;

The PAM sequence of the target sequence is NAN, NCN or NTN; N is A, T, C or G.

In the above method for mutating C to T in a target sequence of a plant genome, the sgRNA is a tRNA-esgRNA;

The tRNA-esgRNA is as shown in Formula I: tRNA - RNA transcribed from the target sequence - esgRNA backbone (Formula I);

The tRNA is m1), m2) or m3):

m1) an RNA molecule obtained by replacing T with U in positions 597-673 of Sequence 1;

m2) an RNA molecule obtained by substitution and/or deletion and/or addition of one or several nucleotides in the RNA molecule of m1) and having the same function;

m3) an RNA molecule having 75% or more identity with the nucleotide sequence defined in m1) or m2) and having the same function;

The esgRNA backbone is n1), n2) or n3):

n1) an RNA molecule obtained by replacing T with U in positions 694-779 of Sequence 1;

n2) an RNA molecule obtained by substitution and/or deletion and/or addition of one or several nucleotides in the RNA molecule of n1) and having the same function;

n3) an RNA molecule having 75% or more identity with the nucleotide sequence defined in n1) or n2) and having the same function.
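Formula I and the coordinate ranges above can be illustrated with a short sketch. The helper names and the short placeholder sequences are hypothetical; the actual tRNA and backbone sequences are those of Sequence 1 in the sequence listing, which is not reproduced here. The arithmetic shows that positions 597-673 and 694-779 span 77 and 86 nucleotides and leave 20 positions between them, which matches the 20 bp target size mentioned elsewhere in the description, although the source does not state that this interval holds the target.

```python
def dna_to_rna(dna: str) -> str:
    """Replace T with U, as in items m1) and n1)."""
    return dna.upper().replace("T", "U")


def span_length(start: int, end: int) -> int:
    """Length of a 1-based, inclusive coordinate range."""
    return end - start + 1


def build_trna_esgrna(trna_dna: str, target_dna: str, backbone_dna: str) -> str:
    """Assemble Formula I: tRNA - target RNA - esgRNA backbone."""
    return dna_to_rna(trna_dna) + dna_to_rna(target_dna) + dna_to_rna(backbone_dna)


# Coordinate ranges quoted from the text (positions in Sequence 1).
TRNA_SPAN = (597, 673)
BACKBONE_SPAN = (694, 779)

trna_len = span_length(*TRNA_SPAN)
backbone_len = span_length(*BACKBONE_SPAN)
positions_between = BACKBONE_SPAN[0] - TRNA_SPAN[1] - 1
print(trna_len, backbone_len, positions_between)

# Placeholder sequences for illustration only (NOT the real elements).
precursor = build_trna_esgrna("AACT", "GATC", "GTTT")
print(precursor)
```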

In the above method for mutating C to T in a target sequence of a plant genome, the SpRYn is A1), A2) or A3):

A1) a protein whose amino acid sequence is shown in Sequence 2;

A2) a protein obtained by substitution and/or deletion and/or addition of one or several amino acid residues in the amino acid sequence shown in Sequence 2 of the sequence listing and having the same function;

A3) a fusion protein obtained by attaching a tag to the N-terminus and/or C-terminus of A1) or A2).

The cytosine deaminase may be a protein such as human APOBEC3A, human AID, PmCDA1 or rAPOBEC1. In a specific embodiment of the present invention, the cytosine deaminase is PmCDA1.

The PmCDA1 is C1), C2) or C3):

C1) a protein whose amino acid sequence is shown in Sequence 3;

C2) a protein obtained by substitution and/or deletion and/or addition of one or several amino acid residues in the amino acid sequence shown in Sequence 3 of the sequence listing and having the same function;

C3) a fusion protein obtained by attaching a tag to the N-terminus and/or C-terminus of C1) or C2).

The UGI is E1), E2) or E3):

E1) a protein whose amino acid sequence is shown in Sequence 4;

E2) a protein obtained by substitution and/or deletion and/or addition of one or several amino acid residues in the amino acid sequence shown in Sequence 4 of the sequence listing and having the same function;

E3) a fusion protein obtained by attaching a tag to the N-terminus and/or C-terminus of E1) or E2).

To facilitate purification of the protein in A1), C1) or E1), a tag shown in the table below may be attached to the amino terminus or carboxyl terminus of the protein consisting of the amino acid sequence shown in Sequence 2, Sequence 3 or Sequence 4 of the sequence listing.

Table: Sequences of the tags

Tag            Residues            Sequence
Poly-Arg       5-6 (usually 5)     RRRRR
Poly-His       2-10 (usually 6)    HHHHHH
FLAG           8                   DYKDDDDK
Strep-tag II   8                   WSHPQFEK
c-myc          10                  EQKLISEEDL

The protein in A2), C2) or E2) above is a protein having 75% or more identity with the amino acid sequence of the protein shown in Sequence 2, Sequence 3 or Sequence 4 and having the same function. Having 75% or more identity means having 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98% or 99% identity.

The protein in A2), C2) or E2) above may be synthesized artificially, or its coding gene may be synthesized first and the protein then obtained by biological expression.

The coding gene of the protein in A2), C2) or E2) above may be obtained by deleting the codons of one or several amino acid residues from the DNA sequence shown in positions 3167-7267, positions 7553-8176 or positions 8210-8458 of Sequence 1, and/or by introducing missense mutations of one or several base pairs, and/or by attaching the coding sequence of a tag shown in the table above to its 5′ end and/or 3′ end. Positions 3167-7267, 7553-8176 and 8210-8458 of Sequence 1 encode the proteins shown in Sequence 2, Sequence 3 and Sequence 4, respectively.
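As a quick consistency check on the coordinates above, the following sketch computes the length of each coding region and confirms that it is a whole number of codons. The dictionary and helper names are hypothetical; only the coordinate pairs come from the text. Whether each region includes a start or stop codon is not stated in the source, so the codon counts should not be read as exact protein lengths.

```python
# 1-based, inclusive positions in Sequence 1, quoted from the text.
CODING_REGIONS = {
    "SpRYn": (3167, 7267),
    "PmCDA1": (7553, 8176),
    "UGI": (8210, 8458),
}


def region_length(start: int, end: int) -> int:
    """Length in nucleotides of a 1-based, inclusive range."""
    return end - start + 1


def extract_region(sequence: str, start: int, end: int) -> str:
    """Slice a 1-based, inclusive range out of a Python (0-based) string."""
    return sequence[start - 1:end]


summary = {}
for name, (start, end) in CODING_REGIONS.items():
    length = region_length(start, end)
    codons, remainder = divmod(length, 3)
    summary[name] = (length, codons, remainder)
    print(f"{name}: {length} nt = {codons} codons, remainder {remainder}")
```

All three regions divide evenly by three, as expected for open reading frames.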

The coding gene of the SpRYn is b1), b2) or b3):

b1) the cDNA molecule or DNA molecule shown in positions 3167-7267 of Sequence 1 in the sequence listing;

b2) a cDNA molecule or DNA molecule that has 75% or more identity with the nucleotide sequence defined in b1) and encodes the above SpRYn;

b3) a cDNA molecule or DNA molecule that hybridizes under stringent conditions with the nucleotide sequence defined in b1) or b2) and encodes the above SpRYn;

The coding gene of the PmCDA1 is d1), d2) or d3):

d1) the cDNA molecule or DNA molecule shown in positions 7553-8176 of Sequence 1 in the sequence listing;

d2) a cDNA molecule or DNA molecule that has 75% or more identity with the nucleotide sequence defined in d1) and encodes the above PmCDA1;

d3) a cDNA molecule or DNA molecule that hybridizes under stringent conditions with the nucleotide sequence defined in d1) or d2) and encodes the above PmCDA1;

The coding gene of the UGI is f1), f2) or f3):

f1) the cDNA molecule or DNA molecule shown in positions 8210-8458 of Sequence 1 in the sequence listing;

f2) a cDNA molecule or DNA molecule that has 75% or more identity with the nucleotide sequence defined in f1) and encodes the above UGI;

f3) a cDNA molecule or DNA molecule that hybridizes under stringent conditions with the nucleotide sequence defined in f1) or f2) and encodes the above UGI.

A person of ordinary skill in the art can readily mutate the nucleotide sequences of the present invention that encode the SpRYn, the PmCDA1 or the UGI using known methods, such as directed evolution and point mutation. Artificially modified nucleotides having 75% or higher identity with the nucleotide sequence encoding the SpRYn, the PmCDA1 or the UGI of the present invention are derived from, and equivalent to, the sequences of the present invention, provided that they encode the SpRYn, the PmCDA1 or the UGI and have the same function.

The term "identity" as used herein refers to sequence similarity to a natural nucleic acid sequence. "Identity" includes nucleotide sequences having 75% or more, 85% or more, 90% or more, or 95% or more identity with the nucleotide sequence of the present invention that encodes the protein consisting of the amino acid sequence shown in Sequence 2, 3 or 4. Identity can be evaluated by eye or with computer software. Using computer software, the identity between two or more sequences can be expressed as a percentage (%), which can be used to evaluate the identity between related sequences.
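A percentage identity of the kind referred to above can be sketched as follows for two sequences that have already been aligned to equal length. This is one of several conventions and is an assumption here: the source does not specify the alignment software, the treatment of gaps, or the denominator. In this sketch a gap character in either sequence simply counts as a mismatch.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned columns in which both sequences carry the same residue.

    Assumes the inputs are pre-aligned and of equal, non-zero length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    if not aligned_a:
        raise ValueError("sequences must not be empty")
    matches = sum(
        a == b and a != "-" for a, b in zip(aligned_a.upper(), aligned_b.upper())
    )
    return 100.0 * matches / len(aligned_a)


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 75.0) -> bool:
    """Check an identity threshold such as the 75% used in the text."""
    return percent_identity(aligned_a, aligned_b) >= threshold


print(percent_identity("ACGT", "ACGA"))
print(meets_threshold("ACGT", "ACGA"))
```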

The stringent conditions are: hybridization and membrane washing twice at 68°C in a solution of 2×SSC and 0.1% SDS, 5 min each time, followed by hybridization and membrane washing twice at 68°C in a solution of 0.5×SSC and 0.1% SDS, 15 min each time; or hybridization and membrane washing at 65°C in a solution of 0.1×SSPE (or 0.1×SSC) and 0.1% SDS.

The above identity of 75% or more may be an identity of 80%, 85%, 90% or 95% or more.

In the above method for mutating C to T in a target sequence of a plant genome, the tRNA-esgRNA obtained by transcription of the DNA molecule that transcribes the tRNA-esgRNA is an immature RNA precursor; the tRNA in this precursor is cleaved off by two enzymes (RNase P and RNase Z) to yield the mature RNA. A recombinant expression vector yields as many independent mature RNAs as it contains targets. Each mature RNA consists, in order, of the RNA transcribed from the target sequence and the esgRNA backbone, or, in order, of a few residual bases of the tRNA, the RNA transcribed from the target sequence, and the esgRNA backbone.

In the above method for mutating C to T in a target sequence of a plant genome, in 1) and 3), the number of UGIs may be one, two or more. In a specific embodiment of the present invention, the number of UGIs is two.

In the above method for mutating C to T in a target sequence of a plant genome, in 3), the coding gene of the SpRYn, the DNA molecule that transcribes the sgRNA, the coding gene of the cytosine deaminase and the coding gene of the UGI may be introduced into the plant by one or more recombinant expression vectors. In a specific embodiment of the present invention, the coding gene of the SpRYn, the DNA molecule that transcribes the tRNA-esgRNA, the coding gene of the PmCDA1 and the coding gene of the UGI are introduced into the plant by a single recombinant expression vector.

Further, the recombinant vector also includes the coding gene of a selection agent resistance protein.

Still further, the recombinant vector includes an expression cassette containing the DNA molecule that transcribes the tRNA-esgRNA, and an expression cassette containing, in order, the coding gene of the SpRYn, the coding gene of the PmCDA1, the coding gene of the UGI, the coding gene of the UGI, the coding gene of a self-cleaving oligopeptide and the coding gene of the selection agent resistance protein.

The number of expression cassettes containing the DNA molecule that transcribes the tRNA-esgRNA may be one, two or more; specifically, it may be one, two or three.

The self-cleaving oligopeptide may be a 2A self-cleaving oligopeptide derived from a viral genome, such as the foot-and-mouth disease virus (FMDV) (F2A) peptide, the equine rhinitis A virus (ERAV) (E2A) peptide, the Thosea asigna virus (T2A) peptide, the porcine teschovirus-1 (PTV-1) (P2A) peptide, the Theiler's virus 2A peptide or the encephalomyocarditis virus 2A peptide. Specifically, it may be the P2A peptide.

The selection agent resistance protein may specifically be hygromycin phosphotransferase.

In specific embodiments of the present invention, the recombinant expression vector is specifically the SpRYn-CBE-9, SpRYn-CBE-10, SpRYn-CBE-11, SpRYn-CBE-12, SpRYn-CBE-13, SpRYn-CBE-14 or SpRYn-CBE-15 recombinant expression vector.

Another object of the present invention is to provide new uses of the above method for mutating C to T in a target sequence of a plant genome.

The present invention provides the use of the above method for mutating C to T in a target sequence of a plant genome in base substitution in plant genomes.

The present invention further provides the use of the above method for mutating C to T in a target sequence of a plant genome in base editing of plant genomes.

The present invention also provides the use of the above method for mutating C to T in a target sequence of a plant genome in preparing plant mutants.

A further object of the present invention is to provide a new use of a set of reagents; the set of reagents is R1) or R2):

R1) includes the above SpRYn, the above cytosine deaminase and the above sgRNA;

R2) includes the above SpRYn, the above cytosine deaminase, the above sgRNA and the above UGI;

The present invention provides the use of the set of reagents in any one of the following T1)-T7):

T1) mutating C to T in a target sequence of a plant genome;

T2) preparing a product for mutating C to T in a target sequence of a plant genome;

T3) base substitution in plant genomes;

T4) preparing a product for base substitution in plant genomes;

T5) base editing of plant genomes;

T6) preparing a product for base editing of plant genomes;

T7) preparing plant mutants;

Further, the set of reagents also includes the above self-cleaving oligopeptide and the above selection agent resistance protein.

Still further, the set of reagents consists of the above SpRYn, the above cytosine deaminase, the above sgRNA, the above UGI, the above self-cleaving oligopeptide and the above selection agent resistance protein.

In any of the above methods or uses, the PAM sequence is a stretch of DNA adjoining the 3′ end of the target sequence. The first N of the PAM sequence, counted from its 5′ end, is joined to the 3′ end of the target sequence. The target sequence may be 15-25 bp in size, further 18-22 bp, and still further 20 bp.

Further, the NAN may be NAG or NAC.

The NCN may be NCA, NCG, NCC or NCT.

The NTN may be NTA, NTC or NTT.

Still further, the NAG may be AAG.

The NAC may be GAC or AAC.

The NCA may be GCA.

The NCG may be ACG.

The NCC may be GCC.

The NCT may be TCT.

The NTA may be GTA.

The NTC may be TTC.

The NTT may be CTT.
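The PAM rule above (a three-base motif directly 3′ of the target, whose middle base is A, C or T) can be sketched as a simple scan. The function names are hypothetical and the sketch makes several simplifying assumptions that are not in the source: it scans only the given strand, uses a fixed 20-nt target, and keeps only targets containing at least one C, since C is the base being edited.

```python
ALLOWED_MIDDLE_BASES = {"A", "C", "T"}  # NAN, NCN, NTN


def pam_matches(pam: str) -> bool:
    """True if a 3-nt PAM is NAN, NCN or NTN (N = A, T, C or G)."""
    pam = pam.upper()
    return (
        len(pam) == 3
        and set(pam) <= set("ACGT")
        and pam[1] in ALLOWED_MIDDLE_BASES
    )


def find_candidate_targets(sequence: str, target_len: int = 20) -> list:
    """List (0-based start, target, PAM) for every qualifying site on this strand."""
    sequence = sequence.upper()
    hits = []
    for start in range(len(sequence) - target_len - 3 + 1):
        target = sequence[start:start + target_len]
        pam = sequence[start + target_len:start + target_len + 3]
        if pam_matches(pam) and "C" in target:
            hits.append((start, target, pam))
    return hits


# Hypothetical 20-nt target followed by a GAC PAM (GAC is one of the listed NAC PAMs).
example = "ACGTACGTACGTACGTACGT" + "GAC"
print(find_candidate_targets(example))
```

An NGG-type PAM such as TGG is rejected by this check because its middle base is G, which is the case handled by the earlier SpCas9n-based editors described in the background.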

In any of the above methods or uses, there may be one, two or more target sequences.

In any of the above methods or uses, the base editing or base substitution is the mutation of C to T in a target sequence of a plant genome. The C may be a base C located at any position in the target sequence.

In any of the above methods or uses, the plant is S1), S2) or S3):

S1) a monocotyledonous or dicotyledonous plant;

S2) a gramineous plant;

S3) rice (such as Nipponbare).

The present invention provides the application of a SpRYn-CBE base editing system in base substitution in plant genomes. The SpRYn-CBE base editing system of the present invention comprises SpRYn, PmCDA1, UGI and tRNA-esgRNA. Experiments show that the SpRYn-CBE base editing system of the present invention can edit base C in a target sequence on a plant genome whose PAM sequence is NAN, NCN or NTN, achieving replacement of base C with base T and greatly expanding the range of editable C.

Brief Description of the Drawings

Figure 1 is a schematic diagram of the elements of the SpRYn-CBE base editing system vector. Here n is the number of targets, which may be 1, 2 or 3, and OsU6 may be OsU6a, OsU6b or OsU6c: OsU6a is used for one target; OsU6a and OsU6b are used, respectively, for two targets; and OsU6a, OsU6b and OsU6c are used, respectively, for three targets.
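The promoter assignment described for Figure 1 amounts to pairing the targets, in order, with OsU6a, OsU6b and OsU6c. A minimal sketch follows; the function name and the target labels in the example are hypothetical, and the limit of three targets reflects only what the figure description states.

```python
OSU6_PROMOTERS = ("OsU6a", "OsU6b", "OsU6c")


def assign_promoters(targets: list) -> list:
    """Pair each target with a promoter in the order OsU6a, OsU6b, OsU6c."""
    if not 1 <= len(targets) <= len(OSU6_PROMOTERS):
        raise ValueError("the figure description covers 1, 2 or 3 targets")
    return list(zip(OSU6_PROMOTERS, targets))


# Hypothetical target labels for illustration.
print(assign_promoters(["target-1", "target-2"]))
```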

Detailed Description

The present invention is described in further detail below with reference to specific embodiments. The examples are given only to illustrate the present invention and not to limit its scope. Unless otherwise specified, the experimental methods in the following examples are conventional methods, and the materials, reagents and instruments used are commercially available. In the following examples, unless otherwise specified, the first position of each nucleotide sequence in the sequence listing is the 5′ terminal nucleotide of the corresponding DNA/RNA, and the last position is its 3′ terminal nucleotide.

Primer pair NAA-C1 consists of primer NAA-C1-F: 5′-CGCACGGCGGGAGGTACGTGC-3′ and primer NAA-C1-R: 5′-ATCAATAGCTGCAGTGTACTCTG-3′, and is used to amplify target NAA-C1.

Primer pair NAA-C2 consists of primer NAA-C2-F: 5′-TACCGCGCGCCGGAGCTGCT-3′ and primer NAA-C2-R: 5′-GCGCCTCCTCAACTGCATGTCA-3′, and is used to amplify target NAA-C2.

Primer pair NAA-C3 consists of primer NAA-C3-F: 5′-TATTCAGATCAGCATTTGGTGATAC-3′ and primer NAA-C3-R: 5′-AAGAAGATACAGTTAAGCTCCTG-3′, and is used to amplify target NAA-C3.

Primer pair NAA-C4 consists of primer NAA-C4-F: 5′-GATCATCGCATTGGATGGA-3′ and primer NAA-C4-R: 5′-ATGGGTTGTTGTTGAGGTTTAG-3′, and is used to amplify target NAA-C4.

Primer pair NAA-C5 consists of primer NAA-C5-F: 5′-CGCACGGCGGGAGGTACGTGC-3′ and primer NAA-C5-R: 5′-ATCAATAGCTGCAGTGTACTCTG-3′, and is used to amplify target NAA-C5.

Primer pair NAA-C6 consists of primer NAA-C6-F: 5′-CGCACGGCGGGAGGTACGTGC-3′ and primer NAA-C6-R: 5′-ATCAATAGCTGCAGTGTACTCTG-3′, and is used to amplify target NAA-C6.

Primer pair NAA-C7 consists of primer NAA-C7-F: 5′-GCACTGCCAGGTGAGTGAACT-3′ and primer NAA-C7-R: 5′-GCGCCTCCTCAACTGCATGTCA-3′, and is used to amplify target NAA-C7.

Primer pair NAA-C8 consists of primer NAA-C8-F: 5′-GGGCGAGCGCGGAGTGCGT-3′ and primer NAA-C8-R: 5′-TCAATGCGTGGCCCACATG-3′, and is used to amplify target NAA-C8.

Primer pair NAT-C1 consists of primer NAT-C1-F: 5′-CGCACGGCGGGAGGTACGTGC-3′ and primer NAT-C1-R: 5′-ATCAATAGCTGCAGTGTACTCTG-3′, and is used to amplify target NAT-C1.

Primer pair NAT-C2 consists of primer NAT-C2-F: 5′-AGAAGAGGAGGAGGGATTAGG-3′ and primer NAT-C2-R: 5′-GGGATCACATCCCTGATGCC-3′, and is used to amplify target NAT-C2.

Primer pair NAT-C3 consists of primer NAT-C3-F: 5′-AGAAGAGGAGGAGGGATTAGG-3′ and primer NAT-C3-R: 5′-GGGATCACATCCCTGATGCC-3′, and is used to amplify target NAT-C3.

Primer pair NAT-C4 consists of primer NAT-C4-F: 5′-TATTCAGATCAGCATTTGGTGATAC-3′ and primer NAT-C4-R: 5′-AAGAAGATACAGTTAAGCTCCTG-3′, and is used to amplify target NAT-C4.

Primer pair NAT-C5 consists of primer NAT-C5-F: 5′-TACCGCGCGCCGGAGCTGCT-3′ and primer NAT-C5-R: 5′-GCGCCTCCTCAACTGCATGTCA-3′, and is used to amplify target NAT-C5.

Primer pair NAT-C6 consists of primer NAT-C6-F: 5′-GGGCGAGCGCGGAGTGCGT-3′ and primer NAT-C6-R: 5′-TCAATGCGTGGCCCACATG-3′, and is used to amplify target NAT-C6.

Primer pair NAT-C7 consists of primer NAT-C7-F: 5′-CCTAGCAAGGACAAGTACATCA-3′ and primer NAT-C7-R: 5′-GCCATGATGAGATGAGCAAGC-3′, and is used to amplify target NAT-C7.

Primer pair NAT-C8 consists of primer NAT-C8-F: 5′-TACCGCGCGCCGGAGCTGCT-3′ and primer NAT-C8-R: 5′-GCGCCTCCTCAACTGCATGTCA-3′, and is used to amplify target NAT-C8.

Primer pair NAC-C1 consists of primer NAC-C1-F: 5′-TGATGTCACCTGATGATCTG-3′ and primer NAC-C1-R: 5′-GTGAGGCCGTGCGGGTTGG-3′, and is used to amplify target NAC-C1.

Primer pair NAC-C2 consists of primer NAC-C2-F: 5′-ACACAGCAAGGAGTGCCGG-3′ and primer NAC-C2-R: 5′-GCGTCGCATGTGATATTTGTCA-3′, and is used to amplify target NAC-C2.

Primer pair NAC-C3 consists of primer NAC-C3-F: 5′-GCCGCGACGGCCAAGACC-3′ and primer NAC-C3-R: 5′-AAGCCTCAATTTTCCCTGTC-3′, and is used to amplify target NAC-C3.

Primer pair NAC-C4 consists of primer NAC-C4-F: 5′-TGATGTCACCTGATGATCTG-3′ and primer NAC-C4-R: 5′-GTGAGGCCGTGCGGGTTGG-3′, and is used to amplify target NAC-C4.

Primer pair NAG-C1 consists of primer NAG-C1-F: 5′-GTGTCGCATCACGATTGCGA-3′ and primer NAG-C1-R: 5′-AAAACCAAAACTTCCATGGTTG-3′, and is used to amplify target NAG-C1.

Primer pair NAG-C2 consists of primer NAG-C2-F: 5′-TTTTGGTCGTTGCAGGGATGT-3′ and primer NAG-C2-R: 5′-GAACAACAAGATTAACCTAAGGCT-3′, and is used to amplify target NAG-C2.

引物对NCA-C1由引物NCA-C1-F：5’-GGAGCTGGATGAGGTGCT-3’和引物NCA-C1-R：5’-GGAAGAAGAAAAGTAGGGAGA-3’组成，用于扩增靶点NCA-C1。The primer pair NCA-C1, consisting of primer NCA-C1-F: 5’-GGAGCTGGATGAGGTGCT-3’ and primer NCA-C1-R: 5’-GGAAGAAGAAAAGTAGGGAGA-3’, was used to amplify the target site NCA-C1.

引物对NCT-C1由引物NCT-C1-F：5’-TTATTAACAGTGCATTTAGCA-3’和引物NCT-C1-R：5’-TGTGGATGCAGAAAGCAACCTG-3’组成，用于扩增靶点NCT-C1。The primer pair NCT-C1, consisting of primer NCT-C1-F: 5’-TTATTAACAGTGCATTTAGCA-3’ and primer NCT-C1-R: 5’-TGTGGATGCAGAAAGCAACCTG-3’, was used to amplify the target site NCT-C1.

引物对NCC-C1由引物NCC-C1-F：5’-TAGTTGCCTCAAACAATAAAGACA-3’和引物NCC-C1-R：5’-CGGCGTCGGGACAGAGCTCCA-3’组成，用于扩增靶点NCC-C1。The primer pair NCC-C1, consisting of primer NCC-C1-F: 5’-TAGTTGCCTCAAACAATAAAGACA-3’ and primer NCC-C1-R: 5’-CGGCGTCGGGACAGAGCTCCA-3’, was used to amplify the target site NCC-C1.

引物对NCG-C1由引物NCG-C1-F：5’-CAATCCAAATTGTAATAAACTTCA-3’和引物NCG-C1-R：5’-CTGGTATCCCAAGCGTCCT-3’组成，用于扩增靶点NCG-C1。The primer pair NCG-C1, consisting of primer NCG-C1-F: 5’-CAATCCAAATTGTAATAAACTTCA-3’ and primer NCG-C1-R: 5’-CTGGTATCCCAAGCGTCCT-3’, was used to amplify the target site NCG-C1.

引物对NCG-C2由引物NCG-C2-F：5’-ACGACTGGCACACTGGCCCA-3’和引物NCG-C2-R：5’-GATTGCCTGAAATTTGTCACTC-3’组成，用于扩增靶点NCG-C2。The primer pair NCG-C2, consisting of primer NCG-C2-F: 5’-ACGACTGGCACACTGGCCCA-3’ and primer NCG-C2-R: 5’-GATTGCCTGAAATTTGTCACTC-3’, was used to amplify the target site NCG-C2.

引物对NTA-C1由引物NTA-C1-F：5’-GCAGCAGCGGTCGGTGCAGCG-3’和引物NTA-C1-R：5’-TGTGGATGCAGAAAGCAACCTG-3’组成，用于扩增靶点NTA-C1。The primer pair NTA-C1 consists of primer NTA-C1-F: 5’-GCAGCAGCGGTCGGTGCAGCG-3’ and primer NTA-C1-R: 5’-TGTGGATGCAGAAAGCAACCTG-3’, which is used to amplify the target site NTA-C1.

引物对NTT-C1由引物NTT-C1-F：5’-GGCTCAATCATGTTAGACA-3’和引物NTT-C1-R：5’-TTCTGGCTTTTGTACTTCACCG-3’组成，用于扩增靶点NTT-C1。The primer pair NTT-C1 consists of primer NTT-C1-F: 5’-GGCTCAATCATGTTAGACA-3’ and primer NTT-C1-R: 5’-TTCTGGCTTTTGTACTTCACCG-3’, which is used to amplify the target site NTT-C1.

引物对NTC-C1由引物NTC-C1-F：5’-ATTCCGTTGATGTTGCAAGCTT-3’和引物NTC-C1-R：5’-AGTCTCTAACAACAGTTATTACTT-3’组成，用于扩增靶点NTC-C1。The primer pair NTC-C1 consists of primer NTC-C1-F: 5’-ATTCCGTTGATGTTGCAAGCTT-3’ and primer NTC-C1-R: 5’-AGTCTCTAACAACAGTTATTACTT-3’, which is used to amplify the target site NTC-C1.

引物对NTG-C1由引物NTG-C1-F：5’-GGCTCAATCATGTTAGACA-3’和引物NTG-C1-R：5’-TTCTGGCTTTTGTACTTCACCG-3’组成，用于扩增靶点NTG-C1。The primer pair NTG-C1, consisting of primer NTG-C1-F: 5’-GGCTCAATCATGTTAGACA-3’ and primer NTG-C1-R: 5’-TTCTGGCTTTTGTACTTCACCG-3’, was used to amplify the target site NTG-C1.

引物对NTG-C2由引物NTG-C2-F：5’-GCCGCGACGGCCAAGACC-3’和引物NTG-C2-R：5’-AAGCCTCAATTTTCCCTGTC-3’组成，用于扩增靶点NTG-C2。The primer pair NTG-C2 consists of primer NTG-C2-F: 5’-GCCGCGACGGCCAAGACC-3’ and primer NTG-C2-R: 5’-AAGCCTCAATTTTCCCTGTC-3’, which is used to amplify the target site NTG-C2.

引物对NTG-C3由引物NTG-C3-F：5’-GAAGCAGGTGGTGCGGATGCT-3’和引物NTG-C3-R：5’-CGACGTACATACACGACGCG-3’组成，用于扩增靶点NTG-C3。The primer pair NTG-C3, consisting of primer NTG-C3-F: 5’-GAAGCAGGTGGTGCGGATGCT-3’ and primer NTG-C3-R: 5’-CGACGTACATACACGACGCG-3’, was used to amplify the target site NTG-C3.

In the following examples, C·T base substitution means that a C at any position in the target sequence is mutated to T.

C·T碱基替换效率＝发生C·T碱基替换的阳性T0苗数/分析的总阳性T0苗数×100％。C·T base substitution efficiency = number of positive T0 seedlings with C·T base substitution/total number of positive T0 seedlings analyzed × 100%.
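The efficiency definition above can be sketched as a small helper. This is only an illustration: the function name and the seedling counts are hypothetical and are not taken from Table 2.

```python
def ct_substitution_efficiency(edited_positive: int, total_positive: int) -> float:
    """Percentage of positive T0 seedlings that carry a C-to-T substitution."""
    if total_positive <= 0:
        raise ValueError("total_positive must be greater than zero")
    if not 0 <= edited_positive <= total_positive:
        raise ValueError("edited_positive must lie between 0 and total_positive")
    # Multiply before dividing so simple counts give exact percentages.
    return 100.0 * edited_positive / total_positive


# Hypothetical counts, for illustration only.
print(ct_substitution_efficiency(14, 20))  # 70.0
```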

Nipponbare rice: described in Liang Weihong, Wang Gaohua, Du Jingyao, et al. Effects of sodium nitroprusside and its photolysis products on the growth of Nipponbare rice seedlings and the expression of five hormone marker genes [J]. Journal of Henan Normal University (Natural Science Edition), 2017(2):48-52. The public can obtain it from the Beijing Academy of Agriculture and Forestry Sciences.

恢复培养基：含有200mg/L特美汀的N6固体培养基。Recovery medium: N6 solid medium containing 200 mg/L Timentin.

筛选培养基：含有50mg/L潮霉素的N6固体培养基。Screening medium: N6 solid medium containing 50 mg/L hygromycin.

分化培养基：含有2mg/L KT、0.2mg/L NAA、0.5g/L谷氨酸、0.5g/L脯氨酸的N6固体培养基。Differentiation medium: N6 solid medium containing 2 mg/L KT, 0.2 mg/L NAA, 0.5 g/L glutamic acid, and 0.5 g/L proline.

生根培养基：含有0.2mg/L NAA、0.5g/L谷氨酸、0.5g/L脯氨酸的N6固体培养基。Rooting medium: N6 solid medium containing 0.2 mg/L NAA, 0.5 g/L glutamic acid, and 0.5 g/L proline.
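When preparing a batch of any of the four media above, the supplement amounts scale linearly with the batch volume. The sketch below shows that arithmetic; the dictionary layout and the function name are assumptions made for illustration, and only the concentrations come from the definitions above.

```python
# Supplement concentrations in mg/L, as listed for each N6 solid medium above.
MEDIA_MG_PER_L = {
    "recovery": {"Timentin": 200.0},
    "screening": {"hygromycin": 50.0},
    "differentiation": {"KT": 2.0, "NAA": 0.2, "glutamic acid": 500.0, "proline": 500.0},
    "rooting": {"NAA": 0.2, "glutamic acid": 500.0, "proline": 500.0},
}


def supplement_mg(medium: str, volume_l: float) -> dict:
    """Return the mass (mg) of each supplement needed for volume_l litres of medium."""
    if volume_l <= 0:
        raise ValueError("volume_l must be positive")
    return {name: conc * volume_l for name, conc in MEDIA_MG_PER_L[medium].items()}


# Example: half a litre of differentiation medium.
print(supplement_mg("differentiation", 0.5))
```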

实施例1、SpRYn-CBE碱基编辑系统载体的构建及其在水稻基因组中PAM序列为NAN，NCN或NTN的靶点进行碱基替换中的应用Example 1. Construction of the SpRYn-CBE base editing system vector and its application in base replacement of target sites with PAM sequences of NAN, NCN or NTN in the rice genome

一、SpRYn-CBE碱基编辑系统载体的构建1. Construction of the SpRYn-CBE base editing system vector

The following recombinant expression vectors, each a circular plasmid, were artificially synthesized: the SpRYn-CBE-1 through SpRYn-CBE-16 recombinant expression vectors (sixteen vectors in total). A schematic diagram of the elements of all recombinant expression vectors is shown in Figure 1. Their structures are described below:

The sequence of the SpRYn-CBE-1 recombinant expression vector is Sequence 1 in the sequence listing. In Sequence 1, positions 131-596 are the nucleotide sequence of the OsU6a promoter, positions 597-673 are the nucleotide sequence of the tRNA, positions 674-693 are the nucleotide sequence of target NAA-C1, positions 694-779 are the nucleotide sequence of the esgRNA backbone, and positions 780-786 are the PolyT sequence; positions 787-1119 are the nucleotide sequence of the OsU6b promoter, positions 1126-1202 are the nucleotide sequence of the tRNA, positions 1203-1222 are the nucleotide sequence of target NAA-C3, positions 1223-1308 are the nucleotide sequence of the esgRNA backbone, and positions 1309-1320 are the PolyT sequence; positions 1327-3040 are the nucleotide sequence of the OsUbq3 promoter, and positions 3167-7267 are the coding sequence of the SpRYn protein (without start codon or stop codon), encoding the SpRYn protein shown in Sequence 2; positions 7553-8176 are the coding sequence of the PmCDA1 protein (without stop codon), encoding the PmCDA1 protein shown in Sequence 3; positions 8210-8458 and positions 8471-8719 are each the coding sequence of the UGI protein (without stop codon), encoding the UGI protein shown in Sequence 4; positions 8762-8818 are the coding sequence of P2A; positions 8819-9844 are the coding sequence of hygromycin phosphotransferase; and positions 10184-10436 are the nucleotide sequence of the Nos terminator. The SpRYn-CBE-1 recombinant expression vector contains two targets, NAA-C1 and NAA-C3, whose sequences are shown in Table 1.

SpRYn-CBE-2重组表达载体的序列为将SpRYn-CBE-1重组表达载体序列中NAA-C1靶点序列替换为NAA-C2靶点序列，NAA-C3靶点序列替换为NAA-C4靶点序列，且保持其他序列不变后得到的序列。NAA-C2靶点序列和NAA-C4靶点序列见表1。The sequence of the SpRYn-CBE-2 recombinant expression vector is obtained by replacing the NAA-C1 target sequence in the SpRYn-CBE-1 recombinant expression vector sequence with the NAA-C2 target sequence, replacing the NAA-C3 target sequence with the NAA-C4 target sequence, and keeping the other sequences unchanged. The NAA-C2 target sequence and the NAA-C4 target sequence are shown in Table 1.

SpRYn-CBE-3重组表达载体的序列为将SpRYn-CBE-1重组表达载体序列中NAA-C1靶点序列替换为NAA-C5靶点序列，NAA-C3靶点序列替换为NAA-C6靶点序列，且保持其他序列不变后得到的序列。NAA-C5靶点序列和NAA-C6靶点序列见表1。The sequence of the SpRYn-CBE-3 recombinant expression vector is obtained by replacing the NAA-C1 target sequence in the SpRYn-CBE-1 recombinant expression vector sequence with the NAA-C5 target sequence, replacing the NAA-C3 target sequence with the NAA-C6 target sequence, and keeping other sequences unchanged. The NAA-C5 target sequence and the NAA-C6 target sequence are shown in Table 1.

SpRYn-CBE-4重组表达载体的序列为将SpRYn-CBE-1重组表达载体序列中NAA-C1靶点序列替换为NAA-C7靶点序列，NAA-C3靶点序列替换为NAA-C8靶点序列，且保持其他序列不变后得到的序列。NAA-C7靶点序列和NAA-C8靶点序列见表1。The sequence of the SpRYn-CBE-4 recombinant expression vector is obtained by replacing the NAA-C1 target sequence in the SpRYn-CBE-1 recombinant expression vector sequence with the NAA-C7 target sequence, replacing the NAA-C3 target sequence with the NAA-C8 target sequence, and keeping the other sequences unchanged. The NAA-C7 target sequence and the NAA-C8 target sequence are shown in Table 1.

SpRYn-CBE-5重组表达载体的序列为将SpRYn-CBE-1重组表达载体序列中NAA-C1靶点序列替换为NAT-C1靶点序列，NAA-C3靶点序列替换为NAT-C4靶点序列，且保持其他序列不变后得到的序列。NAT-C1靶点序列和NAT-C4靶点序列见表1。The sequence of the SpRYn-CBE-5 recombinant expression vector is obtained by replacing the NAA-C1 target sequence in the SpRYn-CBE-1 recombinant expression vector sequence with the NAT-C1 target sequence, replacing the NAA-C3 target sequence with the NAT-C4 target sequence, and keeping the other sequences unchanged. The NAT-C1 target sequence and the NAT-C4 target sequence are shown in Table 1.

SpRYn-CBE-6重组表达载体的序列为将SpRYn-CBE-1重组表达载体序列中NAA-C1靶点序列替换为NAT-C2靶点序列，NAA-C3靶点序列替换为NAT-C3靶点序列，且保持其他序列不变后得到的序列。NAT-C2靶点序列和NAT-C3靶点序列见表1。The sequence of the SpRYn-CBE-6 recombinant expression vector is obtained by replacing the NAA-C1 target sequence in the SpRYn-CBE-1 recombinant expression vector sequence with the NAT-C2 target sequence, replacing the NAA-C3 target sequence with the NAT-C3 target sequence, and keeping the other sequences unchanged. The NAT-C2 target sequence and the NAT-C3 target sequence are shown in Table 1.

SpRYn-CBE-7重组表达载体的序列为将SpRYn-CBE-1重组表达载体序列中NAA-C1靶点序列替换为NAT-C5靶点序列，NAA-C3靶点序列替换为NAT-C8靶点序列，且保持其他序列不变后得到的序列。NAT-C5靶点序列和NAT-C8靶点序列见表1。The sequence of the SpRYn-CBE-7 recombinant expression vector is obtained by replacing the NAA-C1 target sequence in the SpRYn-CBE-1 recombinant expression vector sequence with the NAT-C5 target sequence, replacing the NAA-C3 target sequence with the NAT-C8 target sequence, and keeping the other sequences unchanged. The NAT-C5 target sequence and the NAT-C8 target sequence are shown in Table 1.

SpRYn-CBE-8重组表达载体的序列为将SpRYn-CBE-1重组表达载体序列中NAA-C1靶点序列替换为NAT-C6靶点序列，NAA-C3靶点序列替换为NAT-C7靶点序列，且保持其他序列不变后得到的序列。NAT-C6靶点序列和NAT-C7靶点序列见表1。The sequence of the SpRYn-CBE-8 recombinant expression vector is obtained by replacing the NAA-C1 target sequence in the SpRYn-CBE-1 recombinant expression vector sequence with the NAT-C6 target sequence, replacing the NAA-C3 target sequence with the NAT-C7 target sequence, and keeping the other sequences unchanged. The NAT-C6 target sequence and the NAT-C7 target sequence are shown in Table 1.

SpRYn-CBE-9重组表达载体的序列为将SpRYn-CBE-1重组表达载体序列中NAA-C1靶点序列替换为NAC-C1靶点序列，NAA-C3靶点序列替换为NAC-C2靶点序列，且保持其他序列不变后得到的序列。NAC-C1靶点序列和NAC-C2靶点序列见表1。The sequence of the SpRYn-CBE-9 recombinant expression vector is obtained by replacing the NAA-C1 target sequence in the SpRYn-CBE-1 recombinant expression vector sequence with the NAC-C1 target sequence, replacing the NAA-C3 target sequence with the NAC-C2 target sequence, and keeping the other sequences unchanged. The NAC-C1 target sequence and the NAC-C2 target sequence are shown in Table 1.

SpRYn-CBE-10重组表达载体的序列为将SpRYn-CBE-1重组表达载体序列中NAA-C1靶点序列替换为NAC-C3靶点序列，NAA-C3靶点序列替换为NAC-C4靶点序列，且保持其他序列不变后得到的序列。NAC-C3靶点序列和NAC-C4靶点序列见表1。The sequence of the SpRYn-CBE-10 recombinant expression vector is obtained by replacing the NAA-C1 target sequence in the SpRYn-CBE-1 recombinant expression vector sequence with the NAC-C3 target sequence, and replacing the NAA-C3 target sequence with the NAC-C4 target sequence, while keeping other sequences unchanged. The NAC-C3 target sequence and the NAC-C4 target sequence are shown in Table 1.

SpRYn-CBE-11重组表达载体的序列为将SpRYn-CBE-1重组表达载体序列中NAA-C1靶点序列替换为NAG-C1靶点序列，NAA-C3靶点序列替换为NAG-C2靶点序列，且保持其他序列不变后得到的序列。NAG-C1靶点序列和NAG-C2靶点序列见表1。The sequence of the SpRYn-CBE-11 recombinant expression vector is obtained by replacing the NAA-C1 target sequence in the SpRYn-CBE-1 recombinant expression vector sequence with the NAG-C1 target sequence, replacing the NAA-C3 target sequence with the NAG-C2 target sequence, and keeping other sequences unchanged. The NAG-C1 target sequence and the NAG-C2 target sequence are shown in Table 1.

SpRYn-CBE-12重组表达载体的序列为将SpRYn-CBE-1重组表达载体序列中NAA-C1靶点序列替换为NCA-C1靶点序列，NAA-C3靶点序列替换为NCC-C1靶点序列，且保持其他序列不变后得到的序列。NCA-C1靶点序列和NCC-C1靶点序列见表1。The sequence of the SpRYn-CBE-12 recombinant expression vector is obtained by replacing the NAA-C1 target sequence in the SpRYn-CBE-1 recombinant expression vector sequence with the NCA-C1 target sequence, replacing the NAA-C3 target sequence with the NCC-C1 target sequence, and keeping other sequences unchanged. The NCA-C1 target sequence and the NCC-C1 target sequence are shown in Table 1.

The sequence of the SpRYn-CBE-13 recombinant expression vector is obtained by replacing positions 131-1320 of Sequence 1 in the sequence listing with Sequence 5 in the sequence listing, keeping the other sequences unchanged. In Sequence 5, positions 1-466 are the nucleotide sequence of the OsU6a promoter, positions 467-543 are the nucleotide sequence of the tRNA, positions 544-563 are the nucleotide sequence of target NCT-C1, positions 564-649 are the nucleotide sequence of the esgRNA backbone, and positions 650-656 are the PolyT sequence; positions 657-989 are the nucleotide sequence of the OsU6b promoter, positions 996-1072 are the nucleotide sequence of the tRNA, positions 1073-1092 are the nucleotide sequence of target NCG-C1, positions 1093-1178 are the nucleotide sequence of the esgRNA backbone, and positions 1179-1185 are the PolyT sequence; positions 1186-1927 are the nucleotide sequence of the OsU6c promoter, positions 1934-2010 are the nucleotide sequence of the tRNA, positions 2011-2030 are the nucleotide sequence of target NCG-C2, positions 2031-2116 are the nucleotide sequence of the esgRNA backbone, and positions 2117-2128 are the PolyT sequence. The NCT-C1, NCG-C1 and NCG-C2 target sequences are shown in Table 1.

SpRYn-CBE-14重组表达载体的序列为将SpRYn-CBE-1重组表达载体序列中NAA-C1靶点序列替换为NTT-C1靶点序列，NAA-C3靶点序列替换为NTC-C1靶点序列，且保持其他序列不变后得到的序列。NTT-C1靶点序列和NTC-C1靶点序列见表1。The sequence of the SpRYn-CBE-14 recombinant expression vector is obtained by replacing the NAA-C1 target sequence in the SpRYn-CBE-1 recombinant expression vector sequence with the NTT-C1 target sequence, replacing the NAA-C3 target sequence with the NTC-C1 target sequence, and keeping the other sequences unchanged. The NTT-C1 target sequence and the NTC-C1 target sequence are shown in Table 1.

The sequence of the SpRYn-CBE-15 recombinant expression vector is obtained by replacing positions 131-1320 of Sequence 1 in the sequence listing with Sequence 6 in the sequence listing, keeping the other sequences unchanged. In Sequence 6, positions 1-466 are the nucleotide sequence of the OsU6a promoter, positions 467-543 are the nucleotide sequence of the tRNA, positions 544-563 are the nucleotide sequence of target NTA-C1, positions 564-649 are the nucleotide sequence of the esgRNA backbone, and positions 650-661 are the PolyT sequence. The NTA-C1 target sequence is shown in Table 1.

SpRYn-CBE-16重组表达载体的序列为将SpRYn-CBE-13重组表达载体序列中NCT-C1靶点序列替换为NTG-C1靶点序列，NCG-C1靶点序列替换为NTG-C2靶点序列，NCG-C2靶点序列替换为NTG-C3靶点序列，且保持其他序列不变后得到的序列。NTG-C1靶点序列、NTG-C2靶点序列和NTG-C3靶点序列见表1。The sequence of the SpRYn-CBE-16 recombinant expression vector is obtained by replacing the NCT-C1 target sequence in the SpRYn-CBE-13 recombinant expression vector sequence with the NTG-C1 target sequence, the NCG-C1 target sequence with the NTG-C2 target sequence, the NCG-C2 target sequence with the NTG-C3 target sequence, and keeping other sequences unchanged. The NTG-C1 target sequence, the NTG-C2 target sequence, and the NTG-C3 target sequence are shown in Table 1.
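The vector descriptions above give every element as a 1-based, inclusive position range. A minimal sketch of how such coordinates can be checked is shown below, using the first sgRNA cassette of Sequence 1 and of Sequence 5. The coordinates are copied from the text; the variable and function names are hypothetical.

```python
# 1-based inclusive coordinates, copied from the descriptions of Sequence 1 and Sequence 5.
SEQ1_CASSETTE_1 = [
    ("OsU6a promoter", 131, 596),
    ("tRNA", 597, 673),
    ("target NAA-C1", 674, 693),
    ("esgRNA backbone", 694, 779),
    ("PolyT", 780, 786),
]
SEQ5_CASSETTE_1 = [
    ("OsU6a promoter", 1, 466),
    ("tRNA", 467, 543),
    ("target NCT-C1", 544, 563),
    ("esgRNA backbone", 564, 649),
    ("PolyT", 650, 656),
]


def lengths(features):
    """Length in nucleotides of each feature (inclusive coordinates)."""
    return [end - start + 1 for _, start, end in features]


def is_contiguous(features):
    """True if every feature starts right after the previous one ends."""
    return all(nxt[1] == cur[2] + 1 for cur, nxt in zip(features, features[1:]))


def extract(sequence: str, start: int, end: int) -> str:
    """Slice a 1-based inclusive range out of a sequence string."""
    return sequence[start - 1:end]


print(lengths(SEQ1_CASSETTE_1))                               # [466, 77, 20, 86, 7]
print(lengths(SEQ1_CASSETTE_1) == lengths(SEQ5_CASSETTE_1))   # True
```

Both cassettes have the same element lengths, including a 20-nt target, which is consistent with swapping only the target sequence between vectors.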

各载体的esgRNA的靶点核苷酸序列及相应的PAM序列如表1所示。The target nucleotide sequences of esgRNAs of each vector and the corresponding PAM sequences are shown in Table 1.

表1、各载体的esgRNA的靶点核苷酸序列及相应的PAM序列Table 1. Target nucleotide sequences and corresponding PAM sequences of esgRNAs in each vector

二、水稻植株中对靶点进行碱基编辑2. Base editing of target sites in rice plants

The SpRYn-CBE-1 through SpRYn-CBE-16 recombinant expression vectors obtained in step 1 (section 1 above) were each processed according to the following steps 1-11:

1、将载体导入农杆菌EHA105(上海唯地生物技术有限公司的产品，CAT#:AC1010)，得到重组农杆菌。1. Introduce the vector into Agrobacterium EHA105 (product of Shanghai Weidi Biotechnology Co., Ltd., CAT#: AC1010) to obtain recombinant Agrobacterium.

2. The recombinant Agrobacterium was cultured in YEP medium containing 50 μg/ml kanamycin and 25 μg/ml rifampicin at 28°C with shaking at 150 rpm until the OD600 reached 1.0-2.0, then centrifuged at 10,000 rpm for 1 min at room temperature. The cells were resuspended in infection solution (N6 liquid medium in which the sugar is replaced with glucose and sucrose, at concentrations of 10 g/L and 20 g/L, respectively) and diluted to an OD600 of 0.2 to obtain the Agrobacterium infection solution.

3. Mature seeds of the rice variety Nipponbare were dehulled, placed in a 100 mL conical flask, soaked in 70% (v/v) aqueous ethanol for 30 sec, then transferred to 25% (v/v) aqueous sodium hypochlorite and sterilized with shaking at 120 rpm for 30 min, rinsed three times with sterile water, and blotted dry with filter paper. The seeds were then placed embryo-side down on N6 solid medium and cultured in the dark at 28°C for 4-6 weeks to obtain rice callus.

4. After step 3, the rice callus was soaked for 10 min in Agrobacterium infection solution A (the Agrobacterium infection solution supplemented with acetosyringone at a volume ratio of 25 μl acetosyringone to 50 ml infection solution), then placed on a culture dish lined with two layers of sterilized filter paper (containing about 200 ml of infection solution without Agrobacterium) and cultured in the dark at 21°C for 1 day.

5、取步骤4得到的水稻愈伤放入恢复培养基上，25-28℃暗培养3天。5. Place the rice callus obtained in step 4 on a recovery medium and culture it in the dark at 25-28°C for 3 days.

6、取步骤5得到的水稻愈伤，置于筛选培养基上，28℃暗培养2周。6. Take the rice callus obtained in step 5, place it on the screening medium, and culture it in the dark at 28°C for 2 weeks.

7、取步骤6得到的水稻愈伤，再次置于筛选培养基上，28℃暗培养2周，得到水稻抗性愈伤。7. Take the rice callus obtained in step 6, place it on the screening medium again, and culture it in the dark at 28° C. for 2 weeks to obtain rice resistant callus.

8、取步骤7得到的水稻抗性愈伤放入分化培养基上，25℃光照培养1个月左右，将分化出来的小苗移至生根培养基上，25℃光照培养2周，获取水稻T0苗。8. Take the rice resistant callus obtained in step 7 and place it on a differentiation medium, culture it under light at 25°C for about 1 month, transfer the differentiated seedlings to a rooting medium, culture them under light at 25°C for 2 weeks, and obtain rice T0 seedlings.

9. Genomic DNA was extracted from the rice T0 seedlings and used as the template for PCR amplification with the primer pair consisting of primer F (5'-ttattgccactagttcattctacttat-3') and primer R (5'-ggggtacttctcgtggtagg-3'). The PCR amplification product was analyzed by agarose gel electrophoresis and judged as follows: if the product contains a DNA fragment of about 729 bp, the corresponding rice T0 seedling is a positive rice T0 seedling; if the product does not contain a DNA fragment of about 729 bp, the corresponding rice T0 seedling is not a positive rice T0 seedling.

10. For each vector, genomic DNA from the positive rice T0 seedlings obtained in step 9 was used as the template, and each target site was amplified by PCR with the primer pair of the same name to obtain a PCR amplification product. Specifically, primer pairs NAA-C1, NAA-C2, NAA-C3, NAA-C4, NAA-C5, NAA-C6, NAA-C7 and NAA-C8 were used for targets NAA-C1 to NAA-C8, respectively; primer pairs NAT-C1, NAT-C2, NAT-C3, NAT-C4, NAT-C5, NAT-C6, NAT-C7 and NAT-C8 for targets NAT-C1 to NAT-C8; primer pairs NAC-C1, NAC-C2, NAC-C3 and NAC-C4 for targets NAC-C1 to NAC-C4; primer pairs NAG-C1 and NAG-C2 for targets NAG-C1 and NAG-C2; primer pairs NCA-C1, NCT-C1, NCC-C1, NCG-C1 and NCG-C2 for targets NCA-C1, NCT-C1, NCC-C1, NCG-C1 and NCG-C2; primer pairs NTA-C1, NTT-C1 and NTC-C1 for targets NTA-C1, NTT-C1 and NTC-C1; and primer pairs NTG-C1, NTG-C2 and NTG-C3 for targets NTG-C1, NTG-C2 and NTG-C3.

11、将步骤10得到的PCR扩增产物进行Sanger测序及分析。测序结果只针对各靶点区进行分析。分别统计各靶点发生C·T碱基替换的阳性T0苗数，计算得出C·T碱基替换效率，结果见表2。11. The PCR amplification products obtained in step 10 were subjected to Sanger sequencing and analysis. The sequencing results were analyzed only for each target region. The number of positive T0 seedlings with C·T base substitution at each target was counted, and the C·T base substitution efficiency was calculated. The results are shown in Table 2.
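Step 10 pairs each target with the primer pair of the same name, which maps naturally onto a lookup table. The sketch below uses a small subset of the primer pairs listed earlier in this description; the table layout and function names are hypothetical. It also flags targets whose primer pairs are identical, which suggests (as an inference, not a statement from the source) that those targets lie on the same amplicon.

```python
from collections import defaultdict

# Subset of the primer pairs listed in this description: target -> (forward, reverse).
PRIMER_PAIRS = {
    "NAT-C4": ("TATTCAGATCAGCATTTGGTGATAC", "AAGAAGATACAGTTAAGCTCCTG"),
    "NAT-C5": ("TACCGCGCGCCGGAGCTGCT", "GCGCCTCCTCAACTGCATGTCA"),
    "NAT-C8": ("TACCGCGCGCCGGAGCTGCT", "GCGCCTCCTCAACTGCATGTCA"),
    "NTG-C3": ("GAAGCAGGTGGTGCGGATGCT", "CGACGTACATACACGACGCG"),
}


def primer_pair_for(target: str) -> tuple:
    """Return the (forward, reverse) primers for a target, by shared name."""
    try:
        return PRIMER_PAIRS[target]
    except KeyError:
        raise KeyError(f"no primer pair recorded for target {target!r}") from None


def targets_sharing_primers(pairs: dict) -> list:
    """Group targets that are amplified with an identical primer pair."""
    groups = defaultdict(list)
    for target, pair in pairs.items():
        groups[pair].append(target)
    return [sorted(names) for names in groups.values() if len(names) > 1]


print(targets_sharing_primers(PRIMER_PAIRS))  # [['NAT-C5', 'NAT-C8']]
```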

The results show that the SpRYn-CBE base editing system did not edit the eight targets with the PAM sequence NAA (NAA-C1, NAA-C2, NAA-C3, NAA-C4, NAA-C5, NAA-C6, NAA-C7 and NAA-C8), the eight targets with the PAM sequence NAT (NAT-C1, NAT-C2, NAT-C3, NAT-C4, NAT-C5, NAT-C6, NAT-C7 and NAT-C8), or the three targets with the PAM sequence NTG (NTG-C1, NTG-C2 and NTG-C3), but effectively edited the targets with all other tested PAM sequences, yielding T0 seedlings with C·T base substitutions at base editing efficiencies of 7.5%-70%. This indicates that the SpRYn-CBE base editing system can base-edit target sequences in the rice genome whose PAM sequences are NAC, NAG, NCA, NCT, NCC, NCG, NTA, NTT or NTC, achieving C·T base substitution and greatly expanding the scope of base editing.
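The outcome reported above depends only on the second and third bases of the 3-nt PAM, so it can be summarized as a lookup. The following is a sketch of that summary for this example only; the function name and the "not tested" label for NGN PAMs (which this example does not cover) are assumptions.

```python
# Last two PAM bases, grouped by the outcome reported for this example.
EDITED = {"AC", "AG", "CA", "CT", "CC", "CG", "TA", "TT", "TC"}
NOT_EDITED = {"AA", "AT", "TG"}


def pam_status(pam: str) -> str:
    """Classify a 3-nt PAM (N followed by two bases) by the reported outcome."""
    pam = pam.upper()
    if len(pam) != 3 or not set(pam) <= set("ACGT"):
        raise ValueError("PAM must be three bases drawn from A, C, G, T")
    suffix = pam[1:]  # the first base is N, so any base is accepted there
    if suffix in EDITED:
        return "edited"
    if suffix in NOT_EDITED:
        return "not edited"
    return "not tested"  # NGN PAMs are outside the scope of this example


print(pam_status("GAC"))  # edited
print(pam_status("TAA"))  # not edited
print(pam_status("AGG"))  # not tested
```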

表2、C·T碱基替换效率Table 2. C·T base substitution efficiency

以上对本发明进行了详述。对于本领域技术人员来说，在不脱离本发明的宗旨和范围，以及无需进行不必要的实验情况下，可在等同参数、浓度和条件下，在较宽范围内实施本发明。虽然本发明给出了特殊的实施例，应该理解为，可以对本发明作进一步的改进。总之，按本发明的原理，本申请欲包括任何变更、用途或对本发明的改进，包括脱离了本申请中已公开范围，而用本领域已知的常规技术进行的改变。按以下附带的权利要求的范围，可以进行一些基本特征的应用。The present invention has been described in detail above. It will be apparent to those skilled in the art that the present invention may be implemented in a wide range under equivalent parameters, concentrations and conditions without departing from the spirit and scope of the present invention and without unnecessary experimentation. Although the present invention provides specific embodiments, it should be understood that further improvements may be made to the present invention. In short, according to the principles of the present invention, this application is intended to include any changes, uses or improvements to the present invention, including changes made by conventional techniques known in the art that depart from the scope disclosed in this application. Applications of some of the basic features may be made within the scope of the following appended claims.

序列表Sequence Listing

<110> 北京市农林科学院<110> Beijing Academy of Agricultural and Forestry Sciences

<120> SpRYn-CBE碱基编辑系统在植物基因组碱基替换中的应用<120> Application of SpRYn-CBE base editing system in plant genome base replacement

<160> 6<160> 6

<170> PatentIn version 3.5<170> PatentIn version 3.5

<210> 1<210> 1

<211> 16842<211> 16842

<212> DNA<212> DNA

<213> Artificial Sequence<213> Artificial Sequence

<400> 1<400> 1

ggtggcagga tatattgtgg tgtaaacatg gcactagcct caccgtcttc gcagacgagg 60

ccgctaagtc gcagctacgc tctcaacggc actgactagg tagtttaaac gtgcacttaa 120

ttaaggtacc tggaatcggc agcaaaggat tttttcctgt agttttccca caaccatttt 180

ttaccatccg aatgatagga taggaaaaat atccaagtga acagtattcc tataaaattc 240

ccgtaaaaag cctgcaatcc gaatgagccc tgaagtctga actagccggt cacctgtaca 300

ggctatcgag atgccataca agagacggta gtaggaacta ggaagacgat ggttgattcg 360

tcaggcgaaa tcgtcgtcct gcagtcgcat ctatgggcct ggacggaata ggggaaaaag 420

ttggccggat aggagggaaa ggcccaggtg cttacgtgcg aggtaggcct gggctctcag 480

cacttcgatt cgttggcacc ggggtaggat gcaatagaga gcaacgttta gtaccacctc 540

gcttagctag agcaaactgg actgccttat atgcgcgggt gctggcttgg ctgccgaaca 600

aagcaccagt ggtctagtgg tagaatagta ccctgccacg gtacagaccc gggttcgatt 660

cccggctggt gcaccgtcct gaggggatgt tcagtttcag agctatgctg gaaacagcat 720

agcaagttga aataaggcta gtccgttatc aacttgaaaa agtggcaccg agtcggtgct 780

tttttttgca agaacgaact aagccggaca aaaaaaaaag gagcacatat acaaaccggt 840

tttattcatg aatggtcacg atggatgatg gggctcagac ttgagctacg aggccgcagg 900

cgagagaagc ctagtgtgct ctctgcttgt ttgggccgta acggaggata cggccgacga 960

gcgtgtacta ccgcgcggga tgccgctggg cgctgcgggg gccgttggat ggggatcggt 1020

gggtcgcggg agcgttgagg ggagacaggt ttagtaccac ctcgcctacc gaacaatgaa 1080

gaacccacct tataaccccg cgcgctgccg cttgtgttgg gatccaacaa agcaccagtg 1140

gtctagtggt agaatagtac cctgccacgg tacagacccg ggttcgattc ccggctggtg 1200

cacccttcat gagatatatg atgtttcaga gctatgctgg aaacagcata gcaagttgaa 1260

ataaggctag tccgttatca acttgaaaaa gtggcaccga gtcggtgctt tttttttttt 1320

aagcttacaa attcgggtca aggcggaagc cagcgcgcca ccccacgtca gcaaatacgg 1380

aggcgcgggg ttgacggcgt cacccggtcc taacggcgac caacaaacca gccagaagaa 1440

attacagtaa aaaaaaagta aattgcactt tgatccacct tttattacct aagtctcaat 1500

ttggatcacc cttaaaccta tcttttcaat ttgggccggg ttgtggtttg gactaccatg 1560

aacaactttt cgtcatgtct aacttccctt tcagcaaaca tatgaaccat atatagagga 1620

gatcggccgt atactagagc tgatgtgttt aaggtcgttg attgcacgag aaaaaaaaat 1680

ccaaatcgca acaatagcaa atttatctgg ttcaaagtga aaagatatgt ttaaaggtag 1740

tccaaagtaa aacttataga taataaaatg tggtccaaag cgtaattcac tcaaaaaaaa 1800

tcaacgagac gtgtaccaaa cggagacaaa cggcatcttc tcgaaatttc ccaaccgctc 1860

gctcgcccgc ctcgtcttcc cggaaaccgc ggtggtttca gcgtggcgga ttctccaagc 1920

agacggagac gtcacggcac gggactcctc ccaccaccca accgccataa ataccagccc 1980

cctcatctcc tctcctcgca tcagctccac ccccgaaaaa tttctcccca atctcgcgag 2040

gctctcgtcg tcgaatcgaa tcctctcgcg tcctcaaggt acgctgcttc tcctctcctc 2100

gcttcgtttc gattcgattt cggacgggtg aggttgtttt gttgctagat ccgattggtg 2160

gttagggttg tcgatgtgat tatcgtgaga tgtttagggg ttgtagatct gatggttgtg 2220

atttgggcac ggttggttcg ataggtggaa tcgtggttag gttttgggat tggatgttgg 2280

ttctgatgat tggggggaat ttttacggtt agatgaattg ttggatgatt cgattgggga 2340

aatcggtgta gatctgttgg ggaattgtgg aactagtcat gcctgagtga ttggtgcgat 2400

ttgtagcgtg ttccatcttg taggccttgt tgcgagcatg ttcagatcta ctgttccgct 2460

cttgattgag ttattggtgc catgggttgg tgcaaacaca ggctttaata tgttatatct 2520

gttttgtgtt tgatgtagat ctgtagggta gttcttctta gacatggttc aattatgtag 2580

cttgtgcgtt tcgatttgat ttcatatgtt cacagattag ataatgatga actcttttaa 2640

ttaattgtca atggtaaata ggaagtcttg tcgctatatc tgtcataatg atctcatgtt 2700

actatctgcc agtaatttat gctaagaact atattagaat atcatgttac aatctgtagt 2760

aatatcatgt tacaatctgt agttcatcta tataatctat tgtggtaatt tctttttact 2820

atctgtgtga agattattgc cactagttca ttctacttat ttctgaagtt caggatacgt 2880

gtgctgttac tacctatctg aatacatgtg tgatgtgcct gttactatct ttttgaatac 2940

atgtatgttc tgttggaata tgtttgctgt ttgatccgtt gttgtgtcct taatcttgtg 3000

ctagttctta ccctatctgt ttggtgatta tttcttgcag tacgtaatgg actacaagga 3060

ccacgacggc gactacaagg atcatgacat cgactacaag gacgacgacg acaagatggc 3120

tcctaagaag aagcggaagg ttggtattca cggggtgcct gcggctgaca agaagtactc 3180

catcggcctc gccatcggca ccaacagcgt cggctgggcg gtgatcaccg acgagtacaa 3240

ggtcccgtcc aagaagttca aggtcctggg caacaccgac cgccactcca tcaagaagaa 3300

cctcatcggc gccctcctct tcgactccgg cgagacggcg gagcgcaccc gcctcaagcg 3360

caccgcccgc cgccgctaca cccgccgcaa gaaccgcatc tgctacctcc aggagatctt 3420

ctccaacgag atggcgaagg tcgacgactc cttcttccac cgcctcgagg agtccttcct 3480

cgtggaggag gacaagaagc acgagcgcca ccccatcttc ggcaacatcg tcgacgaggt 3540

cgcctaccac gagaagtacc ccactatcta ccaccttcgt aagaagcttg ttgactctac 3600

tgataaggct gatcttcgtc tcatctacct tgctctcgct cacatgatca agttccgtgg 3660

tcacttcctt atcgagggtg accttaaccc tgataactcc gacgtggaca agctcttcat 3720

ccagctcgtc cagacctaca accagctctt cgaggagaac cctatcaacg cttccggtgt 3780

cgacgctaag gcgatccttt ccgctaggct ctccaagtcc aggcgtctcg agaacctcat 3840

cgcccagctc cctggtgaga agaagaacgg tcttttcggt aacctcatcg ctctctccct 3900

cggtctgacc cctaacttca agtccaactt cgacctcgct gaggacgcta agcttcagct 3960

ctccaaggat acctacgacg atgatctcga caacctcctc gctcagattg gagatcagta 4020

cgctgatctc ttccttgctg ctaagaacct ctccgatgct atcctccttt cggatatcct 4080

tagggttaac actgagatca ctaaggctcc tctttctgct tccatgatca agcgctacga 4140

cgagcaccac caggacctca ccctcctcaa ggctcttgtt cgtcagcagc tccccgagaa 4200

gtacaaggag atcttcttcg accagtccaa gaacggctac gccggttaca ttgacggtgg 4260

agctagccag gaggagttct acaagttcat caagccaatc cttgagaaga tggatggtac 4320

tgaggagctt ctcgttaagc ttaaccgtga ggacctcctt aggaagcaga ggactttcga 4380

taacggctct atccctcacc agatccacct tggtgagctt cacgccatcc ttcgtaggca 4440

ggaggacttc taccctttcc tcaaggacaa ccgtgagaag atcgagaaga tccttacttt 4500

ccgtattcct tactacgttg gtcctcttgc tcgtggtaac tcccgtttcg cttggatgac 4560

taggaagtcc gaggagacta tcaccccttg gaacttcgag gaggttgttg acaagggtgc 4620

ttccgcccag tccttcatcg agcgcatgac caacttcgac aagaacctcc ccaacgagaa 4680

ggtcctcccc aagcactccc tcctctacga gtacttcacg gtctacaacg agctcaccaa 4740

ggtcaagtac gtcaccgagg gtatgcgcaa gcctgccttc ctctccggcg agcagaagaa 4800

ggctatcgtt gacctcctct tcaagaccaa ccgcaaggtc accgtcaagc agctcaagga 4860

ggactacttc aagaagatcg agtgcttcga ctccgtcgag atcagcggcg ttgaggaccg 4920

tttcaacgct tctctcggta cctaccacga tctcctcaag atcatcaagg acaaggactt 4980

cctcgacaac gaggagaacg aggacatcct cgaggacatc gtcctcactc ttactctctt 5040

cgaggatagg gagatgatcg aggagaggct caagacttac gctcatctct tcgatgacaa 5100

ggttatgaag cagctcaagc gtcgccgtta caccggttgg ggtaggctct cccgcaagct 5160

catcaacggt atcagggata agcagagcgg caagactatc ctcgacttcc tcaagtctga 5220

tggtttcgct aacaggaact tcatgcagct catccacgat gactctctta ccttcaagga 5280

ggatattcag aaggctcagg tgtccggtca gggcgactct ctccacgagc acattgctaa 5340

ccttgctggt tcccctgcta tcaagaaggg catccttcag actgttaagg ttgtcgatga 5400

gcttgtcaag gttatgggtc gtcacaagcc tgagaacatc gtcatcgaga tggctcgtga 5460

gaaccagact acccagaagg gtcagaagaa ctcgagggag cgcatgaaga ggattgagga 5520

gggtatcaag gagcttggtt ctcagatcct taaggagcac cctgtcgaga acacccagct 5580

ccagaacgag aagctctacc tctactacct ccagaacggt agggatatgt acgttgacca 5640

ggagctcgac atcaacaggc tttctgacta cgacgtcgac cacattgttc ctcagtcttt 5700

ccttaaggat gactccatcg acaacaaggt cctcacgagg tccgacaaga acaggggtaa 5760

gtcggacaac gtcccttccg aggaggttgt caagaagatg aagaactact ggaggcagct 5820

tctcaacgct aagctcatta cccagaggaa gttcgacaac ctcacgaagg ctgagagggg 5880

tggcctttcc gagcttgaca aggctggttt catcaagagg cagcttgttg agacgaggca 5940

gattaccaag cacgttgctc agatcctcga ttctaggatg aacaccaagt acgacgagaa 6000

cgacaagctc atccgcgagg tcaaggtgat caccctcaag tccaagctcg tctccgactt 6060

ccgcaaggac ttccagttct acaaggtccg cgagatcaac aactaccacc acgctcacga 6120

tgcttacctt aacgctgtcg ttggtaccgc tcttatcaag aagtacccta agcttgagtc 6180

cgagttcgtc tacggtgact acaaggtcta cgacgttcgt aagatgatcg ccaagtccga 6240

gcaggagatc ggcaaggcca ccgccaagta cttcttctac tccaacatca tgaacttctt 6300

caagaccgag atcaccctcg ccaacggcga gatccgcaag cgccctctta tcgagacgaa 6360

cggtgagact ggtgagatcg tttgggacaa gggtcgcgac ttcgctactg ttcgcaaggt 6420

cctttctatg cctcaggtta acatcgtcaa gaagaccgag gtccagaccg gtggcttctc 6480

caaggagtct atccgcccaa agagaaactc ggacaagctc atcgctagga agaaggattg 6540

ggaccctaag aagtacggtg gtttcctgtg gcctactgtc gcctactccg tcctcgtggt 6600

cgccaaggtg gagaagggta agtcgaagaa gctcaagtcc gtcaaggagc tcctcggcat 6660

caccatcatg gagcgctcct ccttcgagaa gaacccgatc gacttcctcg aggccaaggg 6720

ctacaaggag gtcaagaagg acctcatcat caagctcccc aagtactctc ttttcgagct 6780

cgagaacggt cgtaagagga tgctggcttc cgctaagcag ctccagaagg gtaacgagct 6840

tgctcttcct tccaagtacg tgaacttcct ctacctcgcc tcccactacg agaagctcaa 6900

gggttcccct gaggataacg agcagaagca gctcttcgtg gagcagcaca agcactacct 6960

cgacgagatc atcgagcaga tctccgagtt ctccaagcgc gtcatcctcg ctgacgctaa 7020

cctcgacaag gtcctctccg cctacaacaa gcaccgcgac aagcccatcc gcgagcaggc 7080

cgagaacatc atccacctct tcacgctcac gcgcctcggc gcccctcgcg ctttcaagta 7140

cttcgacacc accatcgacc ccaagcagta ccgctccacc aaggaggttc tcgacgctac 7200

tctcatccac cagtccatca ccggtcttta cgagactcgt atcgaccttt cccagcttgg 7260

tggtgatgga ggaggaggca cgggaggagg aggctccgcc gagtatgtgc gcgcgctctt 7320

cgacttcaac ggcaatgacg aggaggatct ccctttcaag aagggcgaca tcctccgcat 7380

ccgcgataag ccggaggagc agtggtggaa cgcagaggac tccgagggca agcggggcat 7440

gatcctggtg ccatacgtcg agaagtacag cggcgattac aaggaccacg atggcgacta 7500

caaggatcat gacatcgatt acaaggacga tgacgataag tccggcgtcg acatgacgga 7560

cgcggagtat gtgcgcatcc acgagaagct cgatatctac accttcaaga agcagttctt 7620

caacaataag aagtcggtgt cccatcggtg ctacgtcctc ttcgagctga agcgcagggg 7680

agagcgccgc gcctgcttct ggggctacgc ggtgaataag ccgcagtcag gcacagagcg 7740

cggcatccac gccgagatct tctcgatccg gaaggtcgag gagtacctcc gcgacaaccc 7800

aggccagttc acgatcaatt ggtactccag ctggtcccct tgcgcagatt gcgcagagaa 7860

gatcctcgag tggtacaacc aggagctgag gggcaatggc cataccctca agatctgggc 7920

ctgcaagctg tactacgaga agaacgcgag gaatcagatc ggcctctgga acctgcggga 7980

taatggcgtg ggcctcaacg tgatggtgtc cgagcactac cagtgctgcc gcaagatctt 8040

catccagtcc tcccacaatc agctgaacga gaataggtgg ctcgaaaaga ccctgaagcg 8100

cgccgagaag tggaggagcg agctgtctat catgatccag gtcaagatcc tgcacaccac 8160

aaagtcaccg gcggtgggcg gcggcggcag cgatgattcc ggcggcagca ccaacctctc 8220

cgacatcatc gagaaggaga caggcaagca gctcgtgatc caggagagca tcctcatgct 8280

cccggaggag gtggaggagg tcatcggcaa caagccggag tccgacatcc tcgtgcacac 8340

cgcctacgac gagtccaccg acgagaacgt gatgctcctc acctcagatg caccagagta 8400

caagccatgg gcactcgtga tccaggacag caacggcgag aacaagatca agatgctctc 8460

cggcggcagc accaacctct ccgacatcat cgagaaggag acaggcaagc agctcgtgat 8520

ccaggagagc atcctcatgc tcccggagga ggtggaggag gtcatcggca acaagccgga 8580

gtccgacatc ctcgtgcaca ccgcctacga cgagtccacc gacgagaacg tgatgctcct 8640

cacctcagat gcaccagagt acaagccatg ggcactcgtg atccaggaca gcaacggcga 8700

gaacaagatc aagatgctct ccggcggctc cccgaagaag aagaggaaag tgggatcagg 8760

agccaccaac ttctccctcc tcaagcaggc cggcgacgtg gaggagaacc cgggcccaat 8820

gaaaaagcct gaactcaccg cgacgtctgt cgagaagttt ctgatcgaaa agttcgacag 8880

cgtctccgac ctgatgcagc tctcggaggg cgaagaatct cgtgctttca gcttcgatgt 8940

aggagggcgt ggatatgtcc tgcgggtaaa tagctgcgcc gatggtttct acaaagatcg 9000

ttatgtttat cggcactttg catcggccgc gctcccgatt ccggaagtgc ttgacattgg 9060

ggagtttagc gagagcctga cctattgcat ctcccgccgt tcacagggtg tcacgttgca 9120

agacctgcct gaaaccgaac tgcccgctgt tctacaaccg gtcgcggagg ctatggatgc 9180

gatcgctgcg gccgatctta gccagacgag cgggttcggc ccattcggac cgcaaggaat 9240

cggtcaatac actacatggc gtgatttcat atgcgcgatt gctgatcccc atgtgtatca 9300

ctggcaaact gtgatggacg acaccgtcag tgcgtccgtc gcgcaggctc tcgatgagct 9360

gatgctttgg gccgaggact gccccgaagt ccggcacctc gtgcacgcgg atttcggctc 9420

caacaatgtc ctgacggaca atggccgcat aacagcggtc attgactgga gcgaggcgat 9480

gttcggggat tcccaatacg aggtcgccaa catcttcttc tggaggccgt ggttggcttg 9540

tatggagcag cagacgcgct acttcgagcg gaggcatccg gagcttgcag gatcgccacg 9600

actccgggcg tatatgctcc gcattggtct tgaccaactc tatcagagct tggttgacgg 9660

caatttcgat gatgcagctt gggcgcaggg tcgatgcgac gcaatcgtcc gatccggagc 9720

cgggactgtc gggcgtacac aaatcgcccg cagaagcgcg gccgtctgga ccgatggctg 9780

tgtagaagta ctcgccgata gtggaaaccg acgccccagc actcgtccga gggcaaagaa 9840

atagactagt tcagccagtt tggtggagct gccgatgtgc ctggtcgtcc cgagcctctg 9900

ttcgtcaagt atttgtggtg ctgatgtcta cttgtgtctg gtttaatgga ccatcgagtc 9960

cgtatgatat gttagtttta tgaaacagtt tcctgtggga cagcagtatg ctttatgaat 10020

aagttggatt tgaacctaaa tatgtgctca atttgctcat ttgcatctca ttcctgttga 10080

tgttttatct gagttgcaag tttgaaaatg ctgcatattc ttattaaatc gtcatttact 10140

tttatcttaa tgagctttgc aatggcctat gggatataaa agagatcgtt caaacatttg 10200

gcaataaagt ttcttaagat tgaatcctgt tgccggtctt gcgatgatta tcatataatt 10260

tctgttgaat tacgttaagc atgtaataat taacatgtaa tgcatgacgt tatttatgag 10320

atgggttttt atgattagag tcccgcaatt atacatttaa tacgcgatag aaaacaaaat 10380

atagcgcgca aactaggata aattatcgcg cgcggtgtca tctatgttac tagatccctg 10440

caggacgcgt ttaattaagt gcacgcggcc gcctacttag tcaagagcct cgcacgcgac 10500

tgtcacgcgg ccaggatcgc ctcgtgagcc tcgcaatctg tacctagtgt ttaaactatc 10560

agtgtttgac aggatatatt ggcgggtaaa cctaagagaa aagagcgttt attagaataa 10620

cggatattta aaagggcgtg aaaaggttta tccgttcgtc catttgtatg tgcatgccaa 10680

ccacagggtt cccctcggga tcaaagtact ttgatccaac ccctccgctg ctatagtgca 10740

gtcggcttct gacgttcagt gcagccgtct tctgaaaacg acatgtcgca caagtcctaa 10800

gttacgcgac aggctgccgc cctgcccttt tcctggcgtt ttcttgtcgc gtgttttagt 10860

cgcataaagt agaatacttg cgactagaac cggagacatt acgccatgaa caagagcgcc 10920

gccgctggcc tgctgggcta tgcccgcgtc agcaccgacg accaggactt gaccaaccaa 10980

cgggccgaac tgcacgcggc cggctgcacc aagctgtttt ccgagaagat caccggcacc 11040

aggcgcgacc gcccggagct ggccaggatg cttgaccacc tacgccctgg cgacgttgtg 11100

acagtgacca ggctagaccg cctggcccgc agcacccgcg acctactgga cattgccgag 11160

cgcatccagg aggccggcgc gggcctgcgt agcctggcag agccgtgggc cgacaccacc 11220

acgccggccg gccgcatggt gttgaccgtg ttcgccggca ttgccgagtt cgagcgttcc 11280

ctaatcatcg accgcacccg gagcgggcgc gaggccgcca aggcccgagg cgtgaagttt 11340

ggcccccgcc ctaccctcac cccggcacag atcgcgcacg cccgcgagct gatcgaccag 11400

gaaggccgca ccgtgaaaga ggcggctgca ctgcttggcg tgcatcgctc gaccctgtac 11460

cgcgcacttg agcgcagcga ggaagtgacg cccaccgagg ccaggcggcg cggtgccttc 11520

cgtgaggacg cattgaccga ggccgacgcc ctggcggccg ccgagaatga acgccaagag 11580

gaacaagcat gaaaccgcac caggacggcc aggacgaacc gtttttcatt accgaagaga 11640

tcgaggcgga gatgatcgcg gccgggtacg tgttcgagcc gcccgcgcac gtctcaaccg 11700

tgcggctgca tgaaatcctg gccggtttgt ctgatgccaa gctggcggcc tggccggcca 11760

gcttggccgc tgaagaaacc gagcgccgcc gtctaaaaag gtgatgtgta tttgagtaaa 11820

acagcttgcg tcatgcggtc gctgcgtata tgatgcgatg agtaaataaa caaatacgca 11880

aggggaacgc atgaaggtta tcgctgtact taaccagaaa ggcgggtcag gcaagacgac 11940

catcgcaacc catctagccc gcgccctgca actcgccggg gccgatgttc tgttagtcga 12000

ttccgatccc cagggcagtg cccgcgattg ggcggccgtg cgggaagatc aaccgctaac 12060

cgttgtcggc atcgaccgcc cgacgattga ccgcgacgtg aaggccatcg gccggcgcga 12120

cttcgtagtg atcgacggag cgccccaggc ggcggacttg gctgtgtccg cgatcaaggc 12180

agccgacttc gtgctgattc cggtgcagcc aagcccttac gacatatggg ccaccgccga 12240

cctggtggag ctggttaagc agcgcattga ggtcacggat ggaaggctac aagcggcctt 12300

tgtcgtgtcg cgggcgatca aaggcacgcg catcggcggt gaggttgccg aggcgctggc 12360

cgggtacgag ctgcccattc ttgagtcccg tatcacgcag cgcgtgagct acccaggcac 12420

tgccgccgcc ggcacaaccg ttcttgaatc agaacccgag ggcgacgctg cccgcgaggt 12480

ccaggcgctg gccgctgaaa ttaaatcaaa actcatttga gttaatgagg taaagagaaa 12540

atgagcaaaa gcacaaacac gctaagtgcc ggccgtccga gcgcacgcag cagcaaggct 12600

gcaacgttgg ccagcctggc agacacgcca gccatgaagc gggtcaactt tcagttgccg 12660

gcggaggatc acaccaagct gaagatgtac gcggtacgcc aaggcaagac cattaccgag 12720

ctgctatctg aatacatcgc gcagctacca gagtaaatga gcaaatgaat aaatgagtag 12780

atgaatttta gcggctaaag gaggcggcat ggaaaatcaa gaacaaccag gcaccgacgc 12840

cgtggaatgc cccatgtgtg gaggaacggg cggttggcca ggcgtaagcg gctgggttgt 12900

ctgccggccc tgcaatggca ctggaacccc caagcccgag gaatcggcgt gacggtcgca 12960

aaccatccgg cccggtacaa atcggcgcgg cgctgggtga tgacctggtg gagaagttga 13020

aggccgcgca ggccgcccag cggcaacgca tcgaggcaga agcacgcccc ggtgaatcgt 13080

ggcaagcggc cgctgatcga atccgcaaag aatcccggca accgccggca gccggtgcgc 13140

cgtcgattag gaagccgccc aagggcgacg agcaaccaga ttttttcgtt ccgatgctct 13200

atgacgtggg cacccgcgat agtcgcagca tcatggacgt ggccgttttc cgtctgtcga 13260atgacgtggg cacccgcgat agtcgcagca tcatggacgt ggccgttttc cgtctgtcga 13260

agcgtgaccg acgagctggc gaggtgatcc gctacgagct tccagacggg cacgtagagg 13320agcgtgaccg acgagctggc gaggtgatcc gctacgagct tccagacggg cacgtagagg 13320

tttccgcagg gccggccggc atggccagtg tgtgggatta cgacctggta ctgatggcgg 13380

tttcccatct aaccgaatcc atgaaccgat accgggaagg gaagggagac aagcccggcc 13440

gcgtgttccg tccacacgtt gcggacgtac tcaagttctg ccggcgagcc gatggcggaa 13500

agcagaaaga cgacctggta gaaacctgca ttcggttaaa caccacgcac gttgccatgc 13560

agcgtacgaa gaaggccaag aacggccgcc tggtgacggt atccgagggt gaagccttga 13620

ttagccgcta caagatcgta aagagcgaaa ccgggcggcc ggagtacatc gagatcgagc 13680

tagctgattg gatgtaccgc gagatcacag aaggcaagaa cccggacgtg ctgacggttc 13740

accccgatta ctttttgatc gatcccggca tcggccgttt tctctaccgc ctggcacgcc 13800

gcgccgcagg caaggcagaa gccagatggt tgttcaagac gatctacgaa cgcagtggca 13860

gcgccggaga gttcaagaag ttctgtttca ccgtgcgcaa gctgatcggg tcaaatgacc 13920

tgccggagta cgatttgaag gaggaggcgg ggcaggctgg cccgatccta gtcatgcgct 13980

accgcaacct gatcgagggc gaagcatccg ccggttccta atgtacggag cagatgctag 14040

ggcaaattgc cctagcaggg gaaaaaggtc gaaaaggtct ctttcctgtg gatagcacgt 14100

acattgggaa cccaaagccg tacattggga accggaaccc gtacattggg aacccaaagc 14160

cgtacattgg gaaccggtca cacatgtaag tgactgatat aaaagagaaa aaaggcgatt 14220

tttccgccta aaactcttta aaacttatta aaactcttaa aacccgcctg gcctgtgcat 14280

aactgtctgg ccagcgcaca gccgaagagc tgcaaaaagc gcctaccctt cggtcgctgc 14340

gctccctacg ccccgccgct tcgcgtcggc ctatcgcggc cgctggccgc tcaaaaatgg 14400

ctggcctacg gccaggcaat ctaccagggc gcggacaagc cgcgccgtcg ccactcgacc 14460

gccggcgccc acatcaaggc accctgcctc gcgcgtttcg gtgatgacgg tgaaaacctc 14520

tgacacatgc agctcccgga gacggtcaca gcttgtctgt aagcggatgc cgggagcaga 14580

caagcccgtc agggcgcgtc agcgggtgtt ggcgggtgtc ggggcgcagc catgacccag 14640

tcacgtagcg atagcggagt gtatactggc ttaactatgc ggcatcagag cagattgtac 14700

tgagagtgca ccatatgcgg tgtgaaatac cgcacagatg cgtaaggaga aaataccgca 14760

tcaggcgctc ttccgcttcc tcgctcactg actcgctgcg ctcggtcgtt cggctgcggc 14820

gagcggtatc agctcactca aaggcggtaa tacggttatc cacagaatca ggggataacg 14880

caggaaagaa catgtgagca aaaggccagc aaaaggccag gaaccgtaaa aaggccgcgt 14940

tgctggcgtt tttccatagg ctccgccccc ctgacgagca tcacaaaaat cgacgctcaa 15000

gtcagaggtg gcgaaacccg acaggactat aaagatacca ggcgtttccc cctggaagct 15060

ccctcgtgcg ctctcctgtt ccgaccctgc cgcttaccgg atacctgtcc gcctttctcc 15120

cttcgggaag cgtggcgctt tctcatagct cacgctgtag gtatctcagt tcggtgtagg 15180

tcgttcgctc caagctgggc tgtgtgcacg aaccccccgt tcagcccgac cgctgcgcct 15240

tatccggtaa ctatcgtctt gagtccaacc cggtaagaca cgacttatcg ccactggcag 15300

cagccactgg taacaggatt agcagagcga ggtatgtagg cggtgctaca gagttcttga 15360

agtggtggcc taactacggc tacactagaa ggacagtatt tggtatctgc gctctgctga 15420

agccagttac cttcggaaaa agagttggta gctcttgatc cggcaaacaa accaccgctg 15480

gtagcggtgg tttttttgtt tgcaagcagc agattacgcg cagaaaaaaa ggatctcaag 15540

aagatccttt gatcttttct acggggtctg acgctcagtg gaacgaaaac tcacgttaag 15600

ggattttggt catgcattct aggtactaaa acaattcatc cagtaaaata taatatttta 15660

ttttctccca atcaggcttg atccccagta agtcaaaaaa tagctcgaca tactgttctt 15720

ccccgatatc ctccctgatc gaccggacgc agaaggcaat gtcataccac ttgtccgccc 15780

tgccgcttct cccaagatca ataaagccac ttactttgcc atctttcaca aagatgttgc 15840

tgtctcccag gtcgccgtgg gaaaagacaa gttcctcttc gggcttttcc gtctttaaaa 15900

aatcatacag ctcgcgcgga tctttaaatg gagtgtcttc ttcccagttt tcgcaatcca 15960

catcggccag atcgttattc agtaagtaat ccaattcggc taagcggctg tctaagctat 16020

tcgtataggg acaatccgat atgtcgatgg agtgaaagag cctgatgcac tccgcataca 16080

gctcgataat cttttcaggg ctttgttcat cttcatactc ttccgagcaa aggacgccat 16140

cggcctcact catgagcaga ttgctccagc catcatgccg ttcaaagtgc aggacctttg 16200

gaacaggcag ctttccttcc agccatagca tcatgtcctt ttcccgttcc acatcatagg 16260

tggtcccttt ataccggctg tccgtcattt ttaaatatag gttttcattt tctcccacca 16320

gcttatatac cttagcagga gacattcctt ccgtatcttt tacgcagcgg tatttttcga 16380

tcagtttttt caattccggt gatattctca ttttagccat ttattatttc cttcctcttt 16440

tctacagtat ttaaagatac cccaagaagc taattataac aagacgaact ccaattcact 16500

gttccttgca ttctaaaacc ttaaatacca gaaaacagct ttttcaaagt tgttttcaaa 16560

gttggcgtat aacatagtat cgacggagcc gattttgaaa ccgcggtgat cacaggcagc 16620

aacgctctgt catcgttaca atcaacatgc taccctccgc gagatcatcc gtgtttcaaa 16680

cccggcagct tagttgccgt tcttccgaat agcatcggta acatgagcaa agtctgccgc 16740

cttacaacgg ctctcccgct gacgccgtcc cggactgatg ggctgcctgt atcgagtggt 16800

gattttgtgc cgagctgccg gtcggggagc tgttggctgg ct 16842

<210> 2

<211> 1368

<212> PRT

<213> Artificial Sequence

<400> 2

Met Asp Lys Lys Tyr Ser Ile Gly Leu Ala Ile Gly Thr Asn Ser Val

1 5 10 15

Gly Trp Ala Val Ile Thr Asp Glu Tyr Lys Val Pro Ser Lys Lys Phe

20 25 30

Lys Val Leu Gly Asn Thr Asp Arg His Ser Ile Lys Lys Asn Leu Ile

35 40 45

Gly Ala Leu Leu Phe Asp Ser Gly Glu Thr Ala Glu Arg Thr Arg Leu

50 55 60

Lys Arg Thr Ala Arg Arg Arg Tyr Thr Arg Arg Lys Asn Arg Ile Cys

65 70 75 80

Tyr Leu Gln Glu Ile Phe Ser Asn Glu Met Ala Lys Val Asp Asp Ser

85 90 95

Phe Phe His Arg Leu Glu Glu Ser Phe Leu Val Glu Glu Asp Lys Lys

100 105 110

His Glu Arg His Pro Ile Phe Gly Asn Ile Val Asp Glu Val Ala Tyr

115 120 125

His Glu Lys Tyr Pro Thr Ile Tyr His Leu Arg Lys Lys Leu Val Asp

130 135 140

Ser Thr Asp Lys Ala Asp Leu Arg Leu Ile Tyr Leu Ala Leu Ala His

145 150 155 160

Met Ile Lys Phe Arg Gly His Phe Leu Ile Glu Gly Asp Leu Asn Pro

165 170 175

Asp Asn Ser Asp Val Asp Lys Leu Phe Ile Gln Leu Val Gln Thr Tyr

180 185 190

Asn Gln Leu Phe Glu Glu Asn Pro Ile Asn Ala Ser Gly Val Asp Ala

195 200 205

Lys Ala Ile Leu Ser Ala Arg Leu Ser Lys Ser Arg Arg Leu Glu Asn

210 215 220

Leu Ile Ala Gln Leu Pro Gly Glu Lys Lys Asn Gly Leu Phe Gly Asn

225 230 235 240

Leu Ile Ala Leu Ser Leu Gly Leu Thr Pro Asn Phe Lys Ser Asn Phe

245 250 255

Asp Leu Ala Glu Asp Ala Lys Leu Gln Leu Ser Lys Asp Thr Tyr Asp

260 265 270

Asp Asp Leu Asp Asn Leu Leu Ala Gln Ile Gly Asp Gln Tyr Ala Asp

275 280 285

Leu Phe Leu Ala Ala Lys Asn Leu Ser Asp Ala Ile Leu Leu Ser Asp

290 295 300

Ile Leu Arg Val Asn Thr Glu Ile Thr Lys Ala Pro Leu Ser Ala Ser

305 310 315 320

Met Ile Lys Arg Tyr Asp Glu His His Gln Asp Leu Thr Leu Leu Lys

325 330 335

Ala Leu Val Arg Gln Gln Leu Pro Glu Lys Tyr Lys Glu Ile Phe Phe

340 345 350

Asp Gln Ser Lys Asn Gly Tyr Ala Gly Tyr Ile Asp Gly Gly Ala Ser

355 360 365

Gln Glu Glu Phe Tyr Lys Phe Ile Lys Pro Ile Leu Glu Lys Met Asp

370 375 380

Gly Thr Glu Glu Leu Leu Val Lys Leu Asn Arg Glu Asp Leu Leu Arg

385 390 395 400

Lys Gln Arg Thr Phe Asp Asn Gly Ser Ile Pro His Gln Ile His Leu

405 410 415

Gly Glu Leu His Ala Ile Leu Arg Arg Gln Glu Asp Phe Tyr Pro Phe

420 425 430

Leu Lys Asp Asn Arg Glu Lys Ile Glu Lys Ile Leu Thr Phe Arg Ile

435 440 445

Pro Tyr Tyr Val Gly Pro Leu Ala Arg Gly Asn Ser Arg Phe Ala Trp

450 455 460

Met Thr Arg Lys Ser Glu Glu Thr Ile Thr Pro Trp Asn Phe Glu Glu

465 470 475 480

Val Val Asp Lys Gly Ala Ser Ala Gln Ser Phe Ile Glu Arg Met Thr

485 490 495

Asn Phe Asp Lys Asn Leu Pro Asn Glu Lys Val Leu Pro Lys His Ser

500 505 510

Leu Leu Tyr Glu Tyr Phe Thr Val Tyr Asn Glu Leu Thr Lys Val Lys

515 520 525

Tyr Val Thr Glu Gly Met Arg Lys Pro Ala Phe Leu Ser Gly Glu Gln

530 535 540

Lys Lys Ala Ile Val Asp Leu Leu Phe Lys Thr Asn Arg Lys Val Thr

545 550 555 560

Val Lys Gln Leu Lys Glu Asp Tyr Phe Lys Lys Ile Glu Cys Phe Asp

565 570 575

Ser Val Glu Ile Ser Gly Val Glu Asp Arg Phe Asn Ala Ser Leu Gly

580 585 590

Thr Tyr His Asp Leu Leu Lys Ile Ile Lys Asp Lys Asp Phe Leu Asp

595 600 605

Asn Glu Glu Asn Glu Asp Ile Leu Glu Asp Ile Val Leu Thr Leu Thr

610 615 620

Leu Phe Glu Asp Arg Glu Met Ile Glu Glu Arg Leu Lys Thr Tyr Ala

625 630 635 640

His Leu Phe Asp Asp Lys Val Met Lys Gln Leu Lys Arg Arg Arg Tyr

645 650 655

Thr Gly Trp Gly Arg Leu Ser Arg Lys Leu Ile Asn Gly Ile Arg Asp

660 665 670

Lys Gln Ser Gly Lys Thr Ile Leu Asp Phe Leu Lys Ser Asp Gly Phe

675 680 685

Ala Asn Arg Asn Phe Met Gln Leu Ile His Asp Asp Ser Leu Thr Phe

690 695 700

Lys Glu Asp Ile Gln Lys Ala Gln Val Ser Gly Gln Gly Asp Ser Leu

705 710 715 720

His Glu His Ile Ala Asn Leu Ala Gly Ser Pro Ala Ile Lys Lys Gly

725 730 735

Ile Leu Gln Thr Val Lys Val Val Asp Glu Leu Val Lys Val Met Gly

740 745 750

Arg His Lys Pro Glu Asn Ile Val Ile Glu Met Ala Arg Glu Asn Gln

755 760 765

Thr Thr Gln Lys Gly Gln Lys Asn Ser Arg Glu Arg Met Lys Arg Ile

770 775 780

Glu Glu Gly Ile Lys Glu Leu Gly Ser Gln Ile Leu Lys Glu His Pro

785 790 795 800

Val Glu Asn Thr Gln Leu Gln Asn Glu Lys Leu Tyr Leu Tyr Tyr Leu

805 810 815

Gln Asn Gly Arg Asp Met Tyr Val Asp Gln Glu Leu Asp Ile Asn Arg

820 825 830

Leu Ser Asp Tyr Asp Val Asp His Ile Val Pro Gln Ser Phe Leu Lys

835 840 845

Asp Asp Ser Ile Asp Asn Lys Val Leu Thr Arg Ser Asp Lys Asn Arg

850 855 860

Gly Lys Ser Asp Asn Val Pro Ser Glu Glu Val Val Lys Lys Met Lys

865 870 875 880

Asn Tyr Trp Arg Gln Leu Leu Asn Ala Lys Leu Ile Thr Gln Arg Lys

885 890 895

Phe Asp Asn Leu Thr Lys Ala Glu Arg Gly Gly Leu Ser Glu Leu Asp

900 905 910

Lys Ala Gly Phe Ile Lys Arg Gln Leu Val Glu Thr Arg Gln Ile Thr

915 920 925

Lys His Val Ala Gln Ile Leu Asp Ser Arg Met Asn Thr Lys Tyr Asp

930 935 940

Glu Asn Asp Lys Leu Ile Arg Glu Val Lys Val Ile Thr Leu Lys Ser

945 950 955 960

Lys Leu Val Ser Asp Phe Arg Lys Asp Phe Gln Phe Tyr Lys Val Arg

965 970 975

Glu Ile Asn Asn Tyr His His Ala His Asp Ala Tyr Leu Asn Ala Val

980 985 990

Val Gly Thr Ala Leu Ile Lys Lys Tyr Pro Lys Leu Glu Ser Glu Phe

995 1000 1005

Val Tyr Gly Asp Tyr Lys Val Tyr Asp Val Arg Lys Met Ile Ala

1010 1015 1020

Lys Ser Glu Gln Glu Ile Gly Lys Ala Thr Ala Lys Tyr Phe Phe

1025 1030 1035

Tyr Ser Asn Ile Met Asn Phe Phe Lys Thr Glu Ile Thr Leu Ala

1040 1045 1050

Asn Gly Glu Ile Arg Lys Arg Pro Leu Ile Glu Thr Asn Gly Glu

1055 1060 1065

Thr Gly Glu Ile Val Trp Asp Lys Gly Arg Asp Phe Ala Thr Val

1070 1075 1080

Arg Lys Val Leu Ser Met Pro Gln Val Asn Ile Val Lys Lys Thr

1085 1090 1095

Glu Val Gln Thr Gly Gly Phe Ser Lys Glu Ser Ile Arg Pro Lys

1100 1105 1110

Arg Asn Ser Asp Lys Leu Ile Ala Arg Lys Lys Asp Trp Asp Pro

1115 1120 1125

Lys Lys Tyr Gly Gly Phe Leu Trp Pro Thr Val Ala Tyr Ser Val

1130 1135 1140

Leu Val Val Ala Lys Val Glu Lys Gly Lys Ser Lys Lys Leu Lys

1145 1150 1155

Ser Val Lys Glu Leu Leu Gly Ile Thr Ile Met Glu Arg Ser Ser

1160 1165 1170

Phe Glu Lys Asn Pro Ile Asp Phe Leu Glu Ala Lys Gly Tyr Lys

1175 1180 1185

Glu Val Lys Lys Asp Leu Ile Ile Lys Leu Pro Lys Tyr Ser Leu

1190 1195 1200

Phe Glu Leu Glu Asn Gly Arg Lys Arg Met Leu Ala Ser Ala Lys

1205 1210 1215

Gln Leu Gln Lys Gly Asn Glu Leu Ala Leu Pro Ser Lys Tyr Val

1220 1225 1230

Asn Phe Leu Tyr Leu Ala Ser His Tyr Glu Lys Leu Lys Gly Ser

1235 1240 1245

Pro Glu Asp Asn Glu Gln Lys Gln Leu Phe Val Glu Gln His Lys

1250 1255 1260

His Tyr Leu Asp Glu Ile Ile Glu Gln Ile Ser Glu Phe Ser Lys

1265 1270 1275

Arg Val Ile Leu Ala Asp Ala Asn Leu Asp Lys Val Leu Ser Ala

1280 1285 1290

Tyr Asn Lys His Arg Asp Lys Pro Ile Arg Glu Gln Ala Glu Asn

1295 1300 1305

Ile Ile His Leu Phe Thr Leu Thr Arg Leu Gly Ala Pro Arg Ala

1310 1315 1320

Phe Lys Tyr Phe Asp Thr Thr Ile Asp Pro Lys Gln Tyr Arg Ser

1325 1330 1335

Thr Lys Glu Val Leu Asp Ala Thr Leu Ile His Gln Ser Ile Thr

1340 1345 1350

Gly Leu Tyr Glu Thr Arg Ile Asp Leu Ser Gln Leu Gly Gly Asp

1355 1360 1365

<210> 3

<211> 208

<212> PRT

<213> Artificial Sequence

<400> 3

Met Thr Asp Ala Glu Tyr Val Arg Ile His Glu Lys Leu Asp Ile Tyr

1 5 10 15

Thr Phe Lys Lys Gln Phe Phe Asn Asn Lys Lys Ser Val Ser His Arg

20 25 30

Cys Tyr Val Leu Phe Glu Leu Lys Arg Arg Gly Glu Arg Arg Ala Cys

35 40 45

Phe Trp Gly Tyr Ala Val Asn Lys Pro Gln Ser Gly Thr Glu Arg Gly

50 55 60

Ile His Ala Glu Ile Phe Ser Ile Arg Lys Val Glu Glu Tyr Leu Arg

65 70 75 80

Asp Asn Pro Gly Gln Phe Thr Ile Asn Trp Tyr Ser Ser Trp Ser Pro

85 90 95

Cys Ala Asp Cys Ala Glu Lys Ile Leu Glu Trp Tyr Asn Gln Glu Leu

100 105 110

Arg Gly Asn Gly His Thr Leu Lys Ile Trp Ala Cys Lys Leu Tyr Tyr

115 120 125

Glu Lys Asn Ala Arg Asn Gln Ile Gly Leu Trp Asn Leu Arg Asp Asn

130 135 140

Gly Val Gly Leu Asn Val Met Val Ser Glu His Tyr Gln Cys Cys Arg

145 150 155 160

Lys Ile Phe Ile Gln Ser Ser His Asn Gln Leu Asn Glu Asn Arg Trp

165 170 175

Leu Glu Lys Thr Leu Lys Arg Ala Glu Lys Trp Arg Ser Glu Leu Ser

180 185 190

Ile Met Ile Gln Val Lys Ile Leu His Thr Thr Lys Ser Pro Ala Val

195 200 205

<210> 4

<211> 83

<212> PRT

<213> Artificial Sequence

<400> 4

Thr Asn Leu Ser Asp Ile Ile Glu Lys Glu Thr Gly Lys Gln Leu Val

1 5 10 15

Ile Gln Glu Ser Ile Leu Met Leu Pro Glu Glu Val Glu Glu Val Ile

20 25 30

Gly Asn Lys Pro Glu Ser Asp Ile Leu Val His Thr Ala Tyr Asp Glu

35 40 45

Ser Thr Asp Glu Asn Val Met Leu Leu Thr Ser Asp Ala Pro Glu Tyr

50 55 60

Lys Pro Trp Ala Leu Val Ile Gln Asp Ser Asn Gly Glu Asn Lys Ile

65 70 75 80

Lys Met Leu

<210> 5

<211> 2128

<212> DNA

<213> Artificial Sequence

<400> 5

tggaatcggc agcaaaggat tttttcctgt agttttccca caaccatttt ttaccatccg 60

aatgatagga taggaaaaat atccaagtga acagtattcc tataaaattc ccgtaaaaag 120

cctgcaatcc gaatgagccc tgaagtctga actagccggt cacctgtaca ggctatcgag 180

atgccataca agagacggta gtaggaacta ggaagacgat ggttgattcg tcaggcgaaa 240

tcgtcgtcct gcagtcgcat ctatgggcct ggacggaata ggggaaaaag ttggccggat 300

aggagggaaa ggcccaggtg cttacgtgcg aggtaggcct gggctctcag cacttcgatt 360

cgttggcacc ggggtaggat gcaatagaga gcaacgttta gtaccacctc gcttagctag 420

agcaaactgg actgccttat atgcgcgggt gctggcttgg ctgccgaaca aagcaccagt 480

ggtctagtgg tagaatagta ccctgccacg gtacagaccc gggttcgatt cccggctggt 540

gcaatctcaa gatcttataa cttgtttcag agctatgctg gaaacagcat agcaagttga 600

aataaggcta gtccgttatc aacttgaaaa agtggcaccg agtcggtgct tttttttgca 660

agaacgaact aagccggaca aaaaaaaaag gagcacatat acaaaccggt tttattcatg 720

aatggtcacg atggatgatg gggctcagac ttgagctacg aggccgcagg cgagagaagc 780

ctagtgtgct ctctgcttgt ttgggccgta acggaggata cggccgacga gcgtgtacta 840

ccgcgcggga tgccgctggg cgctgcgggg gccgttggat ggggatcggt gggtcgcggg 900

agcgttgagg ggagacaggt ttagtaccac ctcgcctacc gaacaatgaa gaacccacct 960

tataaccccg cgcgctgccg cttgtgttgg gatccaacaa agcaccagtg gtctagtggt 1020

agaatagtac cctgccacgg tacagacccg ggttcgattc ccggctggtg caccaccatg 1080

tcggctctca ccgtttcaga gctatgctgg aaacagcata gcaagttgaa ataaggctag 1140

tccgttatca acttgaaaaa gtggcaccga gtcggtgctt tttttctcat tagcggtatg 1200

catgttggta gaagtcggag atgtaaataa ttttcattat ataaaaaagg tacttcgaga 1260

aaaataaatg catacgaatt aattcttttt atgtttttta aaccaagtat atagaattta 1320

ttgatggtta aaatttcaaa aatatgacga gagaaaggtt aaacgtacgg catatacttc 1380

tgaacagaga gggaatatgg ggtttttgtt gctcccaaca attcttaagc acgtaaagga 1440

aaaaagcaca ttatccacat tgtacttcca gagatatgta cagcattacg taggtacgtt 1500

ttctttttct tcccggagag atgatacaat aatcatgtaa acccagaatt taaaaaatat 1560

tctttactat aaaaatttta attagggaac gtattatttt ttacatgaca ccttttgaga 1620

aagagggact tgtaatatgg gacaaatgaa caatttctaa gaaatgggca tatgactctc 1680

agtacaatgg accaaattcc ctccagtcgg cccagcaata caaagggaaa gaaatgaggg 1740

ggcccacagg ccacggccca cttttctccg tggtggggag atccagctag aggtccggcc 1800

cacaagtggc ccttgccccg tgggacggtg ggattgcaga gcgcgtgggc ggaaacaaca 1860

gtttagtacc acctcgctca cgcaacgacg cgaccacttg cttataagct gctgcgctga 1920

ggctcaggga tccaacaaag caccagtggt ctagtggtag aatagtaccc tgccacggta 1980

cagacccggg ttcgattccc ggctggtgca acgacgcaac cacggtaaga gtttcagagc 2040

tatgctggaa acagcatagc aagttgaaat aaggctagtc cgttatcaac ttgaaaaagt 2100

ggcaccgagt cggtgctttt tttttttt 2128

<210> 6

<211> 661

<212> DNA

<213> Artificial Sequence

<400> 6

gcactacctc agccacaacg ctggtttcag agctatgctg gaaacagcat agcaagttga 600

aataaggcta gtccgttatc aacttgaaaa agtggcaccg agtcggtgct tttttttttt 660

t 661

Claims

1. A method for mutating C to T in a target sequence of a plant genome, the method being 1) or 2) below:

Method 1) comprises the following step: introducing SpRYn, a cytosine deaminase, an sgRNA and UGI into a plant, thereby mutating C to T in the target sequence of the plant genome;

Method 2) comprises the following step: introducing the coding gene of SpRYn, the coding gene of the cytosine deaminase, a DNA molecule that is transcribed into the sgRNA, and the coding gene of UGI into the plant, so that the SpRYn, the cytosine deaminase, the sgRNA and the UGI are all expressed, thereby mutating C to T in the target sequence of the plant genome;

The SpRYn is a protein whose amino acid sequence is shown in sequence 2;

The cytosine deaminase is PmCDA1; the PmCDA1 is a protein whose amino acid sequence is shown in sequence 3;

The UGI is a protein whose amino acid sequence is shown in sequence 4;

The sgRNA is a tRNA-esgRNA; the tRNA is the RNA molecule obtained by replacing T with U in positions 597-673 of sequence 1; the esgRNA scaffold is the RNA molecule obtained by replacing T with U in positions 694-779 of sequence 1;

The sgRNA targets the target sequence;

The PAM sequence of the target sequence is NAN or NCN or NTN; N is A, T, C or G;

The plant is rice.
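The PAM condition in claim 1 (NAN, NCN or NTN, with N being A, T, C or G) amounts to requiring that the middle base of a 3-nt PAM is not G. The following is a minimal illustrative sketch of that check, not part of the claimed method. The function names and the 20-nt protospacer length are assumptions made for illustration only, and only the given strand is scanned.

```python
# Illustrative sketch only; names and the 20-nt spacer length are assumptions.
CLAIMED_MIDDLE_BASES = {"A", "C", "T"}  # NAN, NCN, NTN


def pam_is_claimed(pam: str) -> bool:
    """Return True if a 3-nt PAM matches NAN, NCN or NTN (N = A/T/C/G)."""
    pam = pam.upper()
    return (
        len(pam) == 3
        and all(base in "ACGT" for base in pam)
        and pam[1] in CLAIMED_MIDDLE_BASES
    )


def candidate_sites(genome: str, spacer_len: int = 20):
    """Yield (start, protospacer, pam) for each window on the given strand
    whose 3-nt PAM matches the claimed patterns and whose protospacer
    contains at least one C (a potential C-to-T editing substrate)."""
    genome = genome.upper()
    for start in range(len(genome) - spacer_len - 3 + 1):
        protospacer = genome[start:start + spacer_len]
        pam = genome[start + spacer_len:start + spacer_len + 3]
        if pam_is_claimed(pam) and "C" in protospacer:
            yield start, protospacer, pam
```

For example, `pam_is_claimed("TAG")` is true, while `pam_is_claimed("AGG")` is false because NGN PAMs are outside the patterns recited in this claim.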

2. The method according to claim 1, characterized in that: the coding gene of SpRYn is the DNA molecule shown at positions 3167-7267 of sequence 1.

3. The method according to claim 1, characterized in that: the coding gene of PmCDA1 is the DNA molecule shown at positions 7553-8176 of sequence 1.

4. The method according to claim 1, characterized in that: the coding gene of UGI is the DNA molecule shown at positions 8210-8458 of sequence 1.
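The claims locate each element by 1-based, inclusive positions in sequence 1, and define the tRNA and esgRNA scaffold by replacing T with U. As a hedged sketch of how those coordinates could be applied to a sequence string, the helpers below extract a region and convert it to RNA. The helper names and the `REGIONS` table are illustrative assumptions; the coordinates themselves are copied from claims 1 to 4.

```python
# Illustrative sketch only; helper names are assumptions, coordinates are from the claims.
REGIONS = {
    "tRNA": (597, 673),       # claim 1
    "esgRNA": (694, 779),     # claim 1
    "SpRYn": (3167, 7267),    # claim 2
    "PmCDA1": (7553, 8176),   # claim 3
    "UGI": (8210, 8458),      # claim 4
}


def slice_1based(seq: str, start: int, end: int) -> str:
    """Return positions start..end (1-based, inclusive) of seq."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("coordinates fall outside the sequence")
    return seq[start - 1:end]


def to_rna(dna: str) -> str:
    """Replace T with U, as the claims describe for the tRNA and esgRNA."""
    return dna.upper().replace("T", "U")


def region_length(start: int, end: int) -> int:
    """Length in nucleotides of a 1-based inclusive range."""
    return end - start + 1
```

With these ranges, the PmCDA1 and UGI regions are 624 nt and 249 nt long, which correspond to 208 and 83 codons and agree with the lengths given for sequences 3 and 4. The SpRYn range is 4101 nt; the claims do not state whether start or stop codons are included in each range, so the sketch only reports lengths.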

5. Application of the method of any one of claims 1 to 4 in base editing of plant genomes; the plant is rice.

6. Application of the method of any one of claims 1 to 4 in preparing plant mutants; the plant is rice.

7. Application of a complete set of reagents in mutating C to T in a target sequence of a plant genome, or in preparing a product for mutating C to T in a target sequence of a plant genome;

The complete set of reagents comprises the SpRYn described in any one of claims 1-4, the cytosine deaminase described in any one of claims 1-4, the sgRNA described in any one of claims 1-4, and the UGI described in any one of claims 1-4;

The plant is rice.

8. Application of a complete set of reagents in plant genome base editing or in preparing a product for plant genome base editing;

The plant is rice.

9. Application of a complete set of reagents in preparing plant mutants;

The plant is rice.