CN115369098A

CN115369098A - A novel CRISPR-associated transposase

Info

Publication number: CN115369098A
Application number: CN202110532731.0A
Authority: CN
Inventors: 杨晟; 杨思琪; 张译文; 徐佳琪; 张姣
Original assignee: Center for Excellence in Molecular Plant Sciences of CAS
Current assignee: Center for Excellence in Molecular Plant Sciences of CAS
Priority date: 2021-05-17
Filing date: 2021-05-17
Publication date: 2022-11-22
Also published as: WO2022242464A1

Abstract

The invention discloses a novel CRISPR-associated transposase derived from Pseudoalteromonas translucida KMM520. It recognizes 16 PAMs and can insert a cargo gene into 8 sites in a single round with 100% efficiency, which is higher than that of the CRISPR-associated transposase from Vibrio cholerae Tn6677. It transposes a 15.4 kb cargo gene to the corresponding target site with 100% efficiency. It can also be used in the same Escherichia coli cell as the CRISPR-associated transposase from Vibrio cholerae Tn6677, with the two functioning without interfering with each other, and therefore has broad application prospects.

Description

A novel CRISPR-associated transposase

Technical field

The invention belongs to the technical field of gene editing and specifically relates to a novel CRISPR-associated transposase and its application in gene editing.

Background

In metabolic engineering, the dosage and ratio of enzyme genes can be adjusted so that the different enzymes of a pathway, or different pathways, are balanced to maximize the overall reaction rate. In bacteria, the most common way to raise the dosage of a gene expression cassette is a plasmid, but plasmids suffer from genetic instability. Integrating gene expression cassettes into the bacterial chromosome one at a time is time-consuming, which makes it hard to test many construction schemes quickly. CRISPR-Cas can cut multi-copy genomic sequences or target several different sequences simultaneously with a crRNA array, but it is limited by the efficiency of double-strand break repair, so it is difficult to insert more than 3 copies of a gene expression cassette at once. A better method for obtaining combinations of different gene dosages is therefore needed.

In 2019, Feng Zhang's group at the Broad Institute and Sam Sternberg's group at Columbia University published two similar results: using bacterial transposon genes to insert DNA sequences precisely into the genome without cutting the DNA. Zhang's group isolated a transposase from cyanobacteria and named it CAST (CRISPR-associated transposase). After discovering a unique transposon in Vibrio cholerae, Sternberg's group developed a gene editing tool named INTEGRATE (Insertion of transposable elements by guide RNA-assisted targeting), which can insert large gene fragments into the genome without introducing DNA breaks. Both the CAST system and the INTEGRATE system integrate DNA fragments by transposition into preset sites on the E. coli chromosome without relying on homologous recombination, leaving no residual resistance marker and no double-stranded DNA break.

Building on the CAST and INTEGRATE systems, the inventors developed a multicopy gene insertion system, MUCICAT (see patent document CN202010083919.7), which yields strains carrying 10 chromosomal copies of a cargo gene within 5 days and is programmable, fast, marker-free and site-specific. MUCICAT has been applied successfully to the construction of enzyme engineering strains and metabolic engineering strains (Zhang, Y., Multicopy chromosomal integration using CRISPR-associated transposases. ACS Synthetic Biology, 2020.). However, metabolic engineering and synthetic biology usually require optimizing the expression levels of multiple enzymes or multiple pathways. In one round of transposition, MUCICAT based on a single CRISPR-associated transposase can only explore the optimal dosage of a single cargo carrying one or more genes; it cannot screen for the optimal ratio of gene dosages among multiple genes or pathways. No technique for amplifying the chromosomal copy number of multiple genes or pathways in parallel has been reported so far. A MUCICAT technique comprising several mutually orthogonal CRISPR-associated transposases could amplify multiple genes or pathways simultaneously and independently in a single cell, producing a strain library with different gene dosage ratios from which the optimal ratio for multiple genes or pathways can be screened.

Among the CRISPR transposases with known activity, only the type I-F3 CRISPR transposase from Vibrio cholerae Tn6677 shows both high insertion efficiency and a high on-target rate (Klompe, S.E., et al., Transposon-encoded CRISPR–Cas systems direct RNA-guided DNA integration. Nature, 2019. 571(7764): p. 219-225.). The type V-K systems from Scytonema hofmanni and Anabaena cylindrica (Jonathan Strecker et al., RNA-guided DNA insertion with CRISPR-associated transposases. Science, 2019), the type I-F3 system from Aeromonas salmonicida S44 (Michael T. Petassi et al., Guide RNA Categorization Enables Target Site Choice in Tn7-CRISPR-Cas Transposons. Cell, 2020), and the type I-B systems from Anabaena variabilis and Peltigera membranacea cyanobiont 210A (Makoto Saito, A.L., Jonathan Strecker, Han Altae-Tran, Rhiannon K. Macrae, Feng Zhang, Dual modes of CRISPR-associated transposon homing. Cell, 2021.) all have low efficiency and/or a high off-target rate.

Summary of the invention

Given the characteristics of the known CRISPR transposases, it would be desirable to have additional novel and efficient CRISPR-associated transposases that could be combined with MUCICAT into orthogonal CRISPR-associated transposition systems for complex metabolic engineering and synthetic biology designs. To this end, the inventors extensively screened CRISPR transposases from other microorganisms and found a type I-F3 CRISPR transposition system from Pseudoalteromonas translucida KMM520. In E. coli, its insertion efficiency is no lower than, and can exceed, that of the Vibrio cholerae Tn6677 system, and the two systems can target insertions to the eda-purT intergenic region and the lacZ site, respectively, without interfering with each other. The novel CRISPR transposition system recognizes all 16 PAMs and has no PAM dependence.
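The figure of 16 PAMs is easiest to read if the PAM is taken to be a dinucleotide, since two positions over four bases give 4² = 16 combinations. The Python sketch below makes that reading explicit. It is illustrative only: the function names and the toy sequence are hypothetical, and placing the PAM immediately 5' of the protospacer is an assumption for the example rather than something stated in this section.

```python
from itertools import product
from typing import Optional

BASES = "ACGT"


def all_pams(length: int = 2) -> list[str]:
    """Enumerate every possible PAM of the given length (assumed to be 2 nt)."""
    return ["".join(p) for p in product(BASES, repeat=length)]


def pam_upstream(genome: str, protospacer: str, pam_len: int = 2) -> Optional[str]:
    """Return the pam_len bases immediately 5' of the first protospacer match.

    Returns None if the protospacer is absent or sits too close to the start.
    """
    idx = genome.find(protospacer)
    if idx < pam_len:  # also covers idx == -1 (not found)
        return None
    return genome[idx - pam_len:idx]


pams = all_pams()

# Hypothetical toy sequence, for illustration only.
toy_genome = "AAACCGGGTTTACGT"
toy_pam = pam_upstream(toy_genome, "GGGTTT")

# With no PAM dependence, any of the 16 dinucleotides is acceptable.
pam_ok = toy_pam in pams
```

Under this reading, a target-site filter for the system reduces to checking the spacer match alone, because every flanking dinucleotide passes the PAM test.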

Therefore, a first aspect of the invention provides a CRISPR-associated transposase comprising polypeptides selected from the group consisting of: the transposase proteins tnsA, tnsB, tnsC and tniQ and the nuclease proteins Cas5/8, Cas6 and Cas7, each derived from a bacterium of the genus Pseudoalteromonas.

Preferably, the Pseudoalteromonas bacterium is Pseudoalteromonas translucida, more preferably Pseudoalteromonas translucida KMM520.

In a specific embodiment, the CRISPR-associated transposase preferably comprises polypeptides selected from the group consisting of:

tnsA: a polypeptide having the amino acid sequence of SEQ ID NO: 1, or a functionally equivalent polypeptide with at least 95%, preferably at least 98%, more preferably at least 99% homology to SEQ ID NO: 1;

tnsB: a polypeptide having the amino acid sequence of SEQ ID NO: 2, or a functionally equivalent polypeptide with at least 95%, preferably at least 98%, more preferably at least 99% homology to SEQ ID NO: 2;

tnsC: a polypeptide having the amino acid sequence of SEQ ID NO: 3, or a functionally equivalent polypeptide with at least 95%, preferably at least 98%, more preferably at least 99% homology to SEQ ID NO: 3;

tniQ: a polypeptide having the amino acid sequence of SEQ ID NO: 4, or a functionally equivalent polypeptide with at least 95%, preferably at least 98%, more preferably at least 99% homology to SEQ ID NO: 4;

Cas5/8: a polypeptide having the amino acid sequence of SEQ ID NO: 5, or a functionally equivalent polypeptide with at least 95%, preferably at least 98%, more preferably at least 99% homology to SEQ ID NO: 5;

Cas6: a polypeptide having the amino acid sequence of SEQ ID NO: 6, or a functionally equivalent polypeptide with at least 95%, preferably at least 98%, more preferably at least 99% homology to SEQ ID NO: 6;

Cas7: a polypeptide having the amino acid sequence of SEQ ID NO: 7, or a functionally equivalent polypeptide with at least 95%, preferably at least 98%, more preferably at least 99% homology to SEQ ID NO: 7.
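The 95%, 98% and 99% thresholds above can be checked with a simple percent-identity calculation over a pairwise alignment. The Python sketch below is one possible way to do this, not the method used in the patent: it assumes the two sequences have already been aligned (equal length, `-` for gaps), it uses one common convention (matches divided by alignment columns), and the function names are hypothetical. Sequence identity alone does not establish the "same function" requirement, which needs experimental confirmation.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over a pre-computed pairwise alignment.

    Both inputs must have equal length, with '-' marking gaps.
    Columns where both sequences have a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue
        columns += 1
        if a == b:  # gap-gap columns were skipped, so this is a residue match
            matches += 1
    if columns == 0:
        raise ValueError("alignment has no informative columns")
    return 100.0 * matches / columns


def homology_tier(pct: float) -> str:
    """Map a percent identity onto the tiers named in the text."""
    if pct >= 99:
        return "more preferred (>=99%)"
    if pct >= 98:
        return "preferred (>=98%)"
    if pct >= 95:
        return "within scope (>=95%)"
    return "below 95%"
```

For example, `homology_tier(percent_identity(candidate, reference))` classifies a candidate tnsA variant once `candidate` and `reference` hold the two aligned sequences.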

The nucleases Cas5/8, Cas7 and Cas6 form the Cascade complex of the type I CRISPR system and, in association with the transposases tnsABC and tniQ, constitute the CRISPR-associated transposase. All of these polypeptides are derived from Pseudoalteromonas translucida KMM520.

A second aspect of the invention provides genes encoding the above polypeptides.

In a specific embodiment, the gene encoding the polypeptide tnsA having the amino acid sequence of SEQ ID NO: 1 is the nucleotide sequence SEQ ID NO: 8, or a polynucleotide with at least 80%, preferably at least 85%, more preferably at least 90%, still more preferably at least 95% homology to SEQ ID NO: 8;

the gene encoding the polypeptide tnsB having the amino acid sequence of SEQ ID NO: 2 is the nucleotide sequence SEQ ID NO: 9, or a polynucleotide with at least 80%, preferably at least 85%, more preferably at least 90%, still more preferably at least 95% homology to SEQ ID NO: 9;

the gene encoding the polypeptide tnsC having the amino acid sequence of SEQ ID NO: 3 is the nucleotide sequence SEQ ID NO: 10, or a polynucleotide with at least 80%, preferably at least 85%, more preferably at least 90%, still more preferably at least 95% homology to SEQ ID NO: 10;

the gene encoding the polypeptide tniQ having the amino acid sequence of SEQ ID NO: 4 is the nucleotide sequence SEQ ID NO: 11, or a polynucleotide with at least 80%, preferably at least 85%, more preferably at least 90%, still more preferably at least 95% homology to SEQ ID NO: 11;

the gene encoding the polypeptide Cas5/8 having the amino acid sequence of SEQ ID NO: 5 is the nucleotide sequence SEQ ID NO: 12, or a polynucleotide with at least 80%, preferably at least 85%, more preferably at least 90%, still more preferably at least 95% homology to SEQ ID NO: 12;

the gene encoding the polypeptide Cas6 having the amino acid sequence of SEQ ID NO: 6 is the nucleotide sequence SEQ ID NO: 13, or a polynucleotide with at least 80%, preferably at least 85%, more preferably at least 90%, still more preferably at least 95% homology to SEQ ID NO: 13;

the gene encoding the polypeptide Cas7 having the amino acid sequence of SEQ ID NO: 7 is the nucleotide sequence SEQ ID NO: 14, or a polynucleotide with at least 80%, preferably at least 85%, more preferably at least 90%, still more preferably at least 95% homology to SEQ ID NO: 14.

A third aspect of the invention provides a CRISPR transposon system. Like the INTEGRATE and CAST transposition systems, the CRISPR transposon system of the invention comprises the plasmid pQCascade, the helper plasmid pTns, and the helper plasmid pDonor carrying the cargo gene. Any two of these three plasmids can be combined into one plasmid, and all three can also be combined into a single plasmid. Specifically:

A plasmid pQCascade for a CRISPR transposon system, comprising gene segments selected from the group consisting of: the Cas5/8-encoding gene above; the Cas6-encoding gene above; the Cas7-encoding gene above; and the tniQ-encoding gene above.

Because these polypeptides are all derived from Pseudoalteromonas translucida KMM520, this plasmid pQCascade is named pQCascadePtr herein. Similarly, the plasmids pTns and pDonor of the invention are referred to below as pTnsPtr and pDonorPtr, respectively.

Preferably, the plasmid pQCascadePtr may further comprise the following elements: a crRNA sequence targeting the genomic target site, the CloDF13 replicon, a promoter such as an anhydrotetracycline-inducible promoter, and a streptomycin resistance gene.

The crRNA of the above CRISPR transposase can function in the form of an array.

In one embodiment, the spacer of the crRNA sequence may be a spacer targeting a single site in the genome of the cell to be treated.

In another embodiment, the crRNA sequence may be a crRNA array (also called a CRISPR array, or simply an array) targeting multiple sites in the genome of the cell to be treated.

Preferably, the crRNA sequence is a crRNA array targeting multiple sites in the genome, in which the repeat region comprises one or more sequences selected from: repeat1 with the nucleotide sequence SEQ ID NO: 15, repeat2 with the nucleotide sequence SEQ ID NO: 16, repeat3 with the nucleotide sequence SEQ ID NO: 17, and repeat4 with the nucleotide sequence SEQ ID NO: 18. Each of the 32 N's in SEQ ID NOs: 15-18 (i.e. [N32]) is any base A, T, G or C.
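The structure of such an array, repeats separated by 32-nt spacers, can be sketched in Python as follows. This is an illustration under stated assumptions, not the construction procedure of the patent: the repeat-spacer-...-repeat layout with a trailing repeat is assumed, the function name is hypothetical, and `PLACEHOLDER_REPEAT` is a dummy string standing in for SEQ ID NOs: 15-18, which are not reproduced in this section.

```python
SPACER_LEN = 32  # the [N32] stretch described in the text


def build_crrna_array(repeats: list[str], spacers: list[str]) -> str:
    """Interleave repeats and spacers as repeat-spacer-...-spacer-repeat.

    Assumes one more repeat than spacers (a trailing repeat closes the array).
    """
    if len(repeats) != len(spacers) + 1:
        raise ValueError("need exactly one more repeat than spacers")
    for spacer in spacers:
        if len(spacer) != SPACER_LEN or set(spacer.upper()) - set("ACGT"):
            raise ValueError(f"spacer must be {SPACER_LEN} nt of A/C/G/T: {spacer!r}")
    parts = []
    for repeat, spacer in zip(repeats, spacers):
        parts.append(repeat)
        parts.append(spacer.upper())
    parts.append(repeats[-1])
    return "".join(parts)


# Dummy 28-nt placeholder; NOT the actual repeat of SEQ ID NOs: 15-18.
PLACEHOLDER_REPEAT = "ACGT" * 7

# Hypothetical spacers, one per genomic target site.
spacer_a = "A" * 32
spacer_b = "ACGTACGT" * 4

array = build_crrna_array([PLACEHOLDER_REPEAT] * 3, [spacer_a, spacer_b])
```

Targeting a single site is the special case of one spacer flanked by two repeats; each additional target adds one spacer and one repeat.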

A helper plasmid pTnsPtr for a CRISPR transposon system, used together with the plasmid pQCascadePtr above and comprising gene segments selected from the group consisting of: the tnsA-encoding gene above, the tnsB-encoding gene above, and the tnsC-encoding gene above.

When used for CRISPR transposition, the above CRISPR transposase gene sequences must act together with a promoter usable in the host (Gram-negative bacteria such as Escherichia coli, Vibrio natriegens and Tatumella citrea; Gram-positive bacteria such as Corynebacterium glutamicum) and with Left end (LE)-cargo-Right end (RE), whether carried on a plasmid or integrated.

Preferably, the helper plasmid pTnsPtr further comprises the following elements: the ColA replicon, a promoter such as an anhydrotetracycline-inducible promoter, and a kanamycin resistance gene.

As an embodiment in which two plasmids are combined into one, the invention provides a plasmid pQCasTnsPtr for a CRISPR transposon system, formed by combining the plasmid pQCascadePtr and the helper plasmid pTnsPtr described above and comprising: the Cas5/8, Cas6, Cas7, tniQ, tnsA, tnsB and tnsC genes above, a crRNA sequence targeting the genomic target site, the ColA replicon, a promoter such as an anhydrotetracycline-inducible promoter, and a kanamycin resistance gene.

A helper plasmid pDonorPtr for a CRISPR transposon system, used together with the plasmid pQCascadePtr and the helper plasmid pTnsPtr above and comprising gene segments selected from the group consisting of: the Left end (LE) sequence, whose nucleotide sequence is SEQ ID NO: 19, or a sequence with at least 80%, preferably at least 85%, more preferably at least 90%, still more preferably at least 95% homology to SEQ ID NO: 19 that contains the 33 bp sequence at the 3' end of SEQ ID NO: 19; the Right end (RE) sequence, whose nucleotide sequence is SEQ ID NO: 20, or a sequence with at least 80%, preferably at least 85%, more preferably at least 90%, still more preferably at least 95% homology to SEQ ID NO: 20 that contains the 27 bp sequence at the 5' end of SEQ ID NO: 20; and the target cargo gene.

Because the LE with the nucleotide sequence SEQ ID NO: 19 and the RE with the nucleotide sequence SEQ ID NO: 20 are derived from Pseudoalteromonas translucida KMM520, this plasmid pDonor is named pDonorPtr herein.

Preferably, the helper plasmid pDonorPtr may further comprise the following elements: the pMB1 replicon, an ampicillin resistance gene and the target cargo gene; or alternatively: the p15A replicon, a chloramphenicol resistance gene and the target cargo gene.

As an embodiment in which the three plasmids are combined into one, the invention provides a plasmid pEffectorPtr for a CRISPR transposon system, formed by combining the plasmid pQCascadePtr, the helper plasmid pTnsPtr and the helper plasmid pDonorPtr described above and comprising: the Cas5/8, Cas6, Cas7, tniQ, tnsA, tnsB and tnsC genes above, the Left end (LE) and Right end (RE) sequences, the target cargo gene, a crRNA sequence targeting the genomic target site, the ColA replicon, a promoter such as an anhydrotetracycline-inducible promoter, and a kanamycin resistance gene.

Based on the above plasmids, a fourth aspect of the invention provides a CRISPR transposon system comprising: the plasmids pQCascadePtr, pTnsPtr and pDonorPtr; or the plasmids pQCasTnsPtr and pDonorPtr; or the plasmid pEffectorPtr.

A fifth aspect of the invention provides a way of using the above CRISPR transposon system from Pseudoalteromonas translucida KMM520 in combination with the prior-art CRISPR transposon system from Vibrio cholerae Tn6677. Specifically:

A CRISPR transposon system which, in addition to the CRISPR transposon system above (comprising: the plasmids pQCascadePtr, pTnsPtr and pDonorPtr; or the plasmids pQCasTnsPtr and pDonorPtr; or the plasmid pEffectorPtr), further comprises CRISPR transposase plasmids derived from Vibrio cholerae Tn6677.

The gene sequence of the type I-F3 CRISPR transposase of Vibrio cholerae Tn6677 is NCBI: NZ_ALED01000027.1.

The CRISPR transposase plasmids derived from Vibrio cholerae Tn6677 comprise the plasmid pQCasTnsVch and the helper plasmid pDonorVch, wherein:

the plasmid pQCasTnsVch comprises: the Cas5/8, Cas6, Cas7, tniQ, tnsA, tnsB and tnsC genes from Vibrio cholerae Tn6677, the CloDF13 replicon, a promoter such as an anhydrotetracycline-inducible promoter, and a streptomycin resistance gene;

the plasmid pDonorVch comprises: the CRISPR array, Left end (LE) and Right end (RE) from Vibrio cholerae Tn6677, the pMB1 replicon, an ampicillin resistance gene, and the target cargo gene.

In one embodiment, analogously to pQCasTnsPtr and pEffectorPtr, in which two or three plasmids of the invention are combined into one, the plasmids pQCasTnsVch and pDonorVch can be combined into the plasmid pEffectorVch.

When the promoter is an anhydrotetracycline-inducible promoter and the CRISPR transposon system from Pseudoalteromonas translucida KMM520 is used to transform strains such as E. coli BL21(DE3), BL21 Star™(DE3) or W3110(DE3), anhydrotetracycline can be used for induction. That is, when the plasmids pQCasTnsPtr and pDonorPtr are used to transform strains such as E. coli BL21(DE3), BL21 Star™(DE3) or W3110(DE3), induction is performed with anhydrotetracycline.

Similarly, when the promoter is an anhydrotetracycline-inducible promoter and the CRISPR transposon system from Pseudoalteromonas translucida KMM520 is used together with the CRISPR transposon system from Vibrio cholerae Tn6677, anhydrotetracycline can also be used for induction. That is, when the plasmids pEffectorPtr and pEffectorVch, or the plasmids pQCasTnsVch, pDonorVch, pQCasTnsPtr and pDonorPtr, are used to transform strains such as E. coli BL21(DE3), BL21 Star™(DE3) or W3110(DE3), induction is performed with anhydrotetracycline.

The CRISPR transposon systems from these two microbial sources can be used in the same E. coli cell and function orthogonally, without interfering with each other.
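Co-transforming several plasmids generally requires that each one carry a different replicon and a different selection marker. The Python sketch below tabulates the replicon and resistance gene given for each plasmid in the preferred embodiments and checks a plasmid set for clashes. It is a simplified illustration: the "no shared replicon, no shared marker" rule is a rough stand-in for real incompatibility groups, the names `Plasmid` and `conflicts` are hypothetical, and pairing the p15A/chloramphenicol variant of pDonorPtr with pDonorVch in the four-plasmid set is a reading of the text rather than an explicit statement in it.

```python
from itertools import combinations
from typing import NamedTuple


class Plasmid(NamedTuple):
    name: str
    replicon: str
    marker: str


# Replicon and marker assignments taken from the preferred embodiments.
PQCASCADE_PTR = Plasmid("pQCascadePtr", "CloDF13", "streptomycin")
PTNS_PTR = Plasmid("pTnsPtr", "ColA", "kanamycin")
PQCASTNS_PTR = Plasmid("pQCasTnsPtr", "ColA", "kanamycin")
PDONOR_PTR_AMP = Plasmid("pDonorPtr", "pMB1", "ampicillin")
PDONOR_PTR_CM = Plasmid("pDonorPtr", "p15A", "chloramphenicol")
PQCASTNS_VCH = Plasmid("pQCasTnsVch", "CloDF13", "streptomycin")
PDONOR_VCH = Plasmid("pDonorVch", "pMB1", "ampicillin")


def conflicts(plasmids: list[Plasmid]) -> list[str]:
    """List every pair of plasmids sharing a replicon or a selection marker."""
    found = []
    for a, b in combinations(plasmids, 2):
        if a.replicon == b.replicon:
            found.append(f"{a.name}/{b.name}: shared replicon {a.replicon}")
        if a.marker == b.marker:
            found.append(f"{a.name}/{b.name}: shared marker {a.marker}")
    return found


# Three-plasmid Ptr system on its own.
three_plasmid = conflicts([PQCASCADE_PTR, PTNS_PTR, PDONOR_PTR_AMP])

# Ptr and Vch systems together, using the p15A variant of pDonorPtr (assumed).
four_plasmid = conflicts([PQCASTNS_VCH, PDONOR_VCH, PQCASTNS_PTR, PDONOR_PTR_CM])

# Using the pMB1 variant of pDonorPtr alongside pDonorVch would clash.
clash = conflicts([PQCASTNS_VCH, PDONOR_VCH, PQCASTNS_PTR, PDONOR_PTR_AMP])
```

On this simplified check, the p15A/chloramphenicol option for pDonorPtr is what allows both donor plasmids to be maintained in one cell.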

A sixth aspect of the invention provides the use in gene editing of the above CRISPR-associated transposase, the above polypeptide-encoding genes, the above plasmids pQCascadePtr, pTnsPtr, pQCasTnsPtr, pDonorPtr and pEffectorPtr, and the above CRISPR transposon system.

The gene editing may be gene editing in any cell, especially in cells of microorganisms (including fungi and bacteria), and particularly in cells of industrial microorganisms.

Preferably, the bacteria to be edited include, without limitation, Gram-negative bacteria such as Escherichia coli, Vibrio natriegens and Tatumella citrea, and Gram-positive bacteria such as Corynebacterium glutamicum.

Experiments show that the novel CRISPR-associated transposase system developed in the invention can insert a cargo gene into 8 sites in a single round with 100% efficiency, higher than that of the CRISPR-associated transposase from Vibrio cholerae Tn6677; that it transposes a 15.4 kb cargo gene to the corresponding target site with 100% efficiency; and that it recognizes 16 PAMs. The invention is the first to use two CRISPR-associated transposase systems in the same E. coli cell, where they function without interfering with each other, and thus offers an option for accelerating the construction of metabolic engineering strains.

Brief description of the drawings

Fig. 1 is a gel electrophoresis photograph for targeting of the crRNA3 (lacZ) site in Example 1.

Fig. 2 is a gel electrophoresis image showing insertion of the cargo gene GFP at 8 sites in the strain genome in Example 2. NC is the negative control.

Fig. 3 is a bar chart summarizing insertion at the 8 genomic sites of each clone in Example 2, as verified by colony PCR and nucleic acid gel electrophoresis. The abscissa is the copy number of the cargo gene GFP and the ordinate is the GFP insertion rate.

Fig. 4 is a gel electrophoresis image showing insertion, at 6 sites in the strain genome in Example 3, of the cargo gene GFP carried by Ptr and the cargo "terminator sequence" carried by Vch. NC is the negative control.

Fig. 5 is a schematic diagram of the structure of plasmid pQCascadePtr.

Fig. 6 is a schematic diagram of the structure of plasmid pDonorPtr.

Fig. 7 is a schematic diagram of the structure of plasmid pTnsPtr.

Fig. 8 is a schematic diagram of the structure of plasmid pQCasTnsPtr.

Fig. 9 is a schematic diagram of the structure of plasmid pQCasTnsVch.

Fig. 10 is a schematic diagram of the structure of plasmid pEffectorPtr.

Fig. 11 is a schematic diagram of the structure of plasmid pEffectorVch.

Fig. 12 is a schematic diagram of the structure of plasmid pVnQCasTnsPtr.

Fig. 13 is a schematic diagram of the structure of plasmid pCgQCasTnsVch.

Fig. 14 is a schematic diagram of the structure of plasmid pCgDonorPtr.

Fig. 15 is a gel electrophoresis photograph for targeting of the crtYf gene locus of Corynebacterium glutamicum ATCC13032 in Example 5.

Detailed description of embodiments

The novel CRISPR-associated transposase system of the present invention is derived from Pseudoalteromonas translucida KMM520 and is a further development and refinement of the gene-editing tools known as the CAST system, the INTEGRATE system and the MUCICAT system, particularly with respect to gene copy number and cargo gene insertion efficiency.

Herein, for brevity, the term "CRISPR-associated transposase system" is sometimes shortened to "CRISPR-associated transposase", "CRISPR transposase" or "CRISPR transposon system"; these terms have the same meaning and are used interchangeably.

Herein, for brevity, the name of a protein such as Cas6 is sometimes used interchangeably with the name of its coding gene (DNA). Those skilled in the art will understand that the name denotes different substances in different contexts, and the intended meaning is readily apparent from the context. For example, tnsA refers to the protein when the function or class of the transposase is described, and to the gene encoding the tnsA transposase protein when it is described as a gene.

Similarly, for brevity, the name of an RNA such as crRNA is sometimes used interchangeably with the name of its coding gene. Those skilled in the art will understand that the name denotes different substances in different contexts, and the intended meaning is readily apparent from the context.

Each of the plasmids provided by the present invention, such as pQCascadePtr, pTnsPtr, pDonorPtr, pQCasTnsPtr and pEffectorPtr, contains multiple genetic elements. For example, plasmid pQCascadePtr contains the Cas5/8 gene, the Cas6 gene, the Cas7 gene, the tniQ coding gene, the crRNA gene, the CloDF13 replicon, a promoter such as an anhydrotetracycline-inducible promoter, and a streptomycin resistance gene. These genetic elements may be arranged in any order; those skilled in the art can arrange them as is customary and readily prepare the plasmids.

It should be understood that, for the genes encoding the polypeptides Cas5/8, Cas7, Cas6, tnsABC (i.e. tnsA, tnsB and tnsC) and tniQ of the present invention, those skilled in the art can perform codon optimization according to the specific type of cell to be treated, such as Escherichia coli, and are not limited to the nucleotide sequences SEQ ID NOs: 8-14 above.

The purpose of codon optimization is to allow these polypeptides to be expressed optimally in the cells to be treated. Codon optimization is a technique that can be used to maximize protein expression in an organism by increasing the translation efficiency of a gene of interest. Owing to mutational bias and natural selection, different organisms usually show a particular preference for one of the several codons that encode the same amino acid. For example, in fast-growing microorganisms such as Escherichia coli, the optimal codons reflect the composition of the respective genomic tRNA pool. Therefore, in fast-growing microorganisms, a low-frequency codon for an amino acid can be replaced with a high-frequency codon for the same amino acid, which improves expression of the optimized DNA sequence.
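The codon substitution described above can be sketched in a few lines of Python. This is only an illustration: the map PREFERRED is a small assumed set of codons that are rare in Escherichia coli paired with commonly used synonymous codons, not the table used for SEQ ID NOs: 8-14, and the function name optimize_codons is hypothetical. A real optimization would draw on a complete codon-usage table for the host.

```python
# Illustrative only: a few codons that are rare in E. coli, mapped to
# frequently used synonymous codons (assumed for this sketch, not taken
# from the application).
PREFERRED = {
    "AGA": "CGT",  # Arg
    "AGG": "CGT",  # Arg
    "CTA": "CTG",  # Leu
    "ATA": "ATT",  # Ile
}


def optimize_codons(cds, preferred=PREFERRED):
    """Replace listed rare codons with synonymous preferred codons."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    # Codons not in the map are left unchanged, so the protein is unchanged.
    return "".join(preferred.get(codon, codon) for codon in codons)


example = "ATGAGACTAATAAAATAA"  # Met-Arg-Leu-Ile-Lys-stop
optimized = optimize_codons(example)
```

Because every replacement is synonymous, the encoded amino acid sequence is unchanged and only the codon choice differs.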

The present invention is further described below with reference to specific examples. It should be understood that these examples are for illustration only and do not limit the present invention. It should further be understood that, after reading the concept of the present invention, those skilled in the art may make various changes or adjustments to it; all such equivalents fall within the protection scope of the present invention and within the scope defined by the claims appended to this application.

Herein, where amounts added, contents and concentrations of various substances are given, percentages refer to percentages by mass unless otherwise specified.

本文的实施例中，如果对于反应温度或操作温度没有做出具体说明，则该温度通常指室温(15-30℃)。In the examples herein, if there is no specific statement about the reaction temperature or operating temperature, the temperature usually refers to room temperature (15-30° C.).

实施例Example

材料和方法Materials and methods

实施例中的全基因合成由南京金斯瑞生物科技有限公司完成，引物合成由铂尚生物技术(上海)有限公司完成，测序由擎科生物有限公司完成。The whole gene synthesis in the examples was completed by Nanjing GenScript Biotechnology Co., Ltd., the primer synthesis was completed by Bosun Biotechnology (Shanghai) Co., Ltd., and the sequencing was completed by Qingke Biotechnology Co., Ltd.

The molecular biology experiments in the examples, including plasmid construction, restriction digestion, ligation, competent cell preparation, transformation and medium preparation, were performed mainly with reference to Molecular Cloning: A Laboratory Manual (3rd edition), J. Sambrook and D. W. Russell, translated by Huang Peitang et al., Science Press, Beijing, 2002. Where necessary, specific experimental conditions can be determined by simple trials.

PCR扩增实验根据质粒或DNA模板供应商提供的反应条件或试剂盒说明书进行。必要时可以通过简单试验予以调整。PCR amplification experiments were carried out according to the reaction conditions or kit instructions provided by the plasmid or DNA template suppliers. It can be adjusted by simple experiment if necessary.

LB培养基：10g/L胰蛋白胨、5g/L酵母提取物、10g/L氯化钠，pH7.2，121℃高温高压灭菌20min。固体培养基另加20g/L琼脂粉。LB medium: 10g/L tryptone, 5g/L yeast extract, 10g/L sodium chloride, pH7.2, sterilized under high temperature and high pressure at 121°C for 20min. Add 20g/L agar powder to the solid medium.

LBv2 medium: LB medium supplemented with v2 salts (204 mmol/L NaCl, 4.2 mmol/L KCl, 23.14 mmol/L MgCl₂).
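For weighing out the v2 salts, the millimolar concentrations can be converted to grams per batch. The sketch below assumes the commonly used LBv2 values of 204 mmol/L NaCl, 4.2 mmol/L KCl and 23.14 mmol/L MgCl₂ and uses molar masses of the anhydrous salts; if a hydrate such as MgCl₂·6H₂O is used, its molar mass must be substituted. The function name grams_needed is hypothetical.

```python
# Molar masses in g/mol for the anhydrous salts
# (for MgCl2·6H2O use 203.30 g/mol instead of 95.21).
MOLAR_MASS = {"NaCl": 58.44, "KCl": 74.55, "MgCl2": 95.21}

# v2 salt concentrations in mmol/L (assumed, see the lead-in above).
V2_SALTS_MMOL_PER_L = {"NaCl": 204.0, "KCl": 4.2, "MgCl2": 23.14}


def grams_needed(volume_l, concentrations=V2_SALTS_MMOL_PER_L):
    """Return grams of each salt required for the given volume in litres."""
    return {
        salt: round(mmol_per_l / 1000.0 * MOLAR_MASS[salt] * volume_l, 3)
        for salt, mmol_per_l in concentrations.items()
    }


per_litre = grams_needed(1.0)
```

These amounts are added on top of the 10 g/L sodium chloride already present in the LB base.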

BHIS培养基：37g/L BHI，91g/L山梨醇。固体培养基另加20g/L琼脂粉。BHIS medium: 37g/L BHI, 91g/L sorbitol. Add 20g/L agar powder to the solid medium.

The plasmids pQCascadePtr, pTnsPtr and pCgQCasTnsPtr used in the examples were constructed and synthesized on commission by Nanjing GenScript Biotechnology Co., Ltd. The plasmids pDonorPtr, pQCasTnsPtr, pEffectorPtr, pCgDonorPtr, pQCasTnsVch, pDonorVch and pEffectorVch were constructed by Yang Sheng's research group at the Center for Excellence in Molecular Plant Sciences, Chinese Academy of Sciences. Any organization or individual may obtain these plasmids to verify the present invention, but they may not be used for other purposes, including development and utilization, scientific research and teaching, without the permission of the Center for Excellence in Molecular Plant Sciences, Chinese Academy of Sciences. Among these plasmids:

Plasmid pQCascadePtr contains the genes tniQ (SEQ ID NO: 11), Cas5/8 (SEQ ID NO: 12), Cas7 (SEQ ID NO: 14) and Cas6 (SEQ ID NO: 13) derived from Pseudoalteromonas translucida KMM520, a crRNA sequence targeting the genomic target site, the CloDF13 replicon, a promoter such as an anhydrotetracycline-inducible promoter, and a streptomycin resistance gene. It was synthesized on commission by Nanjing GenScript Biotechnology Co., Ltd.; its nucleotide sequence is SEQ ID NO: 21 and its structure is shown in Fig. 5.

Plasmid pDonorPtr contains the LE (SEQ ID NO: 19) and RE (SEQ ID NO: 20) sequences derived from Pseudoalteromonas translucida KMM520, the pMB1 replicon, an ampicillin resistance gene, and a target cargo gene (Cargo), for example a promoterless chloramphenicol resistance (CmR) gene fragment; its structure is shown in Fig. 6.

Plasmid pTnsPtr contains the genes tnsA (SEQ ID NO: 8), tnsB (SEQ ID NO: 9) and tnsC (SEQ ID NO: 10) derived from Pseudoalteromonas translucida KMM520, the ColA replicon, a promoter such as an anhydrotetracycline-inducible promoter, and a kanamycin resistance gene. It was synthesized on commission by Nanjing GenScript Biotechnology Co., Ltd.; its nucleotide sequence is SEQ ID NO: 22 and its structure is shown in Fig. 7.

Plasmid pQCasTnsPtr contains the genes tnsA (SEQ ID NO: 8), tnsB (SEQ ID NO: 9), tnsC (SEQ ID NO: 10), tniQ (SEQ ID NO: 11), Cas5/8 (SEQ ID NO: 12), Cas7 (SEQ ID NO: 14) and Cas6 (SEQ ID NO: 13) derived from Pseudoalteromonas translucida KMM520, a crRNA sequence targeting the genomic target site, the ColA replicon, a promoter such as an anhydrotetracycline-inducible promoter, and a kanamycin resistance gene; its structure is shown in Fig. 8.

Plasmid pQCasTnsVch contains the genes Cas5/8, Cas7, Cas6, tniQ, tnsA, tnsB and tnsC and the CRISPR array derived from Vibrio cholerae Tn6677, the CloDF13 replicon, a promoter such as an anhydrotetracycline-inducible promoter, and a streptomycin resistance gene; its structure is shown in Fig. 9. It was constructed by Yang Sheng's research group at the Center for Excellence in Molecular Plant Sciences, Chinese Academy of Sciences.

Plasmid pEffectorPtr contains the genes tnsA (SEQ ID NO: 8), tnsB (SEQ ID NO: 9), tnsC (SEQ ID NO: 10), tniQ (SEQ ID NO: 11), Cas5/8 (SEQ ID NO: 12), Cas7 (SEQ ID NO: 14) and Cas6 (SEQ ID NO: 13) derived from Pseudoalteromonas translucida KMM520, a crRNA sequence targeting the genomic target site, the LE (SEQ ID NO: 19) and RE (SEQ ID NO: 20) sequences, the target cargo gene, the ColA replicon, a promoter such as an anhydrotetracycline-inducible promoter, and a kanamycin resistance gene; its structure is shown in Fig. 10.

Plasmid pEffectorVch contains the genes Cas5/8, Cas7, Cas6, tniQ, tnsA, tnsB and tnsC and the CRISPR array derived from Vibrio cholerae Tn6677, the LE and RE sequences, the target cargo gene, the CloDF13 replicon, a promoter such as an anhydrotetracycline-inducible promoter, and a streptomycin resistance gene; its structure is shown in Fig. 11. It was constructed by Yang Sheng's research group at the Center for Excellence in Molecular Plant Sciences, Chinese Academy of Sciences.

实施例1验证CRISPR相关转座酶的转座活性Example 1 Verifying the transposition activity of CRISPR-associated transposases

By targeting lacZ in the genome of Escherichia coli BL21 Star™ (DE3) and the lacZ sequence preceding the T7 RNA polymerase gene, the CRISPR-associated enzyme derived from Pseudoalteromonas translucida KMM520 was confirmed to have programmable transposition activity.

The primers used to construct plasmid pQCascadePtr-cr3, which targets genomic lacZ and the lacZ sequence preceding the T7 RNA polymerase gene, and the verification primers are listed in Table 1. The suffix "-cr3" in the plasmid name pQCascadePtr-cr3 indicates that the construct targets genomic lacZ.

表1：引物序列Table 1: Primer sequences

注：表中引物名称后缀F代表正向引物，R代表反向引物。Note: The suffix F in the primer name in the table represents the forward primer, and R represents the reverse primer.

1.1酶切获得pQCascadePtr骨架片段1.1 Enzyme digestion to obtain pQCascadePtr backbone fragment

NcoI和BamHI酶切pQCascadePtr质粒，获得pQCascadePtr骨架片段。The pQCascadePtr plasmid was digested with NcoI and BamHI to obtain the backbone fragment of pQCascadePtr.

酶切反应体系(50μL)Enzyme digestion reaction system (50μL)

酶切反应条件：37℃，1h。通过凝胶电泳分离酶切后质粒片段并胶回收。Enzyme digestion reaction conditions: 37°C, 1h. The digested plasmid fragments were separated by gel electrophoresis and gel recovered.

限制性内切酶试剂盒购自Thermofisher公司，DNA凝胶回收试剂盒购自上海吐露港生物科技有限公司。The restriction endonuclease kit was purchased from Thermofisher Company, and the DNA gel recovery kit was purchased from Shanghai Tolo Harbor Biotechnology Co., Ltd.

1.2引物退火自搭1.2 Self-annealing of primers

表1中的引物对PtrDR-F和PtrDR-R通过退火自搭，可获得含有4bp粘性末端的DR片段，该粘性末端与pQCascadePtr质粒骨架NcoI和BamHI酶切后的粘性末端互补。The primer pair PtrDR-F and PtrDR-R in Table 1 can be annealed and self-assembled to obtain a DR fragment containing a 4bp sticky end, which is complementary to the sticky end of the pQCascadePtr plasmid backbone after NcoI and BamHI digestion.

退火自搭体系(50μL)Annealed self-build system (50μL)

95℃保温5min，每分钟减低5-10℃，16℃保温10min，用ddH₂O稀释20倍，备用。Incubate at 95°C for 5 minutes, lower the temperature by 5-10°C per minute, and incubate at 16°C for 10 minutes, dilute 20 times with ddH ₂ O, and set aside.
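The annealing program above can be written out as explicit thermocycler steps. The sketch below assumes the lower bound of the stated cooling rate (5°C per minute, held for 1 min at each step); the function name annealing_program and the step representation are hypothetical.

```python
def annealing_program(start_c=95, end_c=16, step_c=5):
    """Return a list of (temperature in °C, hold time in min) steps."""
    steps = [(start_c, 5)]       # initial hold at 95°C for 5 min
    temp = start_c - step_c
    while temp > end_c:          # stepwise cooling, 1 min per step
        steps.append((temp, 1))
        temp -= step_c
    steps.append((end_c, 10))    # final hold at 16°C for 10 min
    return steps


program = annealing_program()
total_minutes = sum(minutes for _, minutes in program)
```

Passing step_c=10 gives the faster end of the stated 5-10°C per minute range.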

1.3 Ligation to construct plasmid pQCascadePtr-DR

T4 ligation system (10 μL)

Ligation was carried out at 16°C for 1 h. The T4 ligase kit was purchased from TAKARA.

将上述连接产物全部转化至DH5α感受态细胞中(购自深圳康体生命科技有限公司)，在含有链霉素的LB固体平板上进行筛选，获得质粒pQCascadePtr-DR。用引物8测序验证正确。All the above ligation products were transformed into DH5α competent cells (purchased from Shenzhen Kangti Life Technology Co., Ltd.), and screened on LB solid plates containing streptomycin to obtain plasmid pQCascadePtr-DR. It was verified correct by sequencing with primer 8.

1.4酶切获得pQCascadePtr-DR骨架片段1.4 Enzyme digestion to obtain pQCascadePtr-DR backbone fragment

BsaI酶切pQCascadePtr-DR质粒，获得pQCascadePtr-DR骨架片段。The pQCascadePtr-DR plasmid was digested with BsaI to obtain the pQCascadePtr-DR backbone fragment.

酶切反应体系(50μL)Enzyme digestion reaction system (50μL)

酶切反应条件：37℃，1h。通过凝胶电泳分离酶切后质粒片段并胶回收。Enzyme digestion reaction conditions: 37°C, 1h. The digested plasmid fragments were separated by gel electrophoresis and recovered.

限制性内切酶试剂盒购自Thermofisher公司，DNA凝胶回收试剂盒购自吐露港公司。The restriction endonuclease kit was purchased from Thermofisher Company, and the DNA gel recovery kit was purchased from Tolo Harbor Company.

1.5引物退火自搭1.5 Self-annealing of primers

表1中的引物对Ptrcr3-F和Ptrcr3-R通过退火自搭可获得含有4bp粘性末端的DR片段，该粘性末端与pQCascadePtr-DR质粒骨架BsaI酶切后的粘性末端互补。The primer pair Ptrcr3-F and Ptrcr3-R in Table 1 can be annealed and self-assembled to obtain a DR fragment containing a 4bp sticky end, which is complementary to the sticky end of the pQCascadePtr-DR plasmid backbone after BsaI digestion.

退火自搭体系(50μL)Annealed self-build system (50μL)

1.6 Ligation to construct plasmid pQCascadePtr-cr3

T4 ligation system (10 μL)

将上述连接产物全部转化至DH5α感受态细胞中(购自深圳康体生命科技有限公司)，在含有链霉素的LB固体平板上进行筛选，获得质粒pQCascadePtr-cr3。用引物8测序验证正确。All the above ligation products were transformed into DH5α competent cells (purchased from Shenzhen Kangti Life Technology Co., Ltd.), and screened on LB solid plates containing streptomycin to obtain plasmid pQCascadePtr-cr3. It was verified correct by sequencing with primer 8.

1.7转化转座工具质粒与诱导转座1.7 Transformation of transposition tool plasmid and induction of transposition

pDonorPtr and pTnsPtr were electroporated into Escherichia coli BL21 Star™ (DE3), and transformants were selected at 37°C on LB solid plates containing ampicillin and kanamycin. Positive clones were picked and made electrocompetent, then pQCascadePtr-cr3 was electroporated into them and transformants were selected at 37°C on LB solid plates containing ampicillin, kanamycin and streptomycin, giving an Escherichia coli BL21 Star™ (DE3) strain carrying pDonorPtr, pTnsPtr and pQCascadePtr-cr3. A portion of the colonies on this plate was scraped off, resuspended in liquid LB medium and re-plated on LB solid plates containing anhydrotetracycline at a final concentration of 100 ng/ml, ampicillin, kanamycin and streptomycin; anhydrotetracycline induces expression of the transposition-related enzymes. After 16 h at 37°C a bacterial lawn may form, which is normal. A portion of the growth on the 100 ng/ml anhydrotetracycline plate was scraped off and resuspended in liquid LB medium, the OD₆₀₀ was adjusted to about 0.5, the suspension was diluted 50-fold with liquid LB medium, and 100 μL was spread on LB solid plates containing anhydrotetracycline at a final concentration of 1000 ng/ml, ampicillin, kanamycin and streptomycin, followed by incubation at 37°C for 24 h.

1.8菌落PCR鉴定靶向crRNA3的效率1.8 Efficiency of colony PCR identification targeting crRNA3

使用表1中位于插入位点上下游的引物对crRNA3-F/crRNA3-R和crRNA3-R/T7lacZcr3-R，通过菌落PCR，验证靶向crRNA3的两个位点的效率。Using the primer pairs crRNA3-F/crRNA3-R and crRNA3-R/T7lacZcr3-R located upstream and downstream of the insertion site in Table 1, verify the efficiency of targeting the two sites of crRNA3 by colony PCR.

菌落PCR反应体系(10μL)：Colony PCR reaction system (10μL):

PCRMix购自诺维赞公司。PCRMix was purchased from Novizan.

PCR反应条件：PCR reaction conditions:

质粒pDonorPtr上的供体插入片段包括LE(Left end)、RE(Right end)和货物基因CmR片段(无启动子的氯霉素抗性基因片段)，共1433bp，阳性条带1601/1759bp，阴性条带168/326bp。经统计，16个克隆均在两位点有插入。凝胶电泳图如图1所示。The donor insert on plasmid pDonorPtr includes LE (Left end), RE (Right end) and cargo gene CmR fragment (chloramphenicol resistance gene fragment without promoter), a total of 1433bp, positive band 1601/1759bp, negative Band 168/326bp. According to statistics, all 16 clones had insertions at two sites. The gel electrophoresis picture is shown in Figure 1.
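The band sizes quoted above follow from simple addition: the positive band is the uninserted amplicon plus the 1433 bp donor insert. A minimal sketch of this check, with a hypothetical helper call_band and an assumed tolerance of 100 bp for reading a gel:

```python
INSERT_BP = 1433  # LE + CmR cargo + RE on pDonorPtr, as stated in the text


def expected_bands(flank_bp, insert_bp=INSERT_BP):
    """Expected colony-PCR amplicon sizes without and with insertion."""
    return {"negative": flank_bp, "positive": flank_bp + insert_bp}


def call_band(observed_bp, flank_bp, insert_bp=INSERT_BP, tolerance_bp=100):
    """Classify an observed band; tolerance_bp is an assumed gel resolution."""
    bands = expected_bands(flank_bp, insert_bp)
    if abs(observed_bp - bands["positive"]) <= tolerance_bp:
        return "insertion"
    if abs(observed_bp - bands["negative"]) <= tolerance_bp:
        return "no insertion"
    return "ambiguous"


site_1 = expected_bands(168)  # uninserted amplicon of 168 bp
site_2 = expected_bands(326)  # uninserted amplicon of 326 bp
```

For the two sites, this reproduces the stated positive bands of 1601 bp and 1759 bp.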

Fig. 1 clearly shows the bands corresponding to the cargo CmR fragment. Targeted insertion of this CmR fragment into genomic lacZ and into the lacZ sequence preceding the T7 RNA polymerase gene of Escherichia coli BL21 Star™ (DE3) confirms that the CRISPR-associated transposase derived from Pseudoalteromonas translucida KMM520 has programmable transposition activity.

实施例2用array靶向基因组8个不同位点实现多拷贝整合Example 2 Using array to target 8 different sites in the genome to achieve multi-copy integration

2.1质粒pQCasTnsPtr-array8的构建2.1 Construction of plasmid pQCasTnsPtr-array8

参照专利文献CN202010083919.7中实施例2的方法，将靶向大肠杆菌BL21Star^TM(DE3)基因组8个不同位点的crRNA组合，形成由9个固定正向重复序列与8个靶向不同位点的spacer间隔排列的array序列，插入至pQCasTnsPtr质粒的NcoI和BamHI位点之间，构建质粒pQCasTnsPtr-array8。array序列合成以及上述质粒构建由南京金斯瑞生物科技有限公司完成。Referring to the method of Example 2 in the patent document CN202010083919.7, the crRNA targeting 8 different sites of the Escherichia coli BL21Star ^TM (DE3) genome was combined to form 9 fixed direct repeat sequences and 8 targeting different sites The array sequences arranged at spacer intervals were inserted between the NcoI and BamHI sites of the pQCasTnsPtr plasmid to construct the plasmid pQCasTnsPtr-array8. The array sequence synthesis and the above plasmid construction were completed by Nanjing GenScript Biotechnology Co., Ltd.
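The layout of such an array, with n spacers separated and flanked by n + 1 copies of the fixed direct repeat, can be sketched as follows. The repeat and spacer strings are placeholders, not the sequences used in the example, and the function name build_crispr_array is hypothetical.

```python
def build_crispr_array(repeat, spacers):
    """Interleave n spacers with n + 1 copies of the direct repeat."""
    if not spacers:
        raise ValueError("at least one spacer is required")
    parts = [repeat]
    for spacer in spacers:
        parts.append(spacer)
        parts.append(repeat)
    return parts


# Placeholder tokens standing in for real sequences (hypothetical).
DIRECT_REPEAT = "<DR>"
SPACERS = ["<spacer%d>" % i for i in range(1, 9)]

array_parts = build_crispr_array(DIRECT_REPEAT, SPACERS)
array_sequence = "".join(array_parts)
```

With 8 spacers, the array contains 9 direct repeats, matching the composition of array8 described above.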

2.2转化转座工具质粒与诱导转座2.2 Transformation of transposition tool plasmid and induction of transposition

电转化质粒pDonorPtr、pQCasTnsPtr-array8至大肠杆菌BL21Star^TM(DE3)，转化操作同步骤1.7。平板菌落诱导转座与传代操作同步骤1.7。质粒pDonorPtr中的货物基因是绿色荧光蛋白GFP基因(约1.29kb)。Electrotransform the plasmids pDonorPtr and pQCasTnsPtr-array8 into Escherichia coli BL21Star ^TM (DE3), and the transformation operation is the same as step 1.7. Plate colony-induced transposition and subculture operations are the same as step 1.7. The cargo gene in the plasmid pDonorPtr is the green fluorescent protein GFP gene (about 1.29kb).

2.3菌落PCR鉴定基因组8位点插入情况2.3 Colony PCR identification of 8-site insertion in the genome

在基因组8个位点的上下游处分别设计正向与反向引物，即验证所需的引物，序列见表2。Forward and reverse primers were designed at the upstream and downstream of the 8 genome sites, namely the primers required for verification. The sequences are shown in Table 2.

表2、验证基因组中8个位点插入所需引物序列Table 2. The sequence of primers required to verify the insertion of 8 sites in the genome

PCR体系与反应条件同步骤1.8。The PCR system and reaction conditions are the same as step 1.8.

通过菌落PCR与核酸凝胶电泳验证各克隆基因组8位点的插入情况，如图2所示，目的菌落中一次即可完成8个位点的货物基因全部插入，货物基因拷贝数具体分布如图3所示。表明利用新型CRISPR相关转座酶，货物基因插入效率为100％，高于来源于霍乱弧菌Tn6677的CRISPR相关转座酶。Colony PCR and nucleic acid gel electrophoresis were used to verify the insertion of 8 loci in the genome of each clone. As shown in Figure 2, all cargo genes at 8 loci can be inserted in the target colony at one time, and the specific distribution of cargo gene copy numbers is shown in the figure 3. It shows that using the novel CRISPR-associated transposase, the cargo gene insertion efficiency is 100%, which is higher than that of the CRISPR-associated transposase derived from Vibrio cholerae Tn6677.

实施例3两种CRISPR相关转座酶的正交实验Example 3 Orthogonal experiments of two CRISPR-related transposases

考察来源于Pseudoalteromonas translucida KMM520的新型CRISPR相关转座酶与和来源于霍乱弧菌Tn6677的CRISPR相关转座酶的正交性。The orthogonality between a novel CRISPR-associated transposase from Pseudoalteromonas translucida KMM520 and a CRISPR-associated transposase from Vibrio cholerae Tn6677 was investigated.

3.1质粒pQCasTnsPtr-nagman和pQCasTnsVch-cr3质粒构建3.1 Plasmid pQCasTnsPtr-nagman and pQCasTnsVch-cr3 plasmid construction

将靶向大肠杆菌BL21Star^TM(DE3)基因组nagB、nagE和manX的crRNA组合，形成由5个固定正向重复序列与4个靶向不同位点的spacer间隔排列的array序列，插入至pQCasTnsPtr质粒的NcoI和BamHI位点之间，构建质粒pQCasTnsPtr-nagman。array序列合成以及上述质粒构建由南京金斯瑞生物科技有限公司完成。Combine the crRNA targeting Escherichia coli BL21Star ^TM (DE3) genome nagB, nagE and manX to form an array sequence consisting of 5 fixed direct repeats and 4 spacers targeting different sites, and insert it into the pQCasTnsPtr plasmid Between the NcoI and BamHI sites, the plasmid pQCasTnsPtr-nagman was constructed. The array sequence synthesis and the above plasmid construction were completed by Nanjing GenScript Biotechnology Co., Ltd.

pQCasTnsVch-cr3质粒构建方法同步骤1.4、1.5、1.6，质粒pQCasTnsVch来源于本实验室。The pQCasTnsVch-cr3 plasmid construction method is the same as steps 1.4, 1.5, and 1.6, and the plasmid pQCasTnsVch comes from our laboratory.

质粒构建及鉴定所用引物列于表3。The primers used for plasmid construction and identification are listed in Table 3.

表3、质粒pQCasTnsPtr-nagman和pQCasTnsVch-cr3构建及鉴定所用引物Table 3. Primers used for construction and identification of plasmids pQCasTnsPtr-nagman and pQCasTnsVch-cr3

No.  Primer        Sequence (5'→3')
1    Vchcr3-F      ATAACTATCCCATTACGGTCAATCCGCCGTTTGTTCCG
2    Vchcr3-R      TTCACGGAACAAACGGCGGATTGACCGTAATGGGATAG
3    tetR-seq      AGTGAGTATGGTGCCTATCT
4    crRNA3-F      CACCACAGATGAAACGCCG
5    crRNA3-R      CATCTACACCAACGTGACCTATCC
6    T7lacZcr3-R   CACCCATCTCGTAAGACTCATG
7    nagEF         AACCTCGCATTAATCTTCGC
8    nagER         GCAGTAACAGCAACAGAAAGCA
9    nagBF         TGCAATATGACCGTCGTTACC
10   nagBR         TGAGACTGATCCCCCTGACT
11   manXF         GAAATGCTGTTAGGCGAGCA
12   manXR         TGAATCAGACGGTCGTCGAT
13   manXF2        GATGAAGTGGCTGCGGATAC
14   manXR2        CCTTACGGACTTCCAGCTCA
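Primer pairs such as Vchcr3-F/Vchcr3-R are annealed to give a short duplex with 4 nt sticky ends, as in steps 1.4 to 1.6. As a rough check, the overhangs of an annealed pair can be computed from the two oligonucleotide sequences. The sketch below uses the Vchcr3-F and Vchcr3-R sequences from Table 3; the function names are hypothetical and the search is limited to 5' overhangs of at most 8 nt.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def sticky_ends(fwd, rev, max_overhang=8):
    """Return (fwd 5' overhang, rev 5' overhang, duplex length in bp)."""
    rev_rc = reverse_complement(rev)
    for k in range(max_overhang + 1):
        core = fwd[k:]  # part of the forward oligo that must be base-paired
        if rev_rc.startswith(core):
            return fwd[:k], rev[:len(rev) - len(core)], len(core)
    raise ValueError("oligos do not anneal with an allowed 5' overhang")


VCHCR3_F = "ATAACTATCCCATTACGGTCAATCCGCCGTTTGTTCCG"
VCHCR3_R = "TTCACGGAACAAACGGCGGATTGACCGTAATGGGATAG"

fwd_overhang, rev_overhang, duplex_bp = sticky_ends(VCHCR3_F, VCHCR3_R)
```

For this pair the calculation gives a 34 bp duplex flanked by two 4 nt 5' overhangs, which is the form required for ligation into the digested backbone.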

3.2转化转座工具质粒与诱导转座3.2 Transformation of transposition tool plasmid and induction of transposition

电转化质粒pDonorPtr、pQCasTnsPtr-nagman、pDonorVch和pQCasTnsVch-cr3至大肠杆菌BL21Star^TM(DE3)，转化操作同步骤1.7。平板菌落诱导转座与传代操作同步骤1.7。其中pDonorPtr携带的货物基因是绿色荧光蛋白GFP基因(约1.29kb)，pDonorVch携带的货物基因是终止子序列(约0.64kb)。Electrotransform plasmids pDonorPtr, pQCasTnsPtr-nagman, pDonorVch and pQCasTnsVch-cr3 into Escherichia coli BL21Star ^TM (DE3), and the transformation operation was the same as step 1.7. Plate colony-induced transposition and subculture operations are the same as step 1.7. The cargo gene carried by pDonorPtr is the green fluorescent protein GFP gene (about 1.29kb), and the cargo gene carried by pDonorVch is the terminator sequence (about 0.64kb).

3.3菌落PCR鉴定基因组插入情况3.3 Identification of Genome Insertion by Colony PCR

The CRISPR-associated transposase derived from Vibrio cholerae Tn6677 was used to target genomic lacZ and the lacZ sequence preceding the T7 RNA polymerase gene, with a cargo of 0.64 kb. The crRNA array of pQCasTnsPtr was designed to target genomic nagB, nagE and manX, with the green fluorescent protein (GFP) gene (about 1.29 kb) as cargo. After the transposition experiment in Escherichia coli, PCR with primers upstream and downstream of the lacZ target sites should give positive bands of 1.0 kb and 1.17 kb and negative bands of 0.17 kb and 0.33 kb. PCR with primers upstream and downstream of the nagB, nagE and manX target sites should give positive bands of about 2.47 kb, 2.70 kb, 2.49 kb and 2.36 kb and negative bands of 0.43 kb, 0.66 kb, 0.46 kb and 0.33 kb. As shown in Fig. 4, the Vibrio cholerae Tn6677-derived CRISPR-associated transposase and pQCasTnsPtr each targeted their corresponding sites and inserted the corresponding cargo (the GFP gene for pDonorPtr and the terminator sequence for pDonorVch), giving a strain with 2 Vch-delivered and 4 Ptr-delivered insertions at an efficiency of about 100%. This result was verified by sequencing.
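The expected band sizes can be cross-checked by subtracting each negative band from the corresponding positive band: within one system, every site should imply roughly the same inserted length. The sketch below pairs the sizes in the order in which they are listed in the text, which is an assumption, and the names used are hypothetical.

```python
# (positive band, negative band) in bp, paired in the order listed in the text.
EXPECTED_BANDS_BP = {
    "Vch": [(1000, 170), (1170, 330)],
    "Ptr": [(2470, 430), (2700, 660), (2490, 460), (2360, 330)],
}


def implied_insert_sizes(bands):
    """Length added by insertion at each site (positive minus negative)."""
    return [positive - negative for positive, negative in bands]


implied = {
    system: implied_insert_sizes(bands)
    for system, bands in EXPECTED_BANDS_BP.items()
}
# Largest difference between sites of the same system, in bp.
spread = {system: max(sizes) - min(sizes) for system, sizes in implied.items()}
```

The implied lengths agree to within 10 bp for each system, so the two transposons can be told apart on a gel by the size shift alone.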

图4显示了菌株基因组中6个位点的Ptr携带的货物基因GFP和Vch携带的货物基因“终止子序列”插入情况，表明两种CRISPR相关转座酶可以在同一大肠杆菌内使用，且可以互不干扰的发挥功能，从而为加速代谢工程菌株构建提供了一种选择方案。Figure 4 shows the insertion of the cargo gene GFP carried by Ptr and the "terminator sequence" of the cargo gene carried by Vch at 6 sites in the strain genome, indicating that the two CRISPR-related transposases can be used in the same E. coli, and can Function without interfering with each other, thus providing an option for accelerating the construction of metabolic engineering strains.

Example 4 Verifying the transposition activity of the CRISPR-associated transposase in Vibrio natriegens

By targeting the wbfF gene in the genome of Vibrio natriegens ATCC14048, the CRISPR-associated enzyme derived from Pseudoalteromonas translucida KMM520 was confirmed to have programmable transposition activity in Vibrio natriegens.

Plasmid pVnQCasTnsPtr contains the genes tnsA (SEQ ID NO: 8), tnsB (SEQ ID NO: 9), tnsC (SEQ ID NO: 10), tniQ (SEQ ID NO: 11), Cas5/8 (SEQ ID NO: 12), Cas7 (SEQ ID NO: 14) and Cas6 (SEQ ID NO: 13) derived from Pseudoalteromonas translucida KMM520, a crRNA sequence targeting the genomic target site, the p15A replicon, a promoter such as an anhydrotetracycline-inducible promoter, and a chloramphenicol resistance gene; the plasmid structure is shown in Fig. 12.

质粒pDonorPtr-GFP包含来源于Pseudoalteromonas translucida KMM520的基因LE(SEQ ID NO:19)和RE(SEQ ID NO:20)序列，pMB1复制子、氨苄青霉素抗性基因、目标货物基因(Cargo)无启动子的绿色荧光蛋白基因片段。Plasmid pDonorPtr-GFP contains gene LE (SEQ ID NO:19) and RE (SEQ ID NO:20) sequences derived from Pseudoalteromonas translucida KMM520, pMB1 replicon, ampicillin resistance gene, target cargo gene (Cargo) without promoter The green fluorescent protein gene fragment.

构建靶向基因组wbfF的pVnQCasTnsPtr-wbfF质粒及验证引物列于表4。其中质粒名称pVnQCasTnsPtr-wbfF中的后缀“-wbfF”代表构建靶向基因组wbfF。pVnQCasTnsPtr-wbfF质粒构建方法同步骤1.4、1.5、1.6。The construction of pVnQCasTnsPtr-wbfF plasmid targeting genome wbfF and the verified primers are listed in Table 4. The suffix "-wbfF" in the plasmid name pVnQCasTnsPtr-wbfF represents the construction of targeted genome wbfF. The pVnQCasTnsPtr-wbfF plasmid construction method is the same as steps 1.4, 1.5, and 1.6.

表4：引物序列Table 4: Primer sequences

No.  Primer        Sequence (5'→3')
1    wbfF-F        GAAATCTCTCTTTGGTATCTCTATCAGTCTACTCTAAG
2    wbfF-R        TTCACTTAGAGTAGACTGATAGAGATACCAAAGAGAGA
3    wbfF-test-F   TCAGCAGAACAACGACACTCAG
4    wbfF-test-R   TATTGCCTTGGTTATCTACCGTGAC
5    tetR-seq      AGTGAGTATGGTGCCTATCT

4.1转化转座工具质粒与诱导转座4.1 Transformation of transposition tool plasmid and induction of transposition

pDonorPtr-GFP was electroporated into Vibrio natriegens ATCC14048, and transformants were selected at 30°C on LBv2 solid plates containing ampicillin. Positive clones were picked and made electrocompetent, then pVnQCasTnsPtr-wbfF was electroporated into them and transformants were selected at 30°C on LBv2 solid plates containing ampicillin and chloramphenicol, giving a Vibrio natriegens ATCC14048 strain carrying pDonorPtr-GFP and pVnQCasTnsPtr-wbfF. A portion of the colonies on this plate was scraped off, resuspended in liquid LBv2 medium and re-plated on LBv2 solid plates containing anhydrotetracycline at a final concentration of 100 ng/ml, ampicillin and chloramphenicol; anhydrotetracycline induces expression of the transposition-related enzymes. After 12 h at 30°C a bacterial lawn may form, which is normal. A portion of the growth on the 100 ng/ml anhydrotetracycline plate was scraped off and resuspended in liquid LBv2 medium, the OD₆₀₀ was adjusted to about 0.5, the suspension was diluted 50-fold with liquid LBv2 medium, and 100 μL was spread on LBv2 solid plates containing anhydrotetracycline at a final concentration of 1000 ng/ml, ampicillin and chloramphenicol, followed by incubation at 30°C for 12 h.

4.2 Colony PCR to determine the efficiency of targeting the wbfF gene

The efficiency of targeting the two sites in wbfF was verified by colony PCR using the primer pair wbfF-test-F/wbfF-test-R from Table 4, which lie upstream and downstream of the insertion site. The colony PCR method is the same as in step 1.8.

The donor insert on plasmid pDonorPtr-GFP comprises LE (left end), RE (right end) and the cargo gene GFP fragment (a promoterless green fluorescent protein gene fragment), 720 bp in total; the positive band is therefore 1220 bp and the negative band 500 bp. All 16 clones examined carried the insertion. This confirms that the CRISPR-associated transposase derived from Pseudoalteromonas translucida KMM520 also has programmable transposition activity in Vibrio natriegens.
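The band sizes quoted above follow from simple addition: the negative band is the amplicon spanning the empty target site, and the positive band is that amplicon plus the donor insert. A minimal sketch of that arithmetic is given below, together with a helper that labels an observed band. The helper and its 100 bp tolerance are illustrative assumptions, not part of the described method.

```python
def expected_bands(flank_bp: int, insert_bp: int) -> dict:
    """Expected colony-PCR product sizes without and with the insertion."""
    return {"negative": flank_bp, "positive": flank_bp + insert_bp}


def call_band(observed_bp: int, bands: dict, tolerance_bp: int = 100) -> str:
    """Label an observed band; the tolerance is an arbitrary illustrative value."""
    for label, size in bands.items():
        if abs(observed_bp - size) <= tolerance_bp:
            return label
    return "ambiguous"


if __name__ == "__main__":
    # Values from the text: 500 bp empty-site amplicon, 720 bp LE-GFP-RE insert.
    bands = expected_bands(flank_bp=500, insert_bp=720)
    print(bands)
    print(call_band(1200, bands))
```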

Example 5 Verification of the transposition activity of the CRISPR-associated transposase in Corynebacterium glutamicum

The plasmid pCgQCasTnsPtr used in this example comprises the genes tnsA (SEQ ID NO: 8), tnsB (SEQ ID NO: 9), tnsC (SEQ ID NO: 10), tniQ (SEQ ID NO: 11), Cas5/8 (SEQ ID NO: 12), Cas7 (SEQ ID NO: 14) and Cas6 (SEQ ID NO: 13) derived from Pseudoalteromonas translucida KMM520, all codon-optimized for Corynebacterium glutamicum; a crRNA sequence targeting the genomic target site; a ColA replicon; a pBL1ts replicon; a promoter, for example an anhydrotetracycline-inducible promoter; and a kanamycin resistance gene. Its structure is shown in FIG. 13.

Plasmid pCgDonorPtr comprises the LE (SEQ ID NO: 19) and RE (SEQ ID NO: 20) sequences derived from Pseudoalteromonas translucida KMM520, a pMB1 replicon, a pGA1 replicon, a spectinomycin resistance gene, and the cargo gene of interest (Cargo), for example a promoterless chloramphenicol resistance (CmR) gene fragment. Its structure is shown in FIG. 14.

The transposition activity of the CRISPR-associated transposase derived from Pseudoalteromonas translucida KMM520 in Corynebacterium glutamicum was confirmed by targeting the crtYf gene in the genome of Corynebacterium glutamicum ATCC13032.

The crRNA sequence constructed to target the genomic crtYf gene is:

AGGCAACCATAGGGCAGGAATCAGAAGTACTG.
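To see where a 32-nt targeting sequence such as this one sits in a target locus, one could search both strands of the locus for an exact match. The sketch below does this on plain strings. The text does not say which strand the listed sequence corresponds to, so both are searched. The flanking bases in the usage example are made up for illustration and are not crtYf sequence.

```python
SPACER = "AGGCAACCATAGGGCAGGAATCAGAAGTACTG"  # crtYf-targeting sequence from the text


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    comp = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def find_spacer(target: str, spacer: str = SPACER) -> list:
    """Return (strand, start) for every exact match of the spacer in target.

    start is a 0-based index on the given (top) strand; "-" means the
    reverse complement of the spacer was found at that position.
    """
    target = target.upper()
    hits = []
    for strand, query in (("+", spacer), ("-", reverse_complement(spacer))):
        start = target.find(query)
        while start != -1:
            hits.append((strand, start))
            start = target.find(query, start + 1)
    return hits


if __name__ == "__main__":
    # Hypothetical toy locus: made-up flanks around the spacer.
    toy_locus = "TTTT" + SPACER + "CCCC"
    print(len(SPACER), find_spacer(toy_locus))
```

In practice the `target` argument would be the crtYf region read from a genome file; that step is omitted here.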

5.1 Transformation of the transposition tool plasmids and induction of transposition

pCgDonorPtr and pCgQCasTnsPtr were electroporated into Corynebacterium glutamicum ATCC13032, and transformants were selected at 30°C on BHIS solid plates containing spectinomycin and kanamycin, giving a C. glutamicum ATCC13032 strain carrying both pCgDonorPtr and pCgQCasTnsPtr. A portion of the colonies on this plate was scraped off, resuspended in liquid BHIS medium, and spread on BHIS solid plates containing anhydrotetracycline at a final concentration of 100 ng/ml together with spectinomycin and kanamycin; anhydrotetracycline induces expression of the transposition-related enzymes. After incubation at 30°C for 24 h a film of cells may form, which is normal. A portion of the cells on the 100 ng/ml anhydrotetracycline plate was scraped off and resuspended in liquid BHIS medium, the OD₆₀₀ was adjusted to about 0.5, the suspension was diluted 50-fold with liquid BHIS medium, and 100 μL was spread on BHIS solid plates supplemented with anhydrotetracycline at a final concentration of 1000 ng/ml, spectinomycin and kanamycin, followed by incubation at 30°C for 48 h.

5.2 Colony PCR to determine the efficiency of targeting crtYf

The efficiency of targeting crtYf was verified by colony PCR using the primer pair F-crtYf (TGCTGTGGGAACTTTTCGGT) and R-crtYf (ACTACCACTCCCGAGGTTGA), which lie upstream and downstream of the insertion site.

The donor insert on plasmid pCgDonorPtr comprises LE (left end), RE (right end) and the cargo gene CmR fragment (a promoterless chloramphenicol resistance gene fragment), 1433 bp in total; the positive band is therefore 2110 bp and the negative band 677 bp. All 6 clones examined carried the insertion. The gel electrophoresis image is shown in FIG. 15 and confirms the transposition activity of the CRISPR-associated transposase derived from Pseudoalteromonas translucida KMM520 in Corynebacterium glutamicum.

Sequence listing

<110> Center for Excellence in Molecular Plant Sciences, Chinese Academy of Sciences

<120> A novel CRISPR-associated transposase

<130> SHPI2110093

<160> 22

<170> SIPOSequenceListing 1.0

<210> 1

<211> 209

<212> PRT

<213> Pseudoalteromonas translucida KMM520

<400> 1

Met Tyr Arg Arg Lys Leu Lys Tyr Ser Arg Val Lys Asn Leu His Lys

1 5 10 15

Phe Ala Ser Gln Lys Asn Lys Ser Thr Cys Leu Val Glu Ser Ser Leu

20 25 30

Glu Phe Asp Ala Cys Phe His Phe Glu Phe Ser Pro Pro Ile Ala Ala

35 40 45

Phe Glu Ala Gln Pro Leu Gly Tyr Glu Tyr Glu Phe Asp Asn Arg Ile

50 55 60

Cys Arg Tyr Thr Pro Asp Phe Leu Leu Thr His Thr Asp Gly Thr Gln

65 70 75 80

Lys Phe Ile Glu Val Lys Pro Gln Ser Lys Ile Ala Asp Glu Asp Phe

85 90 95

Arg Ala Arg Phe Ile Glu Lys Gln Ala Ile Ala Lys Gln Asp Gly Arg

100 105 110

Asp Leu Ile Leu Val Thr Asp Lys Gln Ile Arg Val Tyr Pro Thr Leu

115 120 125

Asn Asn Leu Lys Leu Leu His Arg Tyr Ser Gly Phe Gln Ser Leu Thr

130 135 140

Glu Leu Gln Ala Ser Val Leu Glu Leu Val Lys Gln Tyr Gly Ser Ile

145 150 155 160

Lys Val Gly Gln Leu Ile Arg Tyr Leu Lys Val Thr Ala Gly Glu Leu

165 170 175

Leu Ala Thr Val Leu Arg Leu Leu Ser Leu Gly Gln Leu Phe Ala Asp

180 185 190

Leu Thr Thr Asn Glu Ile Ser Ile Glu Thr Ala Ile Trp Ser Asn Asn

195 200 205

Val

<210> 2

<211> 607

<212> PRT

<400> 2

Met Phe Asn Asn Asp Leu Phe Asp Asp Glu Phe Asn Gln Pro Leu Pro

1 5 10 15

Lys Ala Glu Thr Lys Leu Pro Gln Asn Tyr Thr Lys Asp Leu Gln Ala

20 25 30

Leu Pro Glu Lys Ile Lys Thr Thr Thr Phe Ala Lys Leu Lys Tyr Ile

35 40 45

Gln Trp Leu Glu Ala Asn Ile Gln Gly Gly Trp Thr Gln Lys Asn Leu

50 55 60

Glu Pro Leu Leu Lys Leu Met Pro Asp Val Glu Gly Glu Lys Lys Pro

65 70 75 80

Ser Trp Arg Thr Ala Ala Arg Trp Tyr Ser Ala Tyr Thr Asn Ala Asp

85 90 95

Lys Asn Ile Met Ala Leu Ile Pro Ser His Gln Lys Lys Gly Asn Arg

100 105 110

Glu Arg Asp Thr Thr Thr Asp Lys Phe Phe Glu Lys Ala Leu Glu Arg

115 120 125

Tyr Leu Val Lys Glu Lys Pro Ser Val Ala Ser Ala Tyr Lys Phe Tyr

130 135 140

Lys Asp Leu Val Ile Ile Glu Asn Asp Ser Val Val Asp Ser Val Leu

145 150 155 160

Lys Pro Leu Thr Tyr Lys Ala Phe Lys Asn Arg Ile Asp Asn Leu Pro

165 170 175

Gln Tyr Glu Val Met Ile Ala Arg Tyr Gly Lys Arg Leu Ala Asp Ile

180 185 190

Ala Tyr Asn Lys Val Glu Gly His Lys Arg Pro Ile Arg Val Leu Glu

195 200 205

Lys Val Glu Ile Asp His Thr Pro Leu Asp Leu Ile Leu Leu Asp Asp

210 215 220

Glu Leu His Ile Pro Leu Gly Arg Pro Thr Leu Thr Met Leu Val Asp

225 230 235 240

Val Tyr Ser His Cys Ile Val Gly Tyr Tyr Phe Ser Phe Ser Glu Pro

245 250 255

Ser Tyr Asp Ala Val Arg Arg Ala Met Leu Asn Ala Met Lys Pro Lys

260 265 270

Ser Glu Val Ala Lys Leu Tyr Pro Asp Thr Ile Asn Glu Trp Lys Cys

275 280 285

Ala Gly Lys Ile Glu Thr Leu Val Val Asp Asn Gly Ala Glu Phe Trp

290 295 300

Ser Asn Ser Leu Glu Leu Ala Cys Glu Glu Ile Gly Ile Asn Thr Gln

305 310 315 320

Tyr Asn Pro Val Ala Lys Pro Trp Leu Lys Pro Phe Val Glu Arg Met

325 330 335

Phe Gly Thr Ile Asn Thr Glu Leu Leu Asp Pro Val Pro Gly Lys Thr

340 345 350

Phe Ser Asn Ile Leu Gln Lys His Glu Tyr Asn Pro Lys Lys Asp Ala

355 360 365

Ile Met Arg Phe Thr Thr Phe Met Gln Leu Phe His Lys Trp Val Val

370 375 380

Asp Val Tyr His Gln Asp Ala Asp Ser Arg Phe Lys Tyr Ile Pro Ser

385 390 395 400

Gln Leu Trp Asp Gln Gly Phe Asn Thr Leu Pro Pro Thr Met Leu Ser

405 410 415

Asp Ala Asp Leu Gln Gln Leu Asp Val Val Leu Ser Ile Ser Asn His

420 425 430

Arg Val Leu Arg Lys Gly Gly Ile Arg Leu Glu Asn Leu Ser Tyr Asp

435 440 445

Ser Thr Glu Leu Ala Asn Tyr Arg Lys Gln Phe Ser His Lys Val Ser

450 455 460

Gln Glu Val Leu Ile Lys Leu Asn Pro Asp Asp Ile Ser Tyr Ile Tyr

465 470 475 480

Val Tyr Leu Asp Lys Leu Glu His Tyr Ile Lys Val Pro Cys Ile Asp

485 490 495

Pro Asn Gly Tyr Thr Gln Asn Leu Ser Leu Asn Gln His Lys Ile Asn

500 505 510

Ile Arg Ile His Arg Asp Phe Ile Ser Gly Ser Ile Asp Asn Val Gly

515 520 525

Leu Ala Lys Ala Arg Met Phe Ile His Asn Lys Ile Gln Asn Glu Phe

530 535 540

Glu Glu Leu Lys Asn Ala Pro Lys His Ser Lys Val Lys Gly Gly Lys

545 550 555 560

Ala Leu Ala Lys His Gln Asn Ile Ser Ser Asp Ser Gln Lys Ser Ile

565 570 575

Thr His Ser Lys Pro Val Glu Ala Lys Lys Val Thr Pro Lys Glu Gln

580 585 590

Pro Thr Asp Ser Trp Asp Asp Phe Ile Ser Asp Leu Asp Gly Phe

595 600 605

<210> 3

<211> 333

<212> PRT

<400> 3

Met Leu Thr Asp Lys Gln Lys Glu Lys Leu Asn Glu Phe Arg Asp Val

1 5 10 15

Phe Ile Glu Tyr Pro Ile Ile Thr Thr Ile Phe Asn Asp Phe Asp Arg

20 25 30

Leu Arg Leu Gly Lys Gly Leu Thr Gly Glu Lys Pro Cys Met Leu Leu

35 40 45

Asn Gly Asp Thr Gly Thr Gly Lys Thr Ala Leu Ile Lys Gln Tyr Lys

50 55 60

Glu Arg His Leu Pro Gln Phe Ile Asn Gly Val Met Asn His Pro Val

65 70 75 80

Leu Val Ser Arg Ile Pro Ser Asn Pro Thr Leu Glu Ser Thr Leu Ala

85 90 95

Glu Leu Leu Lys Asp Leu Gly Gln Val Gly Ser Thr Glu Arg Lys Leu

100 105 110

Arg Ile Asn Gly Thr Arg Leu Thr Thr Ser Leu Ile Lys Cys Leu Lys

115 120 125

Thr Cys Gly Thr Glu Leu Ile Ile Ile Asp Glu Phe Gln Glu Leu Ile

130 135 140

Glu His Asn Gln Gly Lys Lys Arg Arg Glu Ile Ala Asn Arg Leu Lys

145 150 155 160

Tyr Ile Asn Asp Glu Ala Gly Val Ser Ile Val Leu Val Gly Met Pro

165 170 175

Trp Ala Glu Lys Ile Ala Asp Glu Pro Gln Trp Ser Ser Arg Leu Leu

180 185 190

Ile Arg Arg Gln Leu Pro Tyr Phe Lys Leu Ser Glu Asn Pro Lys His

195 200 205

Phe Val Gln Leu Ile Ile Gly Leu Ala Asn Arg Met Pro Phe Ala Glu

210 215 220

Lys Pro Asn Leu Ser Glu Gln Ala Thr Val Phe Thr Leu Phe Ser Leu

225 230 235 240

Ser Lys Gly Cys Phe Arg Thr Leu Lys Tyr Phe Leu Asp Asp Ala Val

245 250 255

Leu Tyr Ala Leu Met Asp Asn Ala Lys Thr Leu Thr Thr Lys His Leu

260 265 270

Val Lys Ala Phe Glu Val Leu Phe Pro Asp Val Pro Asn Leu Phe Thr

275 280 285

Leu Pro Val Ala Glu Ile Thr Ala Ser Glu Val Glu Arg Tyr Ser Leu

290 295 300

Tyr Lys Pro Glu Ser Ser Gln Asp Glu Asp Pro Phe Ile Ala Thr Lys

305 310 315 320

Phe Thr Asp Arg Met Pro Ile Ser Gln Leu Leu Arg Lys

325 330

<210> 4

<211> 391

<212> PRT

<400> 4

Met His Phe Leu Val Gln Thr Lys Ser Tyr Pro Asp Glu Ala Leu Glu

1 5 10 15

Ser Tyr Leu Leu Arg Leu Ala Arg Asp Asn Ser Tyr Asn Gly Tyr Ser

20 25 30

Glu Leu Ala Asp Ile Leu Trp Gln Trp Leu Ala Glu Gln Asp Asn Glu

35 40 45

Leu Glu Gly Ala Leu Pro Leu Ala Leu Ser Lys Val Asp Val Tyr His

50 55 60

Ala Arg Gln Ala Ser Ser Phe Arg Ile Arg Ala Leu Lys Leu Val Ala

65 70 75 80

Gln Leu Ala Asp Val Asn Ala Gly Asp Ile Leu Ala Leu Ala Trp Arg

85 90 95

Arg Ser Asn Phe Lys Phe Gly Asn Leu Ala Ala Val Ser Arg Asn Glu

100 105 110

Leu Ala Ile Pro Leu Glu Leu Leu Arg Thr Asp Asn Ile Pro Val Cys

115 120 125

Ile Lys Cys Leu Ser Glu Ser Ser His Ile Pro Phe Tyr Trp His Leu

130 135 140

Lys Pro Tyr Lys Ala Cys His Lys His Lys Ser Gln Leu Ile Thr Arg

145 150 155 160

Cys Lys Glu Cys Tyr Asp Leu Ile Asp Tyr Arg Ala Ser Glu Ala Phe

165 170 175

Leu Glu Cys Val Cys Gly Cys Lys Ile Thr Asn Ser Glu Gln Leu Asn

180 185 190

Asp Ala Asp Phe Lys Ile Ala Ile Ala Leu Ala Ser Ser Asn Ser Gln

195 200 205

Lys Ile Val Gly Leu Ile Ser Trp Phe Ala Lys Val Lys Gln Leu Asp

210 215 220

Val Ser Asp Ala Asp Phe Asn Cys Ala Phe Val Asp Tyr Phe Asn Thr

225 230 235 240

Trp Pro Glu Ser Leu Thr Thr Glu Leu Asp Leu Leu Thr Asn Asn Ala

245 250 255

Arg Leu Lys Gln Leu Asn Pro Phe Asn Lys Thr Lys Phe Ser Ser Val

260 265 270

Tyr Gly Asp Leu Ile Arg Asp Gly Gln Ile Ala Ala Thr Ser Asn Arg

275 280 285

Lys Asn Lys Val Ile Asp Glu Ile Ile Ser Tyr Phe Val Glu Leu Val

290 295 300

Asp Ser Asn Pro Lys Ala Lys His Pro Asn Ile Gly Asp Leu Leu Leu

305 310 315 320

Cys Thr Phe Asp Ala Ala Val Leu Leu Asn Thr Thr Thr Glu Gln Val

325 330 335

Tyr Arg Leu His Gln Glu Ala Phe Leu Asn Cys Ala Tyr Ser Gln Lys

340 345 350

Lys His Glu Gln Leu Arg Ala Asp Ser His Val Phe Tyr Leu Arg Gln

355 360 365

Val Ile Glu Leu Gln Gln Ala Phe Ala Ala Glu Lys Pro Leu Thr Lys

370 375 380

Lys Gln Phe Ile Ala Pro Trp

385 390

<210> 5

<211> 683

<212> PRT

<400> 5

Met Asn Leu Gln Asp Ala Leu Ala Ile Glu Pro Leu Lys Glu Lys Thr

1 5 10 15

Thr Ala Leu Arg Lys Leu Phe Val Pro Tyr Thr Ser His Val Glu Val

20 25 30

Asp Gly Phe Glu Glu Leu Ala Leu Thr Val Leu Ile Asn Leu Val Tyr

35 40 45

Lys Arg Ser Glu Ile Asp Asp Leu Thr Ser Ala Arg Thr Ala Lys Ser

50 55 60

Val Leu Arg Asp Glu Val Leu Leu Ser Lys Cys Ile Asn Glu Val Lys

65 70 75 80

Trp Phe His Thr His Asn Leu Lys Tyr Pro Asp Ile Arg Val Ser His

85 90 95

Gln Arg Leu Ile Ser Glu Val Val Ser Glu Asp Ile Ala Gly Ile Cys

100 105 110

Ser Arg Ser Leu Pro Leu Ser Phe Gly Trp Ser His Asn Ser Ala Glu

115 120 125

Ile Asn His Ala Lys Leu Phe Leu Thr Ser Phe Asn Trp Gln Gly Glu

130 135 140

Val Thr Cys Leu Ala Arg Leu Leu Ile Asn Glu Glu Pro Val Trp Ile

145 150 155 160

Asn Leu Ile Arg Ala Tyr Gly Phe Thr Lys Lys Ala Val Leu Glu Ile

165 170 175

Ser Gly Lys Ile Lys Gln Gln Leu Pro Val Ala Glu Phe Pro Leu Glu

180 185 190

Val Ser Ser Phe Ser Pro Gln Leu Gln Met Pro Phe Gln Gln Ser Tyr

195 200 205

Leu Val Val Thr Pro Val Val Ser His Ala Met Leu Ala Lys Ile Gln

210 215 220

Gln Leu Thr Thr Asp Arg Lys Leu Asn Phe Ala Leu Val Glu His Ser

225 230 235 240

Arg Pro Ala Asn Val Gly Asp Leu Ala Ser Ser Val Gly Gly Asn Ile

245 250 255

Arg Val Leu Arg Tyr Phe Pro Lys Thr Tyr Ser Lys Ala Val Asn Arg

260 265 270

Ser Lys Val Ala Asn Asn Asp Ile Glu Lys Ala Phe Lys Ile Arg Ala

275 280 285

Leu Leu Ser Ser Gln Phe Gln Gln Ala Leu Leu Val Leu Val Gly Ile

290 295 300

Lys Gln Phe Asn Thr Leu Arg Gln Lys Arg Leu Ala Arg Val Ala Ala

305 310 315 320

Ile Arg Gln Val Arg Val Ser Leu Gln Leu Trp Leu Asp Asn Ile Leu

325 330 335

Glu Ala Lys Asn Asn Ala Gln Asn Gln Val Tyr Pro Glu Trp Val Arg

340 345 350

His Tyr Leu Asp Gln Ser Ile Thr Asn Cys Ile Ser Gln Phe Ser Asn

355 360 365

Val Leu Asn Glu Ser Leu Gly Asn Leu Ser Lys Leu Lys Arg Phe Ala

370 375 380

Tyr His Pro Asn Leu Met Gly Leu Phe Lys Ala Gln Leu Asn Tyr Val

385 390 395 400

Phe Thr His Cys Ala Ala Glu Gln Glu Ile Leu Asn Asp Glu Gln Ile

405 410 415

Val Tyr Val His Cys Gln Asp Met Arg Val Phe Asp Ala Glu Ala Met

420 425 430

Ala Asn Pro Tyr Ile Gln Gly Met Pro Ser Leu Thr Ala Leu Asn Gly

435 440 445

Leu Ala His Asn Phe Glu Arg Lys Leu Lys Asn Phe Ile Asp Pro Ser

450 455 460

Ile Lys Cys Ile Gly Ser Ala Ile Tyr Ile Glu Asn Tyr Gln Leu His

465 470 475 480

Thr Gly Lys Pro Leu Pro Glu Pro Ser Lys Leu Lys Gln Val Ala Gly

485 490 495

Arg Ser His Val Ile Arg Ser Gly Ile Ile Asp Lys Pro Lys Cys Asp

500 505 510

Ile Thr Leu Asp Leu Val Phe Arg Leu Phe Val Pro Asn Thr Glu Leu

515 520 525

Leu Asp Lys Leu Asn Ser Gln Leu Ile Lys Pro Ala Leu Pro Ser Ser

530 535 540

Phe Ala Gly Gly Thr Met His Pro Pro Ser Leu Tyr Gln Asn Ile Asp

545 550 555 560

Trp Cys His Val His Thr Lys Pro Ser Glu Leu Phe Lys Lys Leu Lys

565 570 575

Ala Lys Ser Ser Asn Gly Ser Trp Leu Tyr Pro Ser Lys Lys Val Val

580 585 590

Lys Ser Phe Glu Gln Leu Ile Asp Ala Leu Asn Ser Asn Phe Asn Leu

595 600 605

Arg Pro Ala Ala Ile Gly Leu Ala Ala Leu Glu Glu Pro Val Lys Arg

610 615 620

Asp Ala Ala Leu His Glu Tyr His Cys Tyr Ala Glu Pro Val Ile Gly

625 630 635 640

Leu Leu Glu Cys Val Ser Asn Thr Ser Val Lys Tyr Ala Gly Ala Lys

645 650 655

Gln Phe Phe His Asp Ala Phe Trp Val Met Asp Val Gln Lys Glu Ser

660 665 670

Met Leu Met Lys Lys Ser Lys Phe Glu Tyr Glu

675 680

<210> 6

<211> 200

<212> PRT

<400> 6

Leu Lys Arg Tyr Tyr Phe Thr Ile Thr Tyr Leu Pro Gln Ser Cys Asp

1 5 10 15

Val Ser Leu Leu Ala Gly Arg Cys Ile Gly Ile Leu His Gly Phe Met

20 25 30

Ser Ser Arg Glu Ile Ser Asn Ile Gly Val Cys Phe Pro Lys Trp Asn

35 40 45

Glu Gln Thr Ile Gly Asn Glu Leu Ala Phe Val Ser Thr Asn Lys Lys

50 55 60

Gln Leu Thr Asn Leu Ser Gln Gln Ser Tyr Phe Glu Met Met Ala His

65 70 75 80

Asp Lys Leu Phe Gly Leu Ser Lys Ile Leu Glu Val Pro Val Asn Gln

85 90 95

Ser Glu Val Met Phe Val Arg Asn Gln Ser Val Ala Lys Ala Phe Val

100 105 110

Gly Glu Lys Gln Arg Arg Leu Lys Arg Ala Lys Lys Arg Ala Glu Ala

115 120 125

Arg Gly Glu Val Tyr Asn Pro Glu Tyr Lys Phe Glu Ala Lys Asp Ile

130 135 140

Gly His Phe His Ser Ile Pro Val Ser Ser Lys Gly Asn Gly Gln Ser

145 150 155 160

Tyr Val Leu His Ile Gln Lys Asn Glu Asn Ala Glu Ser Ile Lys Asn

165 170 175

Gln Phe Asn Asn Tyr Gly Phe Ala Thr Asn Gln Ile Phe Leu Gly Thr

180 185 190

Val Pro Ser Leu Asn Thr Leu Leu

195 200

<210> 7

<211> 342

<212> PRT

<400> 7

Met Gln Leu Pro Arg His Leu Ser Tyr Thr Arg Ser Leu Ser Pro Ser

1 5 10 15

Lys Ala Val Phe Phe Tyr Lys Thr Pro Glu Ser Asp Phe Glu Pro Leu

20 25 30

Gln Ile Glu Gln Asn Lys Leu Val Gly Gln Lys Ser Gly Phe Gly Asp

35 40 45

Ala Tyr Gln Lys Gln Asn Val Ala Lys Asn Leu Ala Pro Gln Asp Leu

50 55 60

Ala Phe Gly Asn Pro Gln Thr Ile Asp Val Cys Tyr Val Pro Pro Thr

65 70 75 80

Val Asn Glu Leu Phe Cys Arg Phe Ser Leu Arg Val Glu Ala Asn Cys

85 90 95

Ile Glu Pro His Val Cys Asp Asp Pro Lys Val Ile Tyr Trp Leu Lys

100 105 110

Arg Phe Phe Glu Thr Tyr Lys Lys His Asn Gly Leu Asn Glu Val Ala

115 120 125

Thr Arg Tyr Ala Lys Asn Ile Leu Met Gly Asn Trp Leu Trp Arg Asn

130 135 140

Arg Gln Ser Pro Asn Val Asp Ile Glu Ile Leu Thr Glu His Ala Ala

145 150 155 160

Pro Ile Val Val Glu Gly Ala Gln Lys Leu Lys Trp Gln Gly Asn Trp

165 170 175

Gln Asn Asn Gln Thr Ala Leu Leu Thr Leu Ser Glu Ser Ile Gln Glu

180 185 190

Gly Leu Ser Asn Pro Gln Asn Tyr Cys Tyr Leu Asp Ile Thr Ala Lys

195 200 205

Ile Lys Asn Ala Phe Ser Gln Glu Val His Pro Ser Gln Lys Phe Val

210 215 220

Asp Asn Val Glu Gln Gly Met Ser Ser Lys Gln Leu Ala Tyr Thr Gln

225 230 235 240

Val Gly Asp Lys Lys Ala Ala Ser Leu Asn Ser Gln Lys Val Gly Ala

245 250 255

Ala Ile Gln Thr Ile Asp Asp Trp Tyr Glu Glu Gly Tyr Lys Pro Leu

260 265 270

Arg Thr His Glu Tyr Gly Ala Asp Lys Gln Ile Leu Val Ala His Arg

275 280 285

Thr Pro Lys Ser His Ser Asp Phe Tyr Ser Leu Leu Pro Arg Ile Ala

290 295 300

Leu His Ile Lys His Met Glu Lys His Gly Leu Glu Gln Ser Glu Gln

305 310 315 320

Ser Asn Ser Ile His Phe Ile Ala Ala Val Leu Ile Lys Gly Gly Leu

325 330 335

Phe Gln Arg Ser Lys Gly

340 340

<210> 8<210> 8

<211> 630<211> 630

<212> DNA<212>DNA

<400> 8<400> 8

atgtacagaa gaaaactaaa atactcccgt gtaaaaaatc ttcataaatt tgctagtcaa 60

aaaaataaat ctacttgttt agtcgaatcc tctttagagt ttgatgcgtg tttccatttt 120

gaattttcac caccaatagc cgcatttgaa gcacaacctc taggttacga atatgagttc 180

gataaccgta tttgccgtta cacacctgac tttttactta cccacacaga cggcacgcaa 240

aaatttatag aagtaaaacc gcaaagcaaa attgctgacg aagactttcg tgcacgtttt 300

attgaaaagc aagccatagc taagcaagat ggacgcgact taatactggt tactgataaa 360

caaatccgtg tatacccaac actcaataac ttaaagcttt tgcatcgcta ctctggtttt 420

cagtctttaa cagaattgca agcatcggta ctagaacttg ttaagcagta cggctctatc 480

aaagtgggcc agttaatcag atatttaaaa gtaactgccg gtgagctact tgctacggtg 540

cttcgcttac tatcactagg gcagttattt gccgacttaa ctacaaatga aatatcaata 600

gaaacagcaa tttggtctaa caatgtttaa 630

<210> 9

<211> 1824

<212> DNA

<400> 9

atgtttaata acgatttgtt tgatgatgag tttaaccagc cattaccaaa agctgaaacc 60

aaactacctc aaaattacac taaagactta caagcccttc ctgaaaaaat aaaaacaaca 120

acatttgcta agcttaaata tattcaatgg cttgaggcta atattcaagg tggttggaca 180

caaaaaaatc ttgaaccttt attaaaatta atgcctgatg ttgagggtga aaaaaagcca 240

agttggagaa cagccgcacg atggtatagc gcttacacca atgcggataa aaatattatg 300

gcgctaatac caagccacca aaaaaagggt aatagggagc gcgatacaac cactgataag 360

ttttttgaaa aagcacttga gcgttactta gtaaaagaaa aaccatcagt ggcttcggct 420

tacaagttct ataaagactt agttattatc gaaaacgaca gtgttgttga cagtgtttta 480

aagcctttaa catacaaagc gtttaaaaac agaatagata acttaccgca atacgaagta 540

atgattgctc gttatggtaa gcgccttgct gatattgctt ataataaggt tgaagggcat 600

aaacggccta tccgagtact tgaaaaagtt gaaattgacc atacgccact tgatcttatt 660

ttattagatg atgagctaca tattccacta ggtaggccta cactcaccat gttggtagat 720

gtgtatagcc attgtattgt tggctattac tttagcttca gtgagcctag ctatgatgca 780

gtaaggcgag caatgctaaa tgcgatgaaa cctaaaagtg aagtggcaaa actataccct 840

gatacgatta atgagtggaa gtgtgctggc aaaattgaaa cactcgttgt tgataatggc 900

gctgaatttt ggagcaacag ccttgaactt gcttgtgaag aaataggcat taatactcaa 960

tataacccag tcgcaaagcc ttggttaaaa ccatttgtag aacgtatgtt tggaacaata 1020

aatactgagt tattagatcc tgttcccggt aaaacctttt ctaacatttt acaaaagcat 1080

gaatacaatc caaaaaaaga tgcaatcatg cgctttacga cctttatgca gttatttcat 1140

aaatgggtag tagacgttta tcatcaagat gccgacagtc gctttaagta cataccgagt 1200

caactgtggg atcaaggttt taatacgtta ccaccaacaa tgctaagtga tgctgatctt 1260

caacaactag atgttgtgct cagtatttca aatcatcggg tacttcgtaa aggtgggata 1320

cggctagaaa acttaagcta cgacagtact gaactggcca attatagaaa gcaatttagc 1380

cataaagtat ctcaagaagt tttaattaaa ttaaatcccg atgatatttc ttatatatat 1440

gtttaccttg ataagctaga gcattacata aaagtgccat gcatagatcc aaacggttac 1500

acccaaaatt taagtttgaa tcagcataaa ataaatatac gcatccaccg cgactttatt 1560

tcgggctcta tcgataatgt aggcttagca aaagcgcgca tgtttattca taacaaaatt 1620

caaaacgagt ttgaagagtt aaaaaatgcg ccaaaacact caaaagtaaa gggtggtaaa 1680

gcgttagcta aacatcaaaa tatcagtagt gactcacaaa agtcaataac gcatagcaaa 1740

cccgtagagg ccaaaaaggt tacacctaaa gagcaaccaa ctgatagctg ggatgatttt 1800

atctcagact tagatggatt ttaa 1824

<210> 10

<211> 1002

<212> DNA

<400> 10

atgctgaccg ataaacaaaa agaaaagctg aatgaatttc gtgatgtatt tattgaatac 60

ccaataataa ccaccatatt taacgacttc gatagattaa gacttggtaa agggctaaca 120

ggtgaaaagc cttgcatgct cttaaatggc gatacaggca caggtaaaac agcactgatc 180

aagcaatata aagaacgaca tttaccgcaa tttattaatg gtgttatgaa ccaccctgta 240

ttggtaagcc gcatacctag taacccgaca ttagaatcta ctttagcaga gcttcttaaa 300

gatttagggc aagtaggcag cacagagcgt aagctacgaa taaacggcac tcgcttaacg 360

acatcattaa taaaatgcct aaaaacatgt ggcacagagc ttataattat tgatgagttc 420

caagagctaa ttgagcacaa ccaaggtaaa aagcgccgcg agattgctaa tcgattaaaa 480

tatattaacg acgaagcggg tgtatcaatt gtattggtag gtatgccgtg ggcagaaaaa 540

atagcagacg agccccagtg gtcatctcgt ttattaataa ggcggcagtt gccttatttt 600

aagttgtcag aaaacccaaa gcattttgta caactaataa ttggtctagc caaccgtatg 660

ccatttgccg aaaagccaaa cttaagtgag caagcaacag tgtttacttt gttctcatta 720

tcaaaaggtt gctttagaac attaaaatac tttttagatg atgccgtact ttatgcatta 780

atggacaacg cgaaaactct cacaaccaag catttagtta aagcatttga ggtactcttt 840

ccggatgttc ctaatttatt taccttgcct gtagcagaaa taacagcaag cgaagtcgag 900

cgctattcac tttataagcc tgaaagctct caagatgaag acccgtttat agcgaccaag 960

tttactgacc ggatgccgat tagtcagttg ttaaggaaat aa 1002

<210> 11

<211> 1176

<212> DNA

<400> 11

atgcattttt tagttcaaac aaaatcttac ccagatgagg cgcttgaaag ctatttgctg 60

aggcttgcaa gggataactc atacaatggc tatagtgagc ttgctgatat tttgtggcaa 120

tggcttgcag agcaagataa tgagcttgaa ggtgcgctgc cgttagcgct gagtaaagtt 180

gatgtttatc atgctaggca agcgagcagc tttagaataa gagcgcttaa gttggttgct 240

caattagcag atgtaaacgc tggtgacatt cttgcacttg cttggaggcg cagtaatttt 300

aaatttggca accttgccgc agtaagtcga aatgaactgg ctattcccct tgagctactt 360

cgtactgata acatacctgt ttgcattaaa tgcttgtctg aatcttccca tattcccttt 420

tattggcatt taaagcccta taaggcgtgt cataagcata agtcacaatt aattacacgt 480

tgtaaggagt gctatgactt aattgattac agagcctctg aggcgttttt agagtgtgtt 540

tgcggttgta aaataaccaa tagtgaacag ttaaacgatg cagactttaa aattgcaatt 600

gcgcttgcaa gtagtaacag ccaaaaaata gtagggttga tatcgtggtt tgcgaaggtt 660

aagcaacttg atgtaagtga tgcagacttt aactgcgctt ttgttgatta ctttaatact 720

tggcctgaaa gccttaccac tgaattagat ttactcacaa ataatgcgcg actcaagcaa 780

cttaaccctt ttaataaaac taagttcagc tctgtttatg gcgatttaat ccgtgatggt 840

caaatagctg caacaagtaa ccggaaaaac aaagtaattg atgagattat tagttatttt 900

gtcgaattag ttgatagtaa ccctaaagct aaacatccaa atattggtga cttactgctt 960

tgtacttttg atgccgcagt attgttaaac actactacag agcaagttta caggcttcat 1020

caagaagcct ttttaaactg tgcttattca caaaaaaagc acgaacagct cagagctgat 1080

agccatgtat tttatttacg ccaagtgatt gaactacaac aagcattcgc agctgaaaag 1140

cctctaacaa aaaaacaatt tatagcgccg tggtaa 1176

<210> 12

<211> 2052

<212> DNA

<400> 12

atgaacttac aagatgcact tgcaattgaa ccactaaaag aaaaaaccac agcacttaga 60

aaattgttcg ttccatacac gtctcatgtc gaggtagatg gctttgaaga actagcgctg 120

actgtgctca ttaatcttgt ttataagcga agtgagattg atgatttaac aagtgcaaga 180

actgctaaaa gtgtactacg cgatgaagtg ttactgagta agtgcattaa cgaagtgaaa 240

tggtttcata ctcataattt aaaatacccc gatatacgag taagccatca acgtttaatt 300

agtgaagttg taagtgaaga tattgcgggc atttgcagcc ggtcattacc tttaagtttt 360

ggctggtcgc acaacagtgc tgaaattaat catgcaaagc tatttttaac ctcgtttaat 420

tggcaaggtg aagtgacttg tttagcaagg ctgttaatta atgaagagcc tgtttggatt 480

aatttaataa gagcatacgg gtttactaaa aaggcggttt tagaaatctc gggtaaaata 540

aaacagcagt tgccagtggc agagttccca ttagaagtaa gctctttttc accacaatta 600

caaatgccat ttcagcaaag ctaccttgtg gttacgcctg tagtaagcca cgcaatgctg 660

gctaaaattc agcaattaac aacagatcgt aagttaaatt ttgctttagt tgagcactca 720

agacctgcca atgttggcga tttagcaagc tcagtaggcg gcaatataag agtgctgcgt 780

tactttccta aaacatattc aaaggctgtt aaccgctcta aagtagccaa taatgatatt 840

gagaaagcat ttaaaattcg tgcgctatta agtagtcaat ttcaacaggc gcttttggtg 900

ttggtaggca ttaaacagtt taatacgtta aggcaaaaac gattagcgcg agtagcggct 960

attaggcaag tacgtgttag cttgcagtta tggcttgata atattcttga agctaaaaat 1020

aacgcgcaaa accaagttta ccctgagtgg gtaaggcatt acttagatca gagtattact 1080

aactgtatta gccaatttag taacgtacta aatgagagcc ttggtaattt aagtaagctc 1140

aaacgctttg cgtatcaccc taatttaatg ggactgttta aagcgcagtt aaactatgta 1200

tttactcact gtgcagctga acaagaaata ttaaatgatg agcagatagt gtatgtacat 1260

tgccaagata tgcgagtgtt tgatgctgag gcaatggcta atccgtatat tcaaggcatg 1320

ccgtcactta ctgctttaaa tgggcttgct cataactttg agcgtaagct aaaaaacttt 1380

atagaccctt caattaagtg tattggcagt gctatttaca ttgaaaacta tcaattacat 1440

acaggtaaac cattacctga gccaagcaag ttaaaacaag ttgcagggcg tagtcatgta 1500

ataagatctg gaattatcga taaaccaaaa tgtgacataa cactcgattt agtatttaga 1560

ctttttgtac caaatactga gctgttagat aagttaaata gtcagcttat aaagcccgca 1620

ctaccgtctt catttgcagg cgggactatg catccacctt cgttatatca aaatattgac 1680

tggtgccatg tacataccaa accgagcgag ctgtttaaaa aacttaaagc aaaatcgtca 1740

aatggcagtt ggttatatcc ttcaaaaaaa gtagttaaaa gttttgaaca attaattgat 1800

gcccttaaca gtaactttaa tttaagaccc gctgcaattg gcttggctgc gcttgaagaa 1860

cccgtaaagc gagatgcagc attacatgaa taccattgtt atgcagagcc cgtaattggg 1920

ctgttagagt gtgttagcaa tacatcagta aagtacgcag gggctaagca gttctttcat 1980

gacgcatttt gggttatgga tgttcaaaaa gagtctatgc ttatgaaaaa gtctaagttt 2040

gagtatgaat aa 2052

<210> 13

<211> 603

<212> DNA

<400> 13

ttgaagcgct attattttac cattacttat ttaccccaaa gttgtgatgt aagccttctt 60

gctgggcgtt gtatcggtat tttgcatggg tttatgagct cacgtgaaat aagtaatatt 120

ggtgtgtgct ttcctaaatg gaatgagcaa acaataggta atgaattagc gtttgtatca 180

acaaataaaa agcaattaac caatctatct cagcaaagct attttgagat gatggctcat 240

gacaagttat ttggcttatc aaaaatactt gaagtaccag taaaccaaag cgaagtcatg 300

tttgttcgca accaatcggt agcaaaagca tttgttggcg aaaagcaaag gcgattaaag 360

cgagctaaaa aacgagctga agccagaggc gaagtttaca accctgaata taaatttgag 420

gcaaaggaca taggccattt tcattcaata cccgtatcaa gcaaaggcaa tggtcaaagt 480

tatgttttgc atatacaaaa aaatgaaaat gctgaatcca taaaaaatca gtttaacaat 540

tatggctttg ctacaaatca aatatttcta ggtacggttc cttctttaaa taccctttta 600

taa 603

<210> 14

<211> 1029

<212> DNA

<400> 14

atgcaattac ctcggcactt aagttacacg cgttcgctct cacccagtaa agcggtgttt 60

ttttataaaa caccagagtc tgactttgaa ccgctacaaa tagagcaaaa taaattagtt 120

gggcagaagt cagggtttgg cgatgcgtat caaaagcaaa atgtggctaa aaatttagcg 180

ccacaagatc tcgcgtttgg aaaccctcaa acaattgatg tgtgttatgt acctccaacg 240

gtaaatgagc tattttgtcg tttttcactc agggttgagg ctaattgtat tgagccacat 300

gtatgtgatg accctaaagt tatttattgg ttaaaacggt ttttcgaaac ctataaaaaa 360

cacaatggcc ttaatgaagt tgcaacgcgc tatgctaaaa atatactgat gggcaactgg 420

ctttggcgta accgccaatc accaaatgtt gatattgaaa tccttactga gcacgcagcc 480

ccgattgttg ttgaaggtgc acaaaaacta aaatggcaag gcaactggca aaataatcaa 540

acggcattat taacgttgtc agaatctatt caagaagggc taagcaatcc tcaaaattat 600

tgttatttag atataaccgc aaaaattaaa aatgcattta gccaagaggt tcatcctagt 660

caaaagtttg tagataatgt tgaacaaggt atgtcatcta aacaacttgc atatactcaa 720

gtaggcgata aaaaagcagc aagtttgaat tcacaaaaag taggggctgc tatccaaact 780

attgatgatt ggtatgagga aggttacaaa cctttacgca ctcacgagta tggcgcagat 840

aagcaaatat tagttgcaca cagaacacct aagagccatt cagactttta ttcattactc 900

ccgcgcattg ctttgcatat taaacacatg gaaaagcatg gtttagagca aagtgaacaa 960

tcaaactcaa ttcactttat tgcggcagtg ctgatcaaag gtggcttgtt tcaaaggagt 1020

aaaggttga 1029

<210> 15

<211> 88

<212> DNA

<220>

<221> misc_feature

<222> (29)..(60)

<223> n is a, c, g, or t

<400> 15

gtgaactgcc gagtaggcag ctggaaatnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 60

gtgaactgcc gagtaggcag ctgaagtt 88

<210> 16

<211> 88

<212> DNA

<220>

<221> misc_feature

<222> (29)..(60)

<223> n is a, c, g, or t

<400> 16

gtgaactgcc gagtaggcag ctggaaat 88

<210> 17

<211> 88

<212> DNA

<220>

<221> misc_feature

<222> (29)..(60)

<223> n is a, c, g, or t

<400> 17

gtgaactgcc gagtaggcag ctgaagttnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 60

gtgaactgcc gagtaggcag ctgaagtt 88

<210> 18

<211> 88

<212> DNA

<220>

<221> misc_feature

<222> (29)..(60)

<223> n is a, c, g, or t

<400> 18

gtgaactgcc gagtaggcag ctggaaat 88

<210> 19

<211> 632

<212> DNA

<400> 19

ttaattttcc ttaattattt ttaaagttag actgatttag acttggaaaa gcttaatgat 60

tggagagcta aattgactaa tagtattcag tcaagtttaa ttagttttaa gcgatccatg 120

cataactatt tctgtacaat gctatatttt caccaattaa taatttttaa tcaagcctac 180

attatgaaat atactatacc catttgaact cttctatttg taccattgtc ggtagcaaaa 240

acttatgggg ttttgaacgt cacttaaatt gtaagcattt gcgatggagg cgcgtttaga 300

gtcaaccttg attctgatat gctccgaatt tttggtaaga atataagtgt gagagtagct 360

aatgtggata cgcctgagtt aagggaaaaa tgtgaaaatg aaataactcg ttatcatgca 420

aagtgactaa ggttataatc ttccgtttat ggcacatagc agccaactaa acttgacagt 480

atttttatgt ggttggcttt ataaaaccag catttggtaa cattatgcca atttttactt 540

caatattatg ccaacataca ctacactaac ggagctgtag cacaataagc tcgtttgtac 600

ttatgccaac ttatacttca aacaacattg gg 632

<210> 20

<211> 113

<212> DNA

<400> 20

ttgggtgttg tttgaagtat aagttgacat atctgtacta aaagatggca taaattggaa 60

gtgtaaggtg gcatagtcta gtatttaacc aaatggttaa atggttgact cac 113

<210> 21

<211> 8138

<212> DNA

<213> Artificial Sequence

<400> 21

cattaattcc taatttttgt tgacactcta tcattgatag agttatttta ccactcccta 60

tcagtgatag agaaaagtga actctagaaa taattttgtt taactttaaa aggagatata 120

ccatgggtga actgccgagt aggcagctgg aaatgagacc tctggtctcg tgaactgccg 180

agtaggcagc tggaaatgga tccgaattcg agctcggcgc gcctgcaggt cgacaagctt 240

gcggccgctc taatctagac atcattaatt cctaattttt gttgacactc tatcattgat 300

agagttattt taccactccc tatcagtgat agagaaaagt gaactctaga aataattttg 360

tttaacttta aaggagatat acatatgttt ttgcaaagac ctaaataacc aacaagaagg 420

agatatacat atgcactttc tggtgcagac caagagctac ccggacgagg cgctggaaag 480

ctatctgctg cgtctggcgc gtgataacag ctacaacggt tatagcgagc tggcggacat 540

cctgtggcag tggctggcgg aacaagataa cgagctggaa ggtgcgctgc cgctggcgct 600

gagcaaggtg gacgtttacc acgcgcgtca ggcgagcagc ttccgtatcc gtgcgctgaa 660

actggtggcg caactggcgg acgttaacgc gggtgatatt ctggcgctgg cgtggcgtcg 720

tagcaacttc aagtttggca acctggcggc ggtgagccgt aacgagctgg cgatcccgct 780

ggaactgctg cgtaccgata acatcccggt ttgcattaaa tgcctgagcg agagcagcca 840

cattccgttt tactggcacc tgaagccgta taaagcgtgc cacaagcaca aaagccagct 900

gatcacccgt tgcaaggagt gctacgacct gattgattat cgtgcgagcg aggcgtttct 960

ggaatgcgtt tgcggttgca aaatcaccaa cagcgaacaa ctgaacgacg cggatttcaa 1020

gatcgcgatt gcgctggcga gcagcaacag ccagaaaatc gtgggcctga ttagctggtt 1080

cgcgaaggtg aaacaactgg acgttagcga cgcggatttc aactgcgcgt ttgttgatta 1140

cttcaacacc tggccggaga gcctgaccac cgaactggac ctgctgacca acaacgcgcg 1200

tctgaagcag ctgaacccgt ttaacaagac caaattcagc agcgtgtacg gtgacctgat 1260

ccgtgatggc caaattgcgg cgaccagcaa ccgtaagaac aaagttatcg acgagatcat 1320

tagctatttt gtggaactgg ttgatagcaa cccgaaggcg aaacacccga acattggtga 1380

cctgctgctg tgcaccttcg atgcggcggt gctgctgaac accaccaccg agcaggttta 1440

ccgtctgcac caagaagcgt ttctgaactg cgcgtatagc cagaagaaac acgaacaact 1500

gcgtgcggat agccacgtgt tctatctgcg tcaggttatc gagctgcagc aagcgtttgc 1560

ggcggaaaaa ccgctgacca agaaacaatt cattgcgccg tggtaactta tgaacctgca 1620

ggatgcgctg gcgattgagc cgctgaagga aaaaaccacc gcgctgcgta agctgttcgt 1680

gccgtacacc agccacgttg aggtggatgg ttttgaggaa ctggcgctga ccgtgctgat 1740

caacctggtt tataagcgta gcgaaattga cgatctgacc agcgcgcgta ccgcgaaaag 1800

cgtgctgcgt gacgaggttc tgctgagcaa gtgcatcaac gaagtgaaat ggttccacac 1860

ccacaacctg aagtacccgg acatccgtgt tagccaccaa cgtctgatta gcgaggtggt 1920

tagcgaagat atcgcgggta tttgcagccg tagcctgccg ctgagctttg gctggagcca 1980

caacagcgcg gagatcaacc acgcgaaact gttcctgacc agctttaact ggcagggtga 2040

agtgacctgc ctggcgcgtc tgctgattaa cgaggaaccg gtttggatca acctgattcg 2100

tgcgtacggt ttcaccaaga aagcggttct ggagatcagc ggcaagatta aacagcaact 2160

gccggtggcg gagttcccgc tggaagttag cagctttagc ccgcagctgc aaatgccgtt 2220

tcagcaaagc tatctggtgg ttaccccggt ggttagccac gcgatgctgg cgaagatcca 2280

gcaactgacc accgaccgta aactgaactt cgcgctggtt gagcacagcc gtccggcgaa 2340

cgttggtgat ctggcgagca gcgtgggtgg caacattcgt gttctgcgtt actttccgaa 2400

gacctatagc aaagcggtga accgtagcaa agttgcgaac aacgatatcg aaaaggcgtt 2460

caaaattcgt gcgctgctga gcagccagtt tcagcaagcg ctgctggtgc tggttggcat 2520

caagcagttc aacaccctgc gtcaaaaacg tctggcgcgt gtggcggcga tccgtcaagt 2580

gcgtgttagc ctgcaactgt ggctggacaa cattctggag gcgaagaaca acgcgcagaa 2640

ccaagtgtac ccggaatggg ttcgtcacta tctggatcaa agcatcacca actgcattag 2700

ccagttcagc aacgttctga acgaaagcct gggtaacctg agcaagctga aacgttttgc 2760

gtaccacccg aacctgatgg gcctgttcaa agcgcaactg aactatgtgt ttacccactg 2820

cgcggcggag caggaaatcc tgaacgacga gcaaattgtg tacgttcact gccaggacat 2880

gcgtgttttc gatgcggaag cgatggcgaa cccgtatatc cagggtatgc cgagcctgac 2940

cgcgctgaac ggcctggcgc acaacttcga gcgtaagctg aaaaacttta ttgatccgag 3000

catcaagtgc attggtagcg cgatctacat tgagaactat caactgcaca ccggcaaacc 3060

gctgccggaa ccgagcaagc tgaaacaggt ggcgggtcgt agccacgtta tccgtagcgg 3120

catcattgac aagccgaaat gcgacattac cctggatctg gtgttccgtc tgtttgttcc 3180

gaacaccgaa ctgctggata agctgaacag ccaactgatt aagccggcgc tgccgagcag 3240

ctttgcgggt ggcaccatgc acccgccgag cctgtaccag aacattgact ggtgccacgt 3300

gcacaccaag ccgagcgagc tgtttaagaa actgaaggcg aaaagcagca acggtagctg 3360

gctgtatccg agcaagaaag tggttaaaag cttcgaacag ctgatcgacg cgctgaacag 3420

caactttaac ctgcgtccgg cggcgattgg cctggcggcg ctggaggaac cggtgaagcg 3480

tgatgcggcg ctgcacgagt accactgcta tgcggaaccg gttatcggtc tgctggagtg 3540

cgtgagcaac accagcgtta agtacgcggg cgcgaaacaa ttctttcacg acgcgttctg 3600

ggtgatggat gttcagaagg aaagcatgct gatgaagaaa agcaaatttg agtatgaata 3660

atgcagctgc cgcgtcacct gagctacacc cgtagcctga gcccgagcaa ggcggtgttc 3720

ttttataaaa ccccggagag cgacttcgaa ccgctgcaga tcgagcaaaa caaactggtg 3780

ggtcagaaga gcggttttgg cgatgcgtac cagaagcaaa acgttgcgaa aaacctggcg 3840

ccgcaggacc tggcgtttgg taacccgcaa accattgatg tgtgctatgt tccgccgacc 3900

gtgaacgaac tgttctgccg ttttagcctg cgtgttgagg cgaactgcat cgaaccgcac 3960

gtgtgcgacg atccgaaggt tatttactgg ctgaaacgtt tctttgaaac ctataagaaa 4020

cacaacggtc tgaacgaagt ggcgacccgt tacgcgaaga acatcctgat gggcaactgg 4080

ctgtggcgta accgtcagag cccgaacgtt gacatcgaga ttctgaccga acacgcggcg 4140

ccgattgtgg ttgagggtgc gcagaagctg aaatggcaag gcaactggca gaacaaccaa 4200

accgcgctgc tgaccctgag cgagagcatc caggaaggtc tgagcaaccc gcaaaactac 4260

tgctatctgg atatcaccgc gaagattaaa aacgcgttca gccaggaagt gcacccgagc 4320

caaaagtttg tggacaacgt tgaacagggt atgagcagca aacagctggc gtatacccaa 4380

gtgggcgata agaaagcggc gagcctgaac agccagaagg ttggcgcggc gatccaaacc 4440

attgacgatt ggtacgagga aggttataaa ccgctgcgta cccatgagta tggtgcggac 4500

aagcaaatcc tggtggcgca ccgtaccccg aaaagccaca gcgattttta tagcctgctg 4560

ccgcgtatcg cgctgcacat taagcacatg gaaaaacacg gtctggagca gagcgaacaa 4620

agcaacagca tccacttcat tgcggcggtt ctgattaagg gtggcctgtt tcagcgtagc 4680

aaaggatgaa gcgttactat ttcaccatca cctacctgcc gcaaagctgc gatgtgagcc 4740

tgctggcggg tcgttgcatc ggcattctgc acggtttcat gagcagccgt gagatcagca 4800

acattggcgt gtgctttccg aaatggaacg agcagaccat cggtaacgaa ctggcgtttg 4860

ttagcaccaa caagaaacaa ctgaccaacc tgagccagca aagctatttc gagatgatgg 4920

cgcacgacaa gctgtttggc ctgagcaaaa ttctggaagt gccggttaac cagagcgaag 4980

tgatgttcgt tcgtaaccaa agcgtggcga aggcgtttgt tggtgaaaag caacgtcgtc 5040

tgaaacgtgc gaagaaacgt gcggaggcgc gtggcgaagt gtacaacccg gagtataagt 5100

tcgaagcgaa agatatcggt cactttcaca gcattccggt gagcagcaag ggtaacggcc 5160

agagctacgt tctgcacatc caaaagaacg agaacgcgga aagcattaaa aaccagttca 5220

acaactatgg ctttgcgacc aaccaaattt tcctgggcac cgtgccgagc ctgaacaccc 5280

tgctgtaagg taccaccctt aatctgacct aggctgctgc caccgctgag caataactag 5340

cataacccct tggggcctct aaacgggtct tgaggggttt tttgctgaaa cctcaggcat 5400

ttgagaagca cacggtcaca ctgcttccgg tagtcaataa accggtaaac cagcaataga 5460

cataagcggc tatttaacga ccctgccctg aaccgacgac cgggtcatcg tggccggatc 5520

ttgcggcccc tcggcttgaa cgaattgtta gacattattt gccgactacc ttggtgatct 5580

cgcctttcac gtagtggaca aattcttcca actgatctgc gcgcgaggcc aagcgatctt 5640

cttcttgtcc aagataagcc tgtctagctt caagtatgac gggctgatac tgggccggca 5700

ggcgctccat tgcccagtcg gcagcgacat ccttcggcgc gattttgccg gttactgcgc 5760

tgtaccaaat gcgggacaac gtaagcacta catttcgctc atcgccagcc cagtcgggcg 5820

gcgagttcca tagcgttaag gtttcattta gcgcctcaaa tagatcctgt tcaggaaccg 5880

gatcaaagag ttcctccgcc gctggaccta ccaaggcaac gctatgttct cttgcttttg 5940

tcagcaagat agccagatca atgtcgatcg tggctggctc gaagatacct gcaagaatgt 6000

cattgcgctg ccattctcca aattgcagtt cgcgcttagc tggataacgc cacggaatga 6060

tgtcgtcgtg cacaacaatg gtgacttcta cagcgcggag aatctcgctc tctccagggg 6120

aagccgaagt ttccaaaagg tcgttgatca aagctcgccg cgttgtttca tcaagcctta 6180

cggtcaccgt aaccagcaaa tcaatatcac tgtgtggctt caggccgcca tccactgcgg 6240

agccgtacaa atgtacggcc agcaacgtcg gttcgagatg gcgctcgatg acgccaacta 6300

cctctgatag ttgagtcgat acttcggcga tcaccgcttc cctcatactc ttcctttttc 6360

aatattattg aagcatttat cagggttatt gtctcatgag cggatacata tttgaatgta 6420

tttagaaaaa taaacaaata gctagctcac tcggtcgcta cgctccgggc gtgagactgc 6480

ggcgggcgct gcggacacat acaaagttac ccacagattc cgtggataag caggggacta 6540

acatgtgagg caaaacagca gggccgcgcc ggtggcgttt ttccataggc tccgccctcc 6600

tgccagagtt cacataaaca gacgcttttc cggtgcatct gtgggagccg tgaggctcaa 6660

ccatgaatct gacagtacgg gcgaaacccg acaggactta aagatcccca ccgtttccgg 6720

cgggtcgctc cctcttgcgc tctcctgttc cgaccctgcc gtttaccgga tacctgttcc 6780

gcctttctcc cttacgggaa gtgtggcgct ttctcatagc tcacacactg gtatctcggc 6840

tcggtgtagg tcgttcgctc caagctgggc tgtaagcaag aactccccgt tcagcccgac 6900

tgctgcgcct tatccggtaa ctgttcactt gagtccaacc cggaaaagca cggtaaaacg 6960

ccactggcag cagccattgg taactgggag ttcgcagagg atttgtttag ctaaacacgc 7020

ggttgctctt gaagtgtgcg ccaaagtccg gctacactgg aaggacagat ttggttgctg 7080

tgctctgcga aagccagtta ccacggttaa gcagttcccc aactgactta accttcgatc 7140

aaaccacctc cccaggtggt tttttcgttt acagggcaaa agattacgcg cagaaaaaaa 7200

ggatctcaag aagatccttt gatcttttct actgaaccgc tctagatttc agtgcaattt 7260

atctcttcaa atgtagcacc tgaagtcagc cccatacgat ataagttgta attctcatgt 7320

tagtcatgcc ccgcgcccac cggaaggagc tgactgggtt gaaggctctc aagggcatcg 7380

gtcgagatcc cggtgcctaa tgagtgagct aacttaccgt tgtaaaacga cggccagtga 7440

attcctgatg aatcccctaa tgatttttat caaaatcatt aaggttacca tcacggaaaa 7500

aggttatgct gcttttaaga cccactttca catttaagtt gtttttctaa tccgcatatg 7560

atcaattcaa ggccgaataa gaaggctggc tctgcacctt ggtgatcaaa taattcgata 7620

gcttgtcgta ataatggcgg catactatca gtagtaggtg tttccctttc ttctttagcg 7680

acttgatgct cttgatcttc caatacgcaa cctaaagtaa aatgccccac agcgctgagt 7740

gcatataatg cattctctag tgaaaaacct tgttggcata aaaaggctaa ttgattttcg 7800

agagtttcat actgtttttc tgtaggccgt gtacctaaat gtacttttgc tccatcgcga 7860

tgacttagta aagcacatct aaaactttta gcgttattac gtaaaaaatc ttgccagctt 7920

tccccttcta aagggcaaaa gtgagtatgg tgcctatcta acatctcaat ggctaaggcg 7980

tcgagcaaag cccgcttatt ttttacatgc caatacaatg taggctgctc tacacctagc 8040

ttctgggcga gtttacgggt tgttaaacct tcgattccga cctcattaag cagctctaat 8100

gcgctgttaa tcactttact tttatctaat ctagacat 8138

<210> 22

<211> 6320

<212> DNA

<213> Artificial sequence

<400> 22

acgatcgtaa aaggatctca agaagatcct ttacggattc ccgacaccat cactctagat 60

ttcagtgcaa tttatctctt caaatgtagc acctgaagtc agccccatac gatataagtt 120

gtaattctca tgttagtcat gccccgcgcc caccggaagg agctgactgg gttgaaggct 180

ctcaagggca tcggtcgaga tcccggtgcc taatgagtga gctaacttac attaattgcg 240

ttgcgctgat gaatccccta atgattttta tcaaaatcat taaggttacc atcacggaaa 300

aaggttatgc tgcttttaag acccactttc acatttaagt tgtttttcta atccgcatat 360

gatcaattca aggccgaata agaaggctgg ctctgcacct tggtgatcaa ataattcgat 420

agcttgtcgt aataatggcg gcatactatc agtagtaggt gtttcccttt cttctttagc 480

gacttgatgc tcttgatctt ccaatacgca acctaaagta aaatgcccca cagcgctgag 540

tgcatataat gcattctcta gtgaaaaacc ttgttggcat aaaaaggcta attgattttc 600

gagagtttca tactgttttt ctgtaggccg tgtacctaaa tgtacttttg ctccatcgcg 660

atgacttagt aaagcacatc taaaactttt agcgttatta cgtaaaaaat cttgccagct 720

ttccccttct aaagggcaaa agtgagtatg gtgcctatct aacatctcaa tggctaaggc 780

gtcgagcaaa gcccgcttat tttttacatg ccaatacaat gtaggctgct ctacacctag 840

cttctgggcg agtttacggg ttgttaaacc ttcgattccg acctcattaa gcagctctaa 900

tgcgctgtta atcactttac ttttatctaa tctagacatc attaattcct aatttttgtt 960

gacactctat cattgataga gttattttac cactccctat cagtgataga gaaaagtgaa 1020

ctctagaaat aattttgttt aactttaaaa ggagatatac catgtaccgt cgtaagctga 1080

aatatagccg tgttaagaac ctgcacaaat ttgcgagcca gaagaacaaa agcacctgcc 1140

tggtggagag cagcctggaa ttcgacgcgt gcttccactt tgagttcagc ccgccgatcg 1200

cggcgtttga agcgcaaccg ctgggttacg agtatgaatt cgataaccgt atttgccgtt 1260

acaccccgga ctttctgctg acccacaccg atggcaccca gaagttcatc gaggttaagc 1320

cgcaaagcaa aattgcggac gaggattttc gtgcgcgttt catcgaaaag caggcgattg 1380

cgaaacaaga cggtcgtgat ctgatcctgg tgaccgacaa gcagattcgt gtttacccga 1440

ccctgaacaa cctgaaactg ctgcaccgtt atagcggctt tcagagcctg accgagctgc 1500

aagcgagcgt gctggaactg gttaagcagt acggtagcat caaagtgggc caactgattc 1560

gttatctgaa agttaccgcg ggtgaactgc tggcgaccgt gctgcgtctg ctgagcctgg 1620

gccaactgtt cgcggatctg accaccaacg agatcagcat tgaaaccgcg atctggagca 1680

acaatgttta ataacgacct gttcgacgat gagtttaacc agccgctgcc gaaggcggaa 1740

accaaactgc cgcagaacta taccaaggat ctgcaagcgc tgccggagaa gatcaaaacc 1800

accaccttcg cgaagctgaa atacattcaa tggctggagg cgaacatcca gggtggctgg 1860

acccaaaaga acctggaacc gctgctgaaa ctgatgccgg acgttgaggg tgaaaagaaa 1920

ccgagctggc gtaccgcggc gcgttggtat agcgcgtaca ccaacgcgga taagaacatt 1980

atggcgctga tcccgagcca ccagaagaaa ggcaaccgtg aacgtgacac caccaccgat 2040

aagttctttg agaaagcgct ggaacgttac ctggtgaagg agaaaccgag cgttgcgagc 2100

gcgtataagt tctacaaaga cctggtgatc attgaaaacg acagcgtggt tgatagcgtt 2160

ctgaaaccgc tgacctataa ggcgtttaaa aaccgtattg acaacctgcc gcagtatgag 2220

gttatgatcg cgcgttacgg caagcgtctg gcggatattg cgtacaacaa ggtggaaggc 2280

cacaaacgtc cgattcgtgt gctggagaaa gttgaaatcg accacacccc gctggatctg 2340

attctgctgg acgatgagct gcacatcccg ctgggtcgtc cgaccctgac catgctggtt 2400

gacgtttata gccactgcat cgtgggctac tatttcagct ttagcgagcc gagctacgat 2460

gcggttcgtc gtgcgatgct gaacgcgatg aagccgaaaa gcgaagtggc gaaactgtac 2520

ccggacacca ttaacgagtg gaagtgcgcg ggtaaaatcg aaaccctggt ggttgataac 2580

ggcgcggagt tctggagcaa cagcctggaa ctggcgtgcg aggaaatcgg tattaacacc 2640

cagtataacc cggtggcgaa gccgtggctg aaaccgttcg ttgagcgtat gtttggcacc 2700

atcaacaccg aactgctgga cccggttccg ggcaagacct tcagcaacat cctgcaaaaa 2760

cacgaataca acccgaagaa agacgcgatt atgcgtttca ccacctttat gcagctgttt 2820

cacaagtggg tggttgatgt gtatcaccaa gacgcggata gccgtttcaa atacattccg 2880

agccagctgt gggaccaagg ctttaacacc ctgccgccga ccatgctgag cgatgcggat 2940

ctgcagcaac tggatgtggt tctgagcatc agcaaccacc gtgtgctgcg taagggtggc 3000

attcgtctgg agaacctgag ctatgacagc accgaactgg cgaactaccg taagcagttc 3060

agccacaaag tgagccaaga ggttctgatc aaactgaacc cggacgatat tagctacatc 3120

tatgtgtacc tggacaagct ggaacactat attaaagttc cgtgcatcga tccgaacggt 3180

tacacccaga acctgagcct gaaccaacac aagatcaaca ttcgtatcca ccgtgacttt 3240

attagcggta gcatcgataa cgttggcctg gcgaaggcgc gtatgttcat tcacaacaaa 3300

atccagaacg agtttgagga actgaagaac gcgccgaaac acagcaaggt gaaaggtggc 3360

aaggcgctgg cgaaacacca gaacattagc agcgacagcc aaaagagcat cacccacagc 3420

aaaccggtgg aggcgaagaa agttaccccg aaagaacaac cgaccgatag ctgggacgat 3480

ttcatcagcg acctggatgg tttttaatta tgctgaccga caagcagaaa gaaaagctga 3540

acgagttccg tgatgttttt attgaatacc cgatcattac caccatcttc aacgactttg 3600

atcgtctgcg tctgggtaaa ggcctgaccg gcgagaagcc gtgcatgctg ctgaacggtg 3660

acaccggcac cggtaaaacc gcgctgatta aacagtataa ggaacgtcac ctgccgcaat 3720

tcatcaacgg tgttatgaac cacccggtgc tggttagccg tattccgagc aacccgaccc 3780

tggaaagcac cctggcggag ctgctgaaag acctgggtca agtgggcagc accgagcgta 3840

agctgcgtat taacggcacc cgtctgacca ccagcctgat caaatgcctg aagacctgcg 3900

gcaccgaact gatcattatc gatgagtttc aggaactgat tgagcacaac caaggcaaga 3960

aacgtcgtga aattgcgaac cgtctgaaat acatcaacga cgaggcgggt gttagcattg 4020

tgctggttgg catgccgtgg gcggaaaaga tcgcggatga gccgcagtgg agcagccgtc 4080

tgctgatccg tcgtcaactg ccgtatttca aactgagcga gaacccgaag cactttgtgc 4140

agctgattat cggtctggcg aaccgtatgc cgttcgcgga aaaaccgaac ctgagcgagc 4200

aagcgaccgt tttcaccctg tttagcctga gcaaaggctg cttccgtacc ctgaagtact 4260

ttctggacga tgcggtgctg tatgcgctga tggacaacgc gaagaccctg accaccaaac 4320

acctggtgaa ggcgttcgaa gttctgtttc cggatgtgcc gaacctgttt accctgccgg 4380

ttgcggagat caccgcgagc gaggtggaac gttacagcct gtataagccg gaaagcagcc 4440

aggacgagga cccgttcatt gcgaccaaat ttaccgatcg tatgccgatc agccaactgc 4500

tgcgtaagta actcgagccg ctgagcaata actagcataa ccccttgggg cctctaaacg 4560

ggtcttgagg ggttttttgc tgaaacctca ggcatttgag aagcacacgg tcacactgct 4620

tccggtagtc aataaaccgg taaaccagca atagacataa gcggctattt aacgaccctg 4680

ccctgaaccg acgacaagct gacgaccggg tctccgcaag tggcactttt cggggaaatg 4740

tgcgcggaac ccctatttgt ttatttttct aaatacattc aaatatgtat ccgctcatga 4800

attaattctt agaaaaactc atcgagcatc aaatgaaact gcaatttatt catatcagga 4860

ttatcaatac catatttttg aaaaagccgt ttctgtaatg aaggagaaaa ctcaccgagg 4920

cagttccata ggatggcaag atcctggtat cggtctgcga ttccgactcg tccaacatca 4980

atacaaccta ttaatttccc ctcgtcaaaa ataaggttat caagtgagaa atcaccatga 5040

gtgacgactg aatccggtga gaatggcaaa agtttatgca tttctttcca gacttgttca 5100

acaggccagc cattacgctc gtcatcaaaa tcactcgcat caaccaaacc gttattcatt 5160

cgtgattgcg cctgagcgag acgaaatacg cggtcgctgt taaaaggaca attacaaaca 5220

ggaatcgaat gcaaccggcg caggaacact gccagcgcat caacaatatt ttcacctgaa 5280

tcaggatatt cttctaatac ctggaatgct gttttcccgg ggatcgcagt ggtgagtaac 5340

catgcatcat caggagtacg gataaaatgc ttgatggtcg gaagaggcat aaattccgtc 5400

agccagttta gtctgaccat ctcatctgta acatcattgg caacgctacc tttgccatgt 5460

ttcagaaaca actctggcgc atcgggcttc ccatacaatc gatagattgt cgcacctgat 5520

tgcccgacat tatcgcgagc ccatttatac ccatataaat cagcatccat gttggaattt 5580

aatcgcggcc tagagcaaga cgtttcccgt tgaatatggc tcatactctt cctttttcaa 5640

tattattgaa gcatttatca gggttattgt ctcatgagcg gatacatatt tgaatgtatt 5700

tagaaaaata aacaaatagg catgctagcg cagaaacgtc ctagaagatg ccaggaggat 5760

acttagcaga gagacaataa ggccggagcg aagccgtttt tccataggct ccgcccccct 5820

gacgaacatc acgaaatctg acgctcaaat cagtggtggc gaaacccgac aggactataa 5880

agataccagg cgtttccccc tgatggctcc ctcttgcgct ctcctgttcc cgtcctgcgg 5940

cgtccgtgtt gtggtggagg ctttacccaa atcaccacgt cccgttccgt gtagacagtt 6000

cgctccaagc tgggctgtgt gcaagaaccc cccgttcagc ccgactgctg cgccttatcc 6060

ggtaactatc atcttgagtc caacccggaa agacacgaca aaacgccact ggcagcagcc 6120

attggtaact gagaattagt ggatttagat atcgagagtc ttgaagtggt ggcctaacag 6180

aggctacact gaaaggacag tatttggtat ctgcgctcca ctaaagccag ttaccaggtt 6240

aagcagttcc ccaactgact taaccttcga tcaaaccgcc tccccaggcg gttttttcgt 6300

ttacagagca ggagattacg 6320
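
The sequence listing follows a fixed layout: each sequence line carries groups of ten lowercase bases followed by a running base count, and the `<211>` field declares the total length. A minimal Python sketch of a consistency check for that layout is given here. The function names `parse_listing_line` and `assemble_sequence` are illustrative and not part of the patent; the two sample lines are the first two lines of SEQ ID NO. 22, so the declared length passed in the example is 120 rather than the full 6320.

```python
import re


def parse_listing_line(line: str) -> tuple[str, int]:
    """Split one listing line into (bases, running_count)."""
    *groups, count = line.split()
    bases = "".join(groups)
    if not re.fullmatch(r"[acgtn]+", bases):
        raise ValueError(f"unexpected characters in line: {line!r}")
    return bases, int(count)


def assemble_sequence(lines: list[str], declared_length: int) -> str:
    """Concatenate listing lines, checking every running count and the total."""
    sequence = ""
    for line in lines:
        if not line.strip():
            continue  # blank separator lines carry no bases
        bases, count = parse_listing_line(line)
        sequence += bases
        if len(sequence) != count:
            raise ValueError(
                f"running count {count} does not match {len(sequence)} bases read"
            )
    if len(sequence) != declared_length:
        raise ValueError(
            f"declared length {declared_length}, but {len(sequence)} bases read"
        )
    return sequence


# First two lines of SEQ ID NO. 22, as printed in the listing.
sample_lines = [
    "acgatcgtaa aaggatctca agaagatcct ttacggattc ccgacaccat cactctagat 60",
    "",
    "ttcagtgcaa tttatctctt caaatgtagc acctgaagtc agccccatac gatataagtt 120",
]

print(len(assemble_sequence(sample_lines, 120)))
```

A line in which a base was dropped or duplicated during transcription fails the running-count check, so the same routine doubles as a cheap guard against copying errors.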

Claims

1. A CRISPR-associated transposase comprising a polypeptide selected from the group consisting of: a transposase protein tnsA, a transposase protein tnsB, a transposase protein tnsC, a transposase protein tniQ, a nuclease protein Cas5/8, a nuclease protein Cas6, and a nuclease protein Cas7, each derived from a bacterium of the genus Pseudoalteromonas.

2. The CRISPR-associated transposase of claim 1, wherein the Pseudoalteromonas bacterium is Pseudoalteromonas translucida, preferably Pseudoalteromonas translucida KMM 520.

3. The CRISPR-associated transposase of claim 2, wherein:

the tnsA is a polypeptide having the amino acid sequence of SEQ ID NO. 1, or a polypeptide having more than 95% homology, preferably more than 98% homology, more preferably more than 99% homology, with SEQ ID NO. 1 and having the same function;

the tnsB is a polypeptide having the amino acid sequence of SEQ ID NO. 2, or a polypeptide having more than 95% homology, preferably more than 98% homology, more preferably more than 99% homology, with SEQ ID NO. 2 and having the same function;

the tnsC is a polypeptide having the amino acid sequence of SEQ ID NO. 3, or a polypeptide having more than 95% homology, preferably more than 98% homology, more preferably more than 99% homology, with SEQ ID NO. 3 and having the same function;

the tniQ is a polypeptide having the amino acid sequence of SEQ ID NO. 4, or a polypeptide having more than 95% homology, preferably more than 98% homology, more preferably more than 99% homology, with SEQ ID NO. 4 and having the same function;

the Cas5/8 is a polypeptide having the amino acid sequence of SEQ ID NO. 5, or a polypeptide having more than 95% homology, preferably more than 98% homology, more preferably more than 99% homology, with SEQ ID NO. 5 and having the same function;

the Cas6 is a polypeptide having the amino acid sequence of SEQ ID NO. 6, or a polypeptide having more than 95% homology, preferably more than 98% homology, more preferably more than 99% homology, with SEQ ID NO. 6 and having the same function;

the Cas7 is a polypeptide having the amino acid sequence of SEQ ID NO. 7, or a polypeptide having 95% or more homology, preferably 98% or more homology, more preferably 99% or more homology, with SEQ ID NO. 7 and having the same function.
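
Claim 3 defines variants by percent homology to SEQ ID NOs. 1 to 7, but this section does not state how the percentage is computed. As a rough illustration only, the Python sketch below assumes a global (Needleman-Wunsch) alignment with unit scores (+1 match, -1 mismatch, -1 gap) and reports identity as matching columns divided by alignment length. The scoring scheme, the identity definition, and the function name `percent_identity` are assumptions made for this example, not the patent's method; protein comparisons in practice usually rely on a substitution matrix and a dedicated alignment tool.

```python
def percent_identity(a: str, b: str, match: int = 1,
                     mismatch: int = -1, gap: int = -1) -> float:
    """Percent identity over a global alignment (assumed unit scoring)."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Trace back one optimal alignment, counting columns and identical pairs.
    i, j = n, m
    matches = columns = 0
    while i > 0 or j > 0:
        columns += 1
        if i > 0 and j > 0:
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            if score[i][j] == diag:
                matches += a[i - 1] == b[j - 1]
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
    return 100.0 * matches / columns if columns else 100.0


# Toy peptides (not taken from the sequence listing): one substitution in four residues.
print(percent_identity("MKTA", "MKSA"))
```

Whether a candidate meets a claimed threshold would then be a simple comparison of the returned value against 95, 98 or 99, using whatever alignment settings are actually adopted.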

4. A gene encoding the polypeptide as claimed in any one of claims 1 to 3.

5. The gene according to claim 4, wherein:

the gene encoding the polypeptide tnsA having the amino acid sequence of SEQ ID NO. 1 has the nucleotide sequence of SEQ ID NO. 8, or is a polynucleotide having more than 80% homology, preferably more than 85% homology, more preferably more than 90% homology, still more preferably more than 95% homology, with SEQ ID NO. 8;

the gene encoding the polypeptide tnsB having the amino acid sequence of SEQ ID NO. 2 has the nucleotide sequence of SEQ ID NO. 9, or is a polynucleotide having more than 80% homology, preferably more than 85% homology, more preferably more than 90% homology, still more preferably more than 95% homology, with SEQ ID NO. 9;

the gene encoding the polypeptide tnsC having the amino acid sequence of SEQ ID NO. 3 has the nucleotide sequence of SEQ ID NO. 10, or is a polynucleotide having more than 80% homology, preferably more than 85% homology, more preferably more than 90% homology, still more preferably more than 95% homology, with SEQ ID NO. 10;

the gene encoding the polypeptide tniQ having the amino acid sequence of SEQ ID NO. 4 has the nucleotide sequence of SEQ ID NO. 11, or is a polynucleotide having more than 80% homology, preferably more than 85% homology, more preferably more than 90% homology, still more preferably more than 95% homology, with SEQ ID NO. 11;

the gene encoding the polypeptide Cas5/8 having the amino acid sequence of SEQ ID NO. 5 has the nucleotide sequence of SEQ ID NO. 12, or is a polynucleotide having more than 80% homology, preferably more than 85% homology, more preferably more than 90% homology, still more preferably more than 95% homology, with SEQ ID NO. 12;

the gene encoding the polypeptide Cas6 having the amino acid sequence of SEQ ID NO. 6 has the nucleotide sequence of SEQ ID NO. 13, or is a polynucleotide having more than 80% homology, preferably more than 85% homology, more preferably more than 90% homology, still more preferably more than 95% homology, with SEQ ID NO. 13;

the gene encoding the polypeptide Cas7 having the amino acid sequence of SEQ ID NO. 7 has the nucleotide sequence of SEQ ID NO. 14, or is a polynucleotide having more than 80% homology, preferably more than 85% homology, more preferably more than 90% homology, still more preferably more than 95% homology, with SEQ ID NO. 14.

6. A plasmid pQCascade or pQCascadePtr for use in a CRISPR transposon system, comprising a gene fragment selected from the group consisting of: the Cas5/8-encoding gene as claimed in claim 4 or 5; the Cas6-encoding gene as claimed in claim 4 or 5; the Cas7-encoding gene as claimed in claim 4 or 5; and the tniQ-encoding gene as claimed in claim 4 or 5.

7. The plasmid pQCascadePtr of claim 6, wherein the crRNA sequence carries a spacer that targets a single site in the genome, or is a crRNA array that targets multiple sites in the genome.

8. The plasmid pQCascadePtr of claim 7, wherein the crRNA sequence is a crRNA array targeting multiple sites in the genome, and wherein the repeat region (repeat) comprises one or more sequences selected from the group consisting of: repeat1 having the nucleotide sequence of SEQ ID NO. 15, repeat2 having the nucleotide sequence of SEQ ID NO. 16, repeat3 having the nucleotide sequence of SEQ ID NO. 17 and repeat4 having the nucleotide sequence of SEQ ID NO. 18, wherein each of the 32 N (N32) in the nucleotide sequences of SEQ ID NOs. 15-18 is any base A, T, G or C.
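The N32 placeholder recited in claim 8 can be illustrated with a minimal sketch, under stated assumptions. The real repeat sequences are those of SEQ ID NOs. 15-18 and are not reproduced here; any template passed to the code is supplied by the user. The sketch assumes each template holds exactly one run of 32 N that is replaced by a 32-base spacer, and that an array is formed by simple concatenation of the filled units. Both are assumptions, and the names `fill_spacer` and `build_array` are hypothetical.

```python
import re

N32 = "N" * 32  # placeholder for the 32-base spacer, as recited in claim 8


def fill_spacer(template: str, spacer: str) -> str:
    """Replace the single N32 run in a repeat template with a 32-base spacer."""
    template = template.upper()
    spacer = spacer.upper()
    if not re.fullmatch(r"[ATGC]{32}", spacer):
        raise ValueError("spacer must be exactly 32 bases of A, T, G or C")
    # The template must contain exactly one run of N, and it must be 32 long.
    if re.findall(r"N+", template) != [N32]:
        raise ValueError("template must contain exactly one run of 32 N")
    return template.replace(N32, spacer)


def build_array(templates, spacers) -> str:
    """Concatenate filled repeat units (assumed layout, not taken from the source)."""
    templates = list(templates)
    spacers = list(spacers)
    if len(templates) != len(spacers):
        raise ValueError("need one spacer per repeat template")
    return "".join(fill_spacer(t, s) for t, s in zip(templates, spacers))
```

A call such as `build_array([t1, t2], [s1, s2])` returns one string in which each template's N32 run has been replaced by the matching spacer; the validation rejects spacers of any other length or containing other characters.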

9. A helper plasmid pTns or pTnsPtr for a CRISPR transposon system, for use in conjunction with the plasmid pQCascadePtr of any one of claims 6-8, comprising gene fragments selected from the group consisting of: the tnsA-encoding gene of claim 4 or 5; the tnsB-encoding gene of claim 4 or 5; the tnsC-encoding gene of claim 4 or 5.

10. A plasmid pQCasTnsPtr for use in a CRISPR transposon system, which is a combination of the plasmid pQCascadePtr of any one of claims 6 to 8 and the helper plasmid pTnsPtr of claim 9, comprising: the above Cas5/8, Cas6, Cas7, tniQ, tnsA, tnsB and tnsC genes, a crRNA sequence targeting a genomic target site, a ColA replicon, a promoter and a kanamycin resistance gene.

11. A helper plasmid pDonorPtr for a CRISPR transposon system, for use with the plasmid pQCascadePtr of any one of claims 6 to 8 and the helper plasmid pTnsPtr of claim 9, comprising gene fragments selected from the group consisting of: a Left End (LE) having the nucleotide sequence of SEQ ID NO. 19, or a sequence having more than 80% homology, preferably more than 85% homology, more preferably more than 90% homology, more preferably more than 95% homology with SEQ ID NO. 19 and comprising the 33 bp at the 3' end of SEQ ID NO. 19; a Right End (RE) having the nucleotide sequence of SEQ ID NO. 20, or a sequence having more than 80% homology, preferably more than 85% homology, more preferably more than 90% homology, more preferably more than 95% homology with SEQ ID NO. 20 and comprising the 27 bp at the 5' end of SEQ ID NO. 20; a cargo gene of interest (Cargo gene).

12. A plasmid pEffectorPtr for use in a CRISPR transposon system, formed by combining the plasmid pQCascadePtr of any one of claims 6-8, the helper plasmid pTnsPtr of claim 9 and the helper plasmid pDonorPtr of claim 11, comprising: the above Cas5/8, Cas6, Cas7, tniQ, tnsA, tnsB and tnsC genes, the Left End (LE) and Right End (RE) sequences, a crRNA sequence targeting a genomic target site, a ColA replicon, a promoter and a kanamycin resistance gene.

13. A CRISPR transposon system, comprising: the plasmid pQCascadePtr of any one of claims 6 to 8, the helper plasmid pTnsPtr of claim 9 and the helper plasmid pDonorPtr of claim 11; or the plasmid pQCasTnsPtr of claim 10 and the helper plasmid pDonorPtr of claim 11; or the plasmid pEffectorPtr of claim 12.
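The three alternative plasmid combinations of claim 13 can be written as a small lookup, shown here only as an illustrative sketch. The plasmid names are those used in the claims; the function name `matching_configuration` and the set-based representation are hypothetical and not part of the source.

```python
# Illustrative sketch: the three alternatives of claim 13 as plasmid sets.
CONFIGURATIONS = (
    frozenset({"pQCascadePtr", "pTnsPtr", "pDonorPtr"}),  # three-plasmid system
    frozenset({"pQCasTnsPtr", "pDonorPtr"}),              # two-plasmid system
    frozenset({"pEffectorPtr"}),                          # single-plasmid system
)


def matching_configuration(plasmids):
    """Return the first claim-13 alternative fully contained in `plasmids`, else None."""
    present = set(plasmids)
    for config in CONFIGURATIONS:
        if config <= present:  # every plasmid of this alternative is present
            return config
    return None
```

For example, a strain carrying only pQCasTnsPtr matches none of the alternatives, because the donor plasmid carrying LE, RE and the cargo gene is missing.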

14. The CRISPR transposon system of claim 13, further comprising CRISPR-associated transposase plasmids derived from Vibrio cholerae Tn6677, comprising the plasmid pQCasTnsVch and the helper plasmid pDonorVch,

wherein the plasmid pQCasTnsVch comprises: the Cas5/8, Cas6, Cas7, tniQ, tnsA, tnsB and tnsC genes from Vibrio cholerae Tn6677, a CloDF13 replicon, a promoter and a streptomycin resistance gene;

the plasmid pDonorVch comprises the Left End (LE) and Right End (RE) from Vibrio cholerae Tn6677, and a cargo gene of interest (Cargo gene).

15. Use of the CRISPR-associated transposase of any one of claims 1 to 3, the gene of claim 4 or 5, the plasmid pQCascadePtr of any one of claims 6 to 8, the plasmid pTnsPtr of claim 9, the plasmid pQCasTnsPtr of claim 10, the plasmid pDonorPtr of claim 11, the plasmid pEffectorPtr of claim 12, or the CRISPR transposon system of claim 13 or 14 for gene editing.

16. Use of the CRISPR transposon system of claim 13 or 14 for gene editing of Gram-negative bacteria such as e.