Disclosure of Invention
The present invention relates to a site-directed gene editing method mediated by a FokI-dCpf1 fusion protein. Certain aspects of the invention disclose compositions and methods for improving the specificity of RNA-guided endonucleases (e.g., Cpf1). The invention also relates to methods for constructing a FokI-dCpf1 fusion vector, a FokI-Cpf1 fusion vector and a CAG-Cpf1 fusion vector, and achieves site-specific integration in the genome of an organism at a low off-target rate through gRNA-guided targeting and cleavage by dimeric FokI.
In order to achieve the purpose of the invention, the invention adopts the technical scheme that:
The invention fuses the dCpf1 protein, a Cpf1 from the CRISPR/Cpf1 editing system whose cleavage activity has been inactivated by mutation, with the FokI nuclease to construct an efficient gRNA-mediated site-directed gene editing system. The method combines the targeting function of dCpf1 with the cleavage function of the FokI nuclease; because cleavage occurs only in dimeric form, the specificity of the gene editing system is improved and a lower off-target rate is achieved. Compared with the Cas9 protein, the Cpf1 protein offers greater advantages in gene editing and related operations. The invention therefore provides a more convenient and efficient fusion protein-mediated method for site-directed gene editing.
The purpose of the invention is realized by the following technical scheme:
A fusion protein expression vector, wherein the nucleic acid sequence of the fusion protein expression vector is shown in SEQ ID NO: 1.
Further, the fusion protein expression vector is a plasmid vector pFOKI-dCpf1 formed by fusing FokI endonuclease and Cpf1 without cleavage activity.
Further, the fusion protein expression vector comprises a CAG promoter, a Kozak sequence, two SV40 nuclear localization signal (NLS) sequences, a FokI sequence, a Linker sequence and a dCpf1 sequence;
the sequence of the CAG promoter is bases 1989-3638 from the 5' end in SEQ ID NO. 1;
the Kozak sequence is bases 3747-3556 from the 5' end in SEQ ID NO. 1;
the sequence of FokI is the 4374-;
the Linker sequence is bases 4401-4988 from the 5' end in SEQ ID NO. 1;
the sequence of dCpf1 is bases 5004-8687 from the 5' end in SEQ ID NO. 1;
the two NLS sequences are located upstream of FokI and downstream of dCpf1, and are bases 4374-4394 and bases 8694-8714 from the 5' end in SEQ ID NO. 1, respectively.
Further, the primer sequences of the fusion protein expression vector are as follows:
A method for constructing the fusion protein expression vector comprises: cloning dCpf1 into the LB vector to construct the pLB-dCpf1 vector; digesting pLB-dCpf1 with the BsmBI enzyme and recovering the dCpf1 fragment; obtaining the T2A-NLS and FokI fragments by PCR from a vector stored in the laboratory; then ligating the T2A-NLS, FokI and dCpf1 fragments with the CAG backbone by the Golden Gate cloning method, transforming the product into TOP10 cells, picking single clones, extracting the plasmid and identifying it by electrophoresis.
The fusion protein expression vector is applied to gene editing.
A fusion protein-mediated site-directed gene editing method, characterized in that dCpf1, which lacks nuclease activity, is fused with the nuclease FokI, and a genomic target site is then edited in a site-directed manner under the guidance of gRNA.
Further, the FokI-dCpf1 gene editing system includes: a FokI-dCpf1 fusion protein expression plasmid, whose product recognizes the gRNAs and performs cleavage, and a gRNA expression plasmid, which directs site-specific binding of the FokI-dCpf1 fusion protein.
A gRNA vector, the nucleic acid sequence of which is shown in SEQ ID NO. 2.
Further, the sequences of gRNA primers are as follows:
A site-directed gene editing method mediated by a FokI-dCpf1 fusion protein, characterized in that the nuclease FokI is fused with dCpf1, which lacks nuclease activity, and a genomic target site is then edited in a site-directed manner under the guidance of gRNA. The FokI-dCpf1 gene editing system comprises: a FokI-dCpf1 fusion protein expression plasmid, whose product recognizes the gRNAs and performs cleavage, and a gRNA expression plasmid, which directs site-specific binding of the FokI-dCpf1 fusion protein.
The fusion protein and its dimers comprise two domains: 1) a Cpf1 domain without nuclease activity; 2) a nuclease domain (e.g., a FokI monomer providing the DNA cleavage domain). dCpf1 carries the D832A mutation, which abolishes its cleavage activity; after fusion with FokI, cleavage can occur only when a dimer is formed, which further improves the specificity of the system.
The FokI-dCpf1 fusion protein contains two nuclear localization signal (NLS) domains, which provide the signal that mediates transport of the fusion protein into the nucleus.
A Linker sequence (GGGGS) is located between the FokI nuclease and the dCpf1 protein.
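The linker and NLS statements can be checked against the sequence listing. The following is a minimal sketch, assuming the three nucleotide fragments below (copied from SEQ ID NO: 1) are read in frame; the helper name `translate` and the constant names are illustrative only and are not part of the invention.

```python
# Standard genetic code, built from the conventional T-C-A-G codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string into one-letter amino acid codes."""
    dna = dna.upper().replace(" ", "")
    if len(dna) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# Fragments copied from SEQ ID NO: 1 (reading frame is an assumption).
LINKER_DNA = "ggtggcggtggatcc"
NLS_UPSTREAM_DNA = "cctaagaagaagcggaaggtg"
NLS_DOWNSTREAM_DNA = "cccaagaagaagaggaaagtc"

print("linker:", translate(LINKER_DNA))
print("NLS upstream of FokI:", translate(NLS_UPSTREAM_DNA))
print("NLS downstream of dCpf1:", translate(NLS_DOWNSTREAM_DNA))
```

Under these assumptions the linker fragment translates to GGGGS and both NLS fragments translate to the same seven-residue peptide, although the two NLS copies use different codons.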
The gRNA expression plasmid is constructed on the backbone of the original pSQT1313 plasmid and contains two gRNA scaffold sequences (as in SEQ ID NO. 2) between the gRNAs. These sequences can be recognized and cut by dCpf1 to release 2 gRNAs that recognize the target sites, which improves the specificity of gRNA binding and reduces off-target events to some extent.
A gRNA can be designed for any genomic site of interest, and that site can be edited by the FokI-dCpf1 gene editing system described above; editing includes, but is not limited to, single-base substitutions, random small-fragment insertion/deletion mutations, gene fragment substitutions, gene knock-ins, gene knockouts and other genetic manipulations.
The invention also provides a method for site-directed integration of the pFokI-dCpf1 vector in the genome of an organism; the method comprises the following steps:
1) a first fusion protein of the invention binds to DNA: the fusion protein comprises a nuclease-free dCpf1 domain joined to a FokI DNA cleavage domain, a gRNA binds to a region of the genomic DNA of the organism, and the nuclease-free dCpf1 binds to that gRNA; 2) a second fusion protein (FokI-dCpf1) binds to the genomic DNA of the organism: a second gRNA binds to another region of the genomic DNA, and dCpf1 binds to the second gRNA; 3) the complexes bound in steps 1) and 2) bring the nuclease domains together as a dimer, so that the DNA is cleaved at the site where the fusion proteins are bound.
The invention also provides a method for constructing a gRNA vector to be used together with the fusion plasmid vector, in which a pair of gRNAs is constructed into a single vector; the nucleic acid sequence of the vector is shown in SEQ ID NO. 2.
The invention exploits the fact that Cpf1 allows a wider choice of cleavage positions than Cas9, and constructs vectors with different spacer lengths. pSQT1313-IGF2-spacer53 is taken as an example, but the invention is not limited to this gRNA; its nucleotide sequence is shown in SEQ ID NO. 2.
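The paired-site geometry can be illustrated with a short calculation. This is only a sketch: it assumes that "spacer length" means the number of bases lying between the two gRNA binding sites, and that sites are given as 0-based half-open intervals on one strand. The names `HalfSite`, `spacer_length` and `is_tested_spacing` are hypothetical; the six lengths are those listed in the example of this specification.

```python
from typing import NamedTuple


class HalfSite(NamedTuple):
    """One gRNA binding site as a 0-based, half-open interval."""
    start: int  # inclusive
    end: int    # exclusive


def spacer_length(site_a: HalfSite, site_b: HalfSite) -> int:
    """Number of bases between two non-overlapping gRNA binding sites."""
    first, second = sorted((site_a, site_b))
    gap = second.start - first.end
    if gap < 0:
        raise ValueError("binding sites overlap")
    return gap


# Spacer lengths (bp) tested in the example of this specification.
TESTED_SPACERS = (7, 16, 18, 21, 40, 53)


def is_tested_spacing(site_a: HalfSite, site_b: HalfSite) -> bool:
    """True if the gap between the two sites is one of the tested lengths."""
    return spacer_length(site_a, site_b) in TESTED_SPACERS


# Made-up coordinates for illustration: two 23-nt sites with a 53-bp gap.
left = HalfSite(100, 123)
right = HalfSite(176, 199)
print(spacer_length(left, right), is_tested_spacing(left, right))
```

The order of the two arguments does not matter, because the sites are sorted by start coordinate before the gap is computed.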
Advantageous effects
The invention uses the Cpf1 protein (also known as Cas12a), which is more potent than Cas9. Cpf1 is a new member of the CRISPR/Cas system, derived from Francisella novicida. It has dual cleavage activity and can cleave RNA as well as DNA; in addition to Cas9, these bacteria also use Cpf1 to cleave foreign DNA. Compared with the CRISPR/Cas9 system, the CRISPR/Cpf1 system has five advantages: 1) the Cpf1 protein requires only one RNA molecule (crRNA), whereas Cas9 requires two RNA molecules (tracrRNA and crRNA); 2) the molecular weight of the Cpf1 enzyme is smaller than that of Cas9, so it enters cells more easily and the editing success rate is higher; 3) the Cpf1 system recognizes a different sequence, which makes its targeting more site-specific and efficient; 4) the position of cleavage differs from that of Cas9, allowing greater latitude; 5) Cpf1 cleavage creates sticky ends, which favors insertion of new DNA sequences, whereas Cas9 creates blunt ends. Compared with Cas9, the CRISPR/Cpf1 system is simpler and more accurate, has a lower off-target rate and has better application prospects in gene editing. Research on gene editing is becoming increasingly important, and a set of gene editing tools with accurate targeting, good specificity and a low off-target rate is urgently needed. In view of the advantages of Cpf1, the FokI-Cpf1 editing system constructed by the invention can better meet current requirements.
The invention fuses Cpf1 that has lost its cleavage activity (dCpf1) with FokI, with the aim of producing a tool that is more efficient, more convenient and faster than fCas9, has a lower off-target rate and is better suited to gene editing in organisms. Using an in vivo DNA recombination technology, the invention constructs a fusion expression system of FokI and a cleavage-defective Cas enzyme (Cpf1). Under the guidance of crRNA, the fusion protein expressed by the system binds specifically at the target site and forms a dimeric complex, which cleaves the target site. Alternatively, under the combined action of a donor plasmid carrying a foreign gene, the fusion protein expression vector (pFokI-Cpf1) and a gRNA expression vector, the dimer forms at the specific site and the foreign gene is then inserted by homology-directed repair (HDR), so that efficient site-specific knock-in of large gene fragments is achieved. Because the system removes the cleavage activity of Cpf1 itself and can cut only when a dimer is formed, the integration specificity of the FokI-Cpf1 fusion vector is greatly improved; and because it recognizes a different PAM sequence from FokI-Cas9, its off-target rate is reduced. The site-directed gene editing method can be used for, but is not limited to, single-base substitution, random small-fragment insertion/deletion mutation, gene fragment substitution, gene knock-in and gene knockout.
The invention is used as a gene editing tool with high efficiency and low off-target rate, and has great application potential in the fields of gene editing, gene therapy, gene function research and the like.
Detailed Description
Example 1
The experimental methods mentioned in the following examples are conventional methods unless otherwise specified; practice of the invention is not limited thereto.
Example I. Construction and application of the pFokI-dCpf1 fusion protein expression vector
1. pLB-dCpf1 vector construction
dCpf1 was amplified by PCR from the commercial vector WN10151 (Addgene plasmid #53369) (see FIG. 1) and cloned into the LB vector to construct the pLB-dCpf1 vector; the plasmid DNA was checked by 1% agarose gel electrophoresis (see FIG. 2).
2. Construction of pFOKI-dCpf1 fusion protein expression vector
pLB-dCpf1 was digested with the BsmBI enzyme and the dCpf1 fragment was recovered. The commercial vector pSQT1601 (4849 bp, 5474 bp) was double-digested with the Acc65I and NotI enzymes, and the 4849 bp CAG backbone was recovered. The T2A-NLS and FokI fragments were amplified by PCR from a vector stored in the laboratory (primer sequences in Table 1). The T2A-NLS, FokI and dCpf1 fragments and the CAG backbone were then ligated by the Golden Gate cloning method, reacting with T7 ligase at 25 °C for 30 min, and the product was transformed into TOP10 cells. Single clones were picked, and the plasmids were extracted and identified by electrophoresis (see FIG. 3); clones showing the expected band were sent to a company for sequencing. The correct vector was designated pCAG-FokI-dCpf1, and its plasmid map is shown in FIG. 4.
TABLE 1 FokI-dCpf1 vector construction primers
3. Construction of the pSQT1313-gRNA vector
The invention uses the IGF2 gene as an example to verify the effect of the editing system, but is not limited to this gene. gRNAs were designed in the first intron of IGF2 using the CRISPRscan website (https://www.crisprscan.org/), with TTTV selected as the PAM. Six pairs of gRNAs (Table 2) were designed with different spacer lengths (7 bp, 16 bp, 18 bp, 21 bp, 40 bp and 53 bp) and tested. gRNA primers for the different spacer lengths (Table 3) were synthesized and diluted to 100 μM; the upstream and downstream primers were annealed to form double-stranded oligos, which were phosphorylated with T4 PNK. Reaction mixture: gRNA-F, 2 μL; gRNA-R, 2 μL; T4 PNK enzyme, 1 μL; 2× T7 ligase buffer (containing ATP), 5 μL. Reaction program: 37 °C for 30 min, 95 °C for 5 min, then ramp down to 25 °C. After the reaction, the annealed product was diluted 200-fold.
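The PAM-based site selection described above can be sketched in a few lines. This is not the CRISPRscan algorithm, which applies its own scoring; it only enumerates TTTV PAMs (V = A, C or G) on the strand supplied, with the protospacer taken immediately 3' of the PAM. The 23-nt protospacer length is an assumption chosen to match the spacers visible in SEQ ID NO: 2, and the function name `find_cpf1_sites` and the demo sequence are hypothetical.

```python
import re


def find_cpf1_sites(seq: str, protospacer_len: int = 23):
    """List (pam_start, pam, protospacer) for every TTTV PAM on this strand.

    pam_start is a 0-based index; the protospacer is the stretch of
    protospacer_len bases immediately 3' of the PAM.
    """
    seq = seq.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=(TTT[ACG])([ACGT]{%d}))" % protospacer_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


# Made-up sequence for illustration only; it is not taken from IGF2.
demo = "GG" + "TTTA" + "ACGT" * 5 + "ACG" + "GG"
for pam_start, pam, protospacer in find_cpf1_sites(demo):
    print(pam_start, pam, protospacer)
```

To search the opposite strand as well, the same function can be applied to the reverse complement of the input.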
The pSQT1313 vector (Addgene plasmid #53370) was digested with BsmBI to obtain a backbone containing the Cpf1 scaffold (AATTTCTACTAAGTGTAGAT). The annealed IGF2-gRNA products (spacer lengths of 7 bp, 16 bp, 18 bp, 21 bp, 40 bp and 53 bp) were ligated to the pSQT1313 backbone with T7 ligase at 25 °C for 1 h and transformed into TOP10 cells. Single clones were picked, and the plasmids were extracted and identified by electrophoresis (see FIGS. 5-9); clones showing the expected band were sent to a company for sequencing. The correct vectors were named pSQT1313-IGF2-gRNA-spacer, with the name modified according to the spacer length (e.g., pSQT1313-IGF2-gRNA-spacer53); the plasmid map is shown for spacer53 only (see FIG. 10).
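The layout of the paired-gRNA cassette can be illustrated as follows. This is a minimal sketch, assuming the cassette has the form scaffold, spacer 1, scaffold, spacer 2, scaffold, which is the arrangement visible in SEQ ID NO: 2; the two 23-nt spacers are copied from that sequence. The function names `build_array` and `split_array` are hypothetical, and the splitting step only mimics in text what processing at the scaffold would release.

```python
# Cpf1 scaffold (direct repeat) quoted in the text.
DIRECT_REPEAT = "AATTTCTACTAAGTGTAGAT"

# The two spacers found between scaffold copies in SEQ ID NO: 2.
SPACERS = [
    "tctgtagtctattgccctaactc",
    "tatctgcagtaatgttcctgctg",
]


def build_array(spacers):
    """Put a scaffold before every spacer and one more after the last spacer."""
    return "".join(DIRECT_REPEAT + s.upper() for s in spacers) + DIRECT_REPEAT


def split_array(array):
    """Recover the spacers lying between consecutive scaffold copies."""
    parts = array.upper().split(DIRECT_REPEAT)
    # parts[0] and parts[-1] are the flanks outside the first and last scaffold.
    return [p for p in parts[1:-1] if p]


array = build_array(SPACERS)
print(len(array), "nt cassette")
for spacer in split_array(array):
    print(len(spacer), spacer)
```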
TABLE 2 gRNA sequences
TABLE 3 gRNA primer sequences
4. In this example, IGF2 was used as the target gene and the mouse myoblast line C2C12 as the model, and the target sequence of the target gene was successfully edited.
1) Recovery and culture of cryopreserved cells
The pCAG-FokI-dCpf1 and pSQT1313-IGF2-gRNA-spacer plasmids were extracted with an OMEGA endotoxin-free plasmid extraction kit (purchased from OMEGA), and the final concentration of the product was adjusted to 500 ng/μL for cell transfection.
The cryovial containing mouse C2C12 myoblasts (cells stored in the laboratory) was taken out of liquid nitrogen, immediately placed in warm water at 37-40 °C and shaken rapidly until the freezing medium was completely thawed, with rewarming completed within 1-2 min. The cell suspension was transferred to a sterile centrifuge tube, 5 mL of culture medium was added, and the suspension was mixed by gentle pipetting; the cell suspension was centrifuged at 800-; 1 mL of complete medium was added to the centrifuge tube containing the cell pellet, the cells were resuspended by gentle pipetting, and the suspension was transferred to a cell culture flask with an appropriate amount of complete medium for culture.
2) Cell transfection
The mouse C2C12 myoblasts were divided into 6 groups; each group was transfected with 500 ng of pCAG-FokI-dCpf1 and 500 ng of a pSQT1313-IGF2-gRNA-spacer plasmid (7 bp, 16 bp, 18 bp, 21 bp, 40 bp or 53 bp), with 3 replicates per group.
24 h before transfection, cells were seeded in 6-well plates at 3 × 10^5 cells/well, with 2000 μL of MEM high-glucose medium (purchased from GIBCO) containing 10% fetal bovine serum (purchased from GIBCO) per well, so as to reach about 70-80% confluence before transfection. The 2 plasmids were mixed at a 1:1 mass ratio, diluted in 100 μL of Opti-MEM medium (purchased from GIBCO) and mixed gently. 3 μL of FuGENE transfection reagent (from Promega) was added to 100 μL of Opti-MEM medium without serum or antibiotics, mixed gently and incubated at room temperature for 5 min. After 5 min, 100 μL of the transfection reagent dilution was added to 100 μL of each DNA dilution, mixed gently and left at room temperature for 20 min. 200 μL of the mixture was added to each prepared well, the plate was rocked gently back and forth, and the cells were incubated at 37 °C with saturated humidity and 5% CO2. After 4 h the transfection medium was replaced with complete medium, and the cells were collected for use after 48 h of culture.
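The quantities in the transfection protocol can be tallied with simple arithmetic. The sketch below assumes the 500 ng/μL plasmid stocks described earlier, one mix per well, no pipetting overage and no extra wells for controls; the function and constant names are hypothetical.

```python
def volume_ul(mass_ng: float, conc_ng_per_ul: float) -> float:
    """Volume of stock (in microlitres) that contains the requested mass."""
    if conc_ng_per_ul <= 0:
        raise ValueError("concentration must be positive")
    return mass_ng / conc_ng_per_ul


STOCK_NG_PER_UL = 500.0              # plasmid stock concentration in the text
PLASMID_NG_PER_WELL = 500.0          # per plasmid, per well
FUGENE_UL_PER_WELL = 3.0             # transfection reagent per well
OPTIMEM_UL_PER_WELL = 100.0 + 100.0  # DNA dilution plus reagent dilution
GROUPS, REPLICATES = 6, 3

wells = GROUPS * REPLICATES
per_plasmid_ul = volume_ul(PLASMID_NG_PER_WELL, STOCK_NG_PER_UL)

print(f"wells: {wells}")
print(f"each plasmid per well: {per_plasmid_ul:.1f} uL")
print(f"FuGENE total: {FUGENE_UL_PER_WELL * wells:.0f} uL")
print(f"Opti-MEM total: {OPTIMEM_UL_PER_WELL * wells:.0f} uL")
```

In practice a small excess is usually prepared to allow for pipetting losses; that margin is deliberately left out here because the text does not specify one.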
3) T7 endonuclease I (T7EN1) digestion assay
Cells were collected 48 h after transfection, and genomic DNA was extracted with the TIANGEN genome extraction kit. The corresponding target sites were amplified by PCR using the primers for each gRNA in Table 4; the amplification results are shown in FIG. 11. The PCR products were then purified with a Takara DNA purification kit for the subsequent digestion assay. There were 7 groups in total: spacer length 7, spacer length 16, spacer length 18, spacer length 21, spacer length 40, spacer length 53 and a positive control. DNA denaturation system: 250 ng of purified target-site DNA and 2 μL of NEB Buffer 2. Program: 95 °C for 5 min; 95 °C for 1 min, ramp (0.2 °C/s) to 65 °C for 45 sec, 72 °C for 45 sec, 19 cycles; 94 °C for 1 min, 55 °C for 45 sec, 10 cycles; 72 °C for 10 min. … μL of T7 endonuclease (10 U/μL, NEB) was added to the denatured DNA and incubated at 37 °C for 15 min. The digestion products were analyzed by electrophoresis on a 2% agarose gel, and the results are shown in FIG. 12.
TABLE 4 spacer53 gRNA primer sequences for target sites
T7 endonuclease I recognizes and cleaves imperfectly paired DNA. When the FokI-dCpf1 system, guided by specific gRNAs, locates and cuts the target site, non-homologous end joining occurs in the genome of the organism and gives rise to imperfectly paired DNA sequences, so the target site can be cut by the T7EN1 enzyme; a site that has not been edited cannot be cut. As shown in FIG. 12 and FIG. 13, two or more bands were produced in five groups (spacer 7, 18, 53 and 40, and the positive control), indicating that the FokI-dCpf1 system guided by the four gRNA pairs with spacers of 7, 18, 53 and 40 cleaved the target in C2C12 cells and that the gene editing tool is effective in this system.
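The example reports only the presence or absence of cleaved bands. If band intensities were quantified by densitometry, a commonly used estimate of the indel frequency from a T7 endonuclease I gel could be applied; it is not part of the method described here and is given only as a sketch. The function name and the intensity values are hypothetical, not measured data.

```python
import math


def estimate_indel_percent(uncut: float, cut_1: float, cut_2: float) -> float:
    """Estimate indel frequency (%) from band intensities on a T7EN1 gel.

    Uses the common approximation
        indel % = 100 * (1 - sqrt(1 - f_cut)),
    where f_cut is the cleaved fraction of the total band intensity.
    """
    if min(uncut, cut_1, cut_2) < 0:
        raise ValueError("band intensities must be non-negative")
    total = uncut + cut_1 + cut_2
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    fraction_cleaved = (cut_1 + cut_2) / total
    return 100.0 * (1.0 - math.sqrt(1.0 - fraction_cleaved))


# Illustrative intensities in arbitrary units.
print(round(estimate_indel_percent(uncut=50, cut_1=25, cut_2=25), 1))
```

The square root accounts for the fact that heteroduplexes form by random re-annealing of edited and unedited strands, so the cleaved fraction is not itself the mutation frequency.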
Sanger sequencing showed that the FokI-dCpf1 system guided by the spacer53 gRNA pair produced multiple mutations at the target site. Specifically, C2C12 cells were co-transfected with 500 ng of pCAG-FokI-dCpf1 and 500 ng of pSQT1313-IGF2-gRNA-spacer53 and harvested 48 h later. Genomic DNA was extracted with the TIANGEN genome extraction kit. The target site was amplified by PCR using the target-site primers given in Table 3, and the product was TA-cloned into the LB vector and transformed into TOP10 cells. Single clones were picked, and the plasmids were extracted, identified by electrophoresis (see FIG. 13) and sequenced by Dagaku Bio Inc. The sequencing results (see FIG. 14) show multiple mutations at the target site in the C2C12 genome. These results indicate that the FokI-dCpf1 system is an effective gene editing tool.
Sequence listing
<110> Yangzhou university
<120> FokI and dCpf1 fusion protein expression vector and its mediated site-directed gene editing method
<160> 2
<170> SIPOSequenceListing 1.0
<210> 1
<211> 9958
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 1
cattaatgaa tcggccaacg cgcggggaga ggcggtttgc gtattgggcg ctcttccgct 60
tcctcgctca ctgactcgct gcgctcggtc gttcggctgc ggcgagcggt atcagctcac 120
tcaaaggcgg taatacggtt atccacagaa tcaggggata acgcaggaaa gaacatgtga 180
gcaaaaggcc agcaaaaggc caggaaccgt aaaaaggccg cgttgctggc gtttttccat 240
aggctccgcc cccctgacga gcatcacaaa aatcgacgct caagtcagag gtggcgaaac 300
ccgacaggac tataaagata ccaggcgttt ccccctggaa gctccctcgt gcgctctcct 360
gttccgaccc tgccgcttac cggatacctg tccgcctttc tcccttcggg aagcgtggcg 420
ctttctcata gctcacgctg taggtatctc agttcggtgt aggtcgttcg ctccaagctg 480
ggctgtgtgc acgaaccccc cgttcagccc gaccgctgcg ccttatccgg taactatcgt 540
cttgagtcca acccggtaag acacgactta tcgccactgg cagcagccac tggtaacagg 600
attagcagag cgaggtatgt aggcggtgct acagagttct tgaagtggtg gcctaactac 660
ggctacacta gaagaacagt atttggtatc tgcgctctgc tgaagccagt taccttcgga 720
aaaagagttg gtagctcttg atccggcaaa caaaccaccg ctggtagcgg tggttttttt 780
gtttgcaagc agcagattac gcgcagaaaa aaaggatctc aagaagatcc tttgatcttt 840
tctacggggt ctgacgctca gtggaacgaa aactcacgtt aagggatttt ggtcatgaga 900
ttatcaaaaa ggatcttcac ctagatcctt ttaaattaaa aatgaagttt taaatcaatc 960
taaagtatat atgagtaaac ttggtctgac agttaccaat gcttaatcag tgaggcacct 1020
atctcagcga tctgtctatt tcgttcatcc atagttgcct gactccccgt cgtgtagata 1080
actacgatac gggagggctt accatctggc cccagtgctg caatgatacc gcgagaccca 1140
cgctcaccgg ctccagattt atcagcaata aaccagccag ccggaagggc cgagcgcaga 1200
agtggtcctg caactttatc cgcctccatc cagtctatta attgttgccg ggaagctaga 1260
gtaagtagtt cgccagttaa tagtttgcgc aacgttgttg ccattgctac aggcatcgtg 1320
gtgtcacgct cgtcgtttgg tatggcttca ttcagctccg gttcccaacg atcaaggcga 1380
gttacatgat cccccatgtt gtgcaaaaaa gcggttagct ccttcggtcc tccgatcgtt 1440
gtcagaagta agttggccgc agtgttatca ctcatggtta tggcagcact gcataattct 1500
cttactgtca tgccatccgt aagatgcttt tctgtgactg gtgagtactc aaccaagtca 1560
ttctgagaat agtgtatgcg gcgaccgagt tgctcttgcc cggcgtcaat acgggataat 1620
accgcgccac atagcagaac tttaaaagtg ctcatcattg gaaaacgttc ttcggggcga 1680
aaactctcaa ggatcttacc gctgttgaga tccagttcga tgtaacccac tcgtgcaccc 1740
aactgatctt cagcatcttt tactttcacc agcgtttctg ggtgagcaaa aacaggaagg 1800
caaaatgccg caaaaaaggg aataagggcg acacggaaat gttgaatact catactcttc 1860
ctttttcaat attattgaag catttatcag ggttattgtc tcatgagcgg atacatattt 1920
gaatgtattt agaaaaataa acaaataggg gttccgcgca catttccccg aaaagtgcca 1980
cctgggtcga cattgattat tgactagtta ttaatagtaa tcaattacgg ggtcattagt 2040
tcatagccca tatatggagt tccgcgttac ataacttacg gtaaatggcc cgcctggctg 2100
accgcccaac gacccccgcc cattgacgtc aataatgacg tatgttccca tagtaacgcc 2160
aatagggact ttccattgac gtcaatgggt ggagtattta cggtaaactg cccacttggc 2220
agtacatcaa gtgtatcata tgccaagtac gccccctatt gacgtcaatg acggtaaatg 2280
gcccgcctgg cattatgccc agtacatgac cttatgggac tttcctactt ggcagtacat 2340
ctacgtatta gtcatcgcta ttaccatggt cgaggtgagc cccacgttct gcttcactct 2400
ccccatctcc cccccctccc cacccccaat tttgtattta tttatttttt aattattttg 2460
tgcagcgatg ggggcggggg gggggggggg gcgaggggcg gggcggggcg aggcggagag 2520
gtgcggcggc agccaatcag agcggcgcgc tccgaaagtt tccttttatg gcgaggcggc 2580
ggcggcggcg gccctataaa aagcgaagcg cgcggcgggc gggagtcgct gcgcgctgcc 2640
ttcgccccgt gccccgctcc gccgccgcct cgcgccgccc gccccggctc tgactgaccg 2700
cgttactccc acaggtgagc gggcgggacg gcccttctcc tccgggctgt aattagcgct 2760
tggtttaatg acggcttgtt tcttttctgt ggctgcgtga aagccttgag gggctccggg 2820
agggcccttt gtgcgggggg agcggctcgg ggggtgcgtg cgtgtgtgtg tgcgtgggga 2880
gcgccgcgtg cggctccgcg ctgcccggcg gctgtgagcg ctgcgggcgc ggcgcggggc 2940
tttgtgcgct ccgcagtgtg cgcgagggga gcgcggccgg gggcggtgcc ccgcggtgga 3000
gtcgctgcgc gctgccttcg ccccgtgccc cgcgcggggg gggctgcgag gggaacaaag 3060
gctgcgtgcg gggtgtgtgc gtgggggggt gagcaggggg tgtgggcgcg tcggtcgggc 3120
tgcaaccccc cctgcacccc cctccccgag ttgctgagca cggcccggct tcgggtgcgg 3180
ggctccgtac ggggcgtggc gcggggctcg ccgtgccggg cggggggtgg cggcaggtgg 3240
gggtgccggg cggggcgggg ccgcctcggg ccggggaggg ctcgggggag gggcgcggcg 3300
gcccccggag cgccggcggc tgtcgaggcg cggcgagccg cagccattgc cttttatggt 3360
aatcgtgcga gagggcgcag ggacttcctt tgtcccaaat ctgtgcggag ccgaaatctg 3420
ggaggcgccg ccgcaccccc tctagcgggc gcggggcgaa gcggtgcggc gccggcagga 3480
aggaaatggg cggggagggc cttcgtgcgt cgccgcgccg ccgtcccctt ctccctctcc 3540
agcctcgggg ctgtccgcgg ggggacggct gccttcgggg gggacggggc agggcggggt 3600
tcggcttctg gcgtgtgacc ggcggctcta gagcctctgc taaccatgtt catgccttct 3660
tctttttcct acagctcctg ggcaacgtgc tggttattgt gctgtctcat cattttggca 3720
aagaattcta atacgactca ctatagggct taagggtacc gcgggcccgg gatccaccgg 3780
tcgccaccat gggtgatcat tatctggata ttcggctgag gcctgatcca gagttcccac 3840
ctgcgcagct gatgtctgtc ctttttggca aacttcatca ggccctggtt gcccagggcg 3900
gagatcggat aggggtaagc tttccagacc tcgacgaaag ccggagccgc ctgggagaac 3960
gcctgcggat ccacgcttct gccgacgatc tgagagcctt gctggcaagg ccatggcttg 4020
aggggctccg ggatcacctg cagtttggcg aacccgccgt tgttccccac ccaacccctt 4080
atcggcaggt gtctagagtg caggccaaat ctaatccaga acggctgcga cggcgactca 4140
tgcggcgaca tgatcttagc gaggaagagg cccgaaaaag aatccctgat accgtggccc 4200
gcgcccttga cttgcctttt gtcacactgc ggtcccagag tacggggcag catttcagac 4260
ttttcattcg acacgggcca ctgcaagtta ccgccgaaga aggaggcttt acttgttatg 4320
gactctccaa gggaggtttc gtgccctggt ttgagggcag aggaagtctg ttaacatgcg 4380
gtgacgtcga ggagaatcct ggcccaatgc ctaagaagaa gcggaaggtg agcagccaac 4440
ttgtgaagtc tgaactcgag gagaaaaaat cagagttgag acacaagttg aagtacgtgc 4500
cacacgaata catcgagctt atcgagatcg ccagaaacag tacccaggat aggatccttg 4560
agatgaaagt catggagttc tttatgaagg tctacggtta tagaggaaag caccttggcg 4620
gtagcagaaa gcccgatggc gccatctata ctgtcggatc tcctatcgat tatggggtga 4680
tcgtggatac caaagcttac tcaggcgggt acaacttgcc cataggacaa gccgacgaga 4740
tgcagcggta tgtcgaagag aaccagacgc gcaacaagca catcaacccc aatgaatggt 4800
ggaaagtgta cccaagtagt gtgactgagt tcaagttcct gtttgtctcc ggccacttta 4860
agggcaatta taaagctcag ctcactagac tcaatcacat cacaaactgc aacggagctg 4920
tgttgtcagt ggaggagctc ctgattggag gcgagatgat caaagccggc acccttacac 4980
tggaggaggt gcggcggaag ttcaacaatg gagagatcaa cttcggtggc ggtggatcca 5040
tgagcaagct ggagaagttt acaaactgct actccctgtc taagaccctg aggttcaagg 5100
ccatccctgt gggcaagacc caggagaaca tcgacaataa gcggctgctg gtggaggacg 5160
agaagagagc cgaggattat aagggcgtga agaagctgct ggatcgctac tatctgtctt 5220
ttatcaacga cgtgctgcac agcatcaagc tgaagaatct gaacaattac atcagcctgt 5280
tccggaagaa aaccagaacc gagaaggaga ataaggagct ggagaacctg gagatcaatc 5340
tgcggaagga gatcgccaag gccttcaagg gcaacgaggg ctacaagtcc ctgtttaaga 5400
aggatatcat cgagacaatc ctgccagagt tcctggacga taaggacgag atcgccctgg 5460
tgaacagctt caatggcttt accacagcct tcaccggctt ctttgataac agagagaata 5520
tgttttccga ggaggccaag agcacatcca tcgccttcag gtgtatcaac gagaatctga 5580
cccgctacat ctctaatatg gacatcttcg agaaggtgga cgccatcttt gataagcacg 5640
aggtgcagga gatcaaggag aagatcctga acagcgacta tgatgtggag gatttctttg 5700
agggcgagtt ctttaacttt gtgctgacac aggagggcat cgacgtgtat aacgccatca 5760
tcggcggctt cgtgaccgag agcggcgaga agatcaaggg cctgaacgag tacatcaacc 5820
tgtataatca gaaaaccaag cagaagctgc ctaagtttaa gccactgtat aagcaggtgc 5880
tgagcgatcg ggagtctctg agcttctacg gcgagggcta tacatccgat gaggaggtgc 5940
tggaggtgtt tagaaacacc ctgaacaaga acagcgagat cttcagctcc atcaagaagc 6000
tggagaagct gttcaagaat tttgacgagt actctagcgc cggcatcttt gtgaagaacg 6060
gccccgccat cagcacaatc tccaaggata tcttcggcga gtggaacgtg atccgggaca 6120
agtggaatgc cgagtatgac gatatccacc tgaagaagaa ggccgtggtg accgagaagt 6180
acgaggacga tcggagaaag tccttcaaga agatcggctc cttttctctg gagcagctgc 6240
aggagtacgc cgacgccgat ctgtctgtgg tggagaagct gaaggagatc atcatccaga 6300
aggtggatga gatctacaag gtgtatggct cctctgagaa gctgttcgac gccgattttg 6360
tgctggagaa gagcctgaag aagaacgacg ccgtggtggc catcatgaag gacctgctgg 6420
attctgtgaa gagcttcgag aattacatca aggccttctt tggcgagggc aaggagacaa 6480
acagggacga gtccttctat ggcgattttg tgctggccta cgacatcctg ctgaaggtgg 6540
accacatcta cgatgccatc cgcaattatg tgacccagaa gccctactct aaggataagt 6600
tcaagctgta ttttcagaac cctcagttca tgggcggctg ggacaaggat aaggagacag 6660
actatcgggc caccatcctg agatacggct ccaagtacta tctggccatc atggataaga 6720
agtacgccaa gtgcctgcag aagatcgaca aggacgatgt gaacggcaat tacgagaaga 6780
tcaactataa gctgctgccc ggccctaata agatgctgcc aaaggtgttc ttttctaaga 6840
agtggatggc ctactataac cccagcgagg acatccagaa gatctacaag aatggcacat 6900
tcaagaaggg cgatatgttt aacctgaatg actgtcacaa gctgatcgac ttctttaagg 6960
atagcatctc ccggtatcca aagtggtcca atgcctacga tttcaacttt tctgagacag 7020
agaagtataa ggacatcgcc ggcttttaca gagaggtgga ggagcagggc tataaggtga 7080
gcttcgagtc tgccagcaag aaggaggtgg ataagctggt ggaggagggc aagctgtata 7140
tgttccagat ctataacaag gacttttccg ataagtctca cggcacaccc aatctgcaca 7200
ccatgtactt caagctgctg tttgacgaga acaatcacgg acagatcagg ctgagcggag 7260
gagcagagct gttcatgagg cgcgcctccc tgaagaagga ggagctggtg gtgcacccag 7320
ccaactcccc tatcgccaac aagaatccag ataatcccaa gaaaaccaca accctgtcct 7380
acgacgtgta taaggataag aggttttctg aggaccagta cgagctgcac atcccaatcg 7440
ccatcaataa gtgccccaag aacatcttca agatcaatac agaggtgcgc gtgctgctga 7500
agcacgacga taacccctat gtgatcggca tcgcgagggg cgagcgcaat ctgctgtata 7560
tcgtggtggt ggacggcaag ggcaacatcg tggagcagta ttccctgaac gagatcatca 7620
acaacttcaa cggcatcagg atcaagacag attaccactc tctgctggac aagaaggaga 7680
aggagaggtt cgaggcccgc cagaactgga cctccatcga gaatatcaag gagctgaagg 7740
ccggctatat ctctcaggtg gtgcacaaga tctgcgagct ggtggagaag tacgatgccg 7800
tgatcgccct ggaggacctg aactctggct ttaagaatag ccgcgtgaag gtggagaagc 7860
aggtgtatca gaagttcgag aagatgctga tcgataagct gaactacatg gtggacaaga 7920
agtctaatcc ttgtgcaaca ggcggcgccc tgaagggcta tcagatcacc aataagttcg 7980
agagctttaa gtccatgtct acccagaacg gcttcatctt ttacatccct gcctggctga 8040
catccaagat cgatccatct accggctttg tgaacctgct gaaaaccaag tataccagca 8100
tcgccgattc caagaagttc atcagctcct ttgacaggat catgtacgtg cccgaggagg 8160
atctgttcga gtttgccctg gactataaga acttctctcg cacagacgcc gattacatca 8220
agaagtggaa gctgtactcc tacggcaacc ggatcagaat cttccggaat cctaagaaga 8280
acaacgtgtt cgactgggag gaggtgtgcc tgaccagcgc ctataaggag ctgttcaaca 8340
agtacggcat caattatcag cagggcgata tcagagccct gctgtgcgag cagtccgaca 8400
aggccttcta ctctagcttt atggccctga tgagcctgat gctgcagatg cggaacagca 8460
tcacaggccg caccgacgtg gattttctga tcagccctgt gaagaactcc gacggcatct 8520
tctacgatag ccggaactat gaggcccagg agaatgccat cctgccaaag aacgccgacg 8580
ccaatggcgc ctataacatc gccagaaagg tgctgtgggc catcggccag ttcaagaagg 8640
ccgaggacga gaagctggat aaggtgaaga tcgccatctc taacaaggag tggctggagt 8700
acgcccagac cagcgtgaag cacggatccc ccaagaagaa gaggaaagtc tcgagcgact 8760
acaaagacca tgacggtgat tataaagatc atgacatcga ttacaaggat gacgatgaca 8820
agtgaagcgg ccgcactcct caggtgcagg ctgcctatca gaaggtggtg gctggtgtgg 8880
ccaatgccct ggctcacaaa taccactgag atctttttcc ctctgccaaa aattatgggg 8940
acatcatgaa gccccttgag catctgactt ctggctaata aaggaaattt attttcattg 9000
caatagtgtg ttggaatttt ttgtgtctct cactcggaag gacatatggg agggcaaatc 9060
atttaaaaca tcagaatgag tatttggttt agagtttggc aacatatgcc catatgctgg 9120
ctgccatgaa caaaggttgg ctataaagag gtcatcagta tatgaaacag ccccctgctg 9180
tccattcctt attccataga aaagccttga cttgaggtta gatttttttt atattttgtt 9240
ttgtgttatt tttttcttta acatccctaa aattttcctt acatgtttta ctagccagat 9300
ttttcctcct ctcctgacta ctcccagtca tagctgtccc tcttctctta tggagatccc 9360
tcgacctgca gcccaagctt ggcgtaatca tggtcatagc tgtttcctgt gtgaaattgt 9420
tatccgctca caattccaca caacatacga gccggaagca taaagtgtaa agcctggggt 9480
gcctaatgag tgagctaact cacattaatt gcgttgcgct cactgcccgc tttccagtcg 9540
ggaaacctgt cgtgccagcg gatccgcatc tcaattagtc agcaaccata gtcccgcccc 9600
taactccgcc catcccgccc ctaactccgc ccagttccgc ccattctccg ccccatggct 9660
gactaatttt ttttatttat gcagaggccg aggccgcctc ggcctctgag ctattccaga 9720
agtagtgagg aggctttttt ggaggcctag gcttttgcaa aaagctaact tgtttattgc 9780
agcttataat ggttacaaat aaagcaatag catcacaaat ttcacaaata aagcattttt 9840
ttcactgcat tctagttgtg gtttgtccaa actcatcaat gtatcttatc atgtctggat 9900
ccgctgcatt aatgaatcgg ccaacgcgcg gggagaggcg gtttgcgtat tgggcgct 9958
<210> 2
<211> 2321
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 2
gacgtcgcta gctgtacaaa aaagcaggct ttaaaggaac caattcagtc gactggatcc 60
ggtaccaagg tcgggcagga agagggccta tttcccatga ttccttcata tttgcatata 120
cgatacaagg ctgttagaga gataattaga attaatttga ctgtaaacac aaagatatta 180
gtacaaaata cgtgacgtag aaagtaataa tttcttgggt agtttgcagt tttaaaatta 240
tgttttaaaa tggactatca tatgcttacc gtaacttgaa agtatttcga tttcttggct 300
ttatatatct tgtggaaagg acgaaacacc gttcactgcc gtataggcag aatttctact 360
aagtgtagat tctgtagtct attgccctaa ctcaatttct actaagtgta gattatctgc 420
agtaatgttc ctgctgaatt tctactaagt gtagatgttc actgccgtat aggcagagct 480
tgggccgctc gaggtacctc tctacatatg acatgtgagc aaaaggccag caaaaggcca 540
ggaaccgtaa aaaggccgcg ttgctggcgt ttttccatag gctccgcccc cctgacgagc 600
atcacaaaaa tcgacgctca agtcagaggt ggcgaaaccc gacaggacta taaagatacc 660
aggcgtttcc ccctggaagc tccctcgtgc gctctcctgt tccgaccctg ccgcttaccg 720
gatacctgtc cgcctttctc ccttcgggaa gcgtggcgct ttctcatagc tcacgctgta 780
ggtatctcag ttcggtgtag gtcgttcgct ccaagctggg ctgtgtgcac gaaccccccg 840
ttcagcccga ccgctgcgcc ttatccggta actatcgtct tgagtccaac ccggtaagac 900
acgacttatc gccactggca gcagccactg gtaacaggat tagcagagcg aggtatgtag 960
gcggtgctac agagttcttg aagtggtggc ctaactacgg ctacactaga agaacagtat 1020
ttggtatctg cgctctgctg aagccagtta ccttcggaaa aagagttggt agctcttgat 1080
ccggcaaaca aaccaccgct ggtagcggtg gtttttttgt ttgcaagcag cagattacgc 1140
gcagaaaaaa aggatctcaa gaagatcctt tgatcttttc tacggggtct gacgctcagt 1200
ggaacgaaaa ctcacgttaa gggattttgg tcatgagatt atcaaaaagg atcttcacct 1260
agatcctttt aaattaaaaa tgaagtttta aatcaatcta aagtatatat gagtaaactt 1320
ggtctgacag ttaccaatgc ttaatcagtg aggcacctat ctcagcgatc tgtctatttc 1380
gttcatccat agttgcctga ctccccgtcg tgtagataac tacgatacgg gagggcttac 1440
catctggccc cagtgctgca atgataccgc gagacccacg ctcaccggct ccagatttat 1500
cagcaataaa ccagccagcc ggaagggccg agcgcagaag tggtcctgca actttatccg 1560
cctccatcca gtctattaat tgttgccggg aagctagagt aagtagttcg ccagttaata 1620
gtttgcgcaa cgttgttgcc attgctacag gcatcgtggt gtcacgctcg tcgtttggta 1680
tggcttcatt cagctccggt tcccaacgat caaggcgagt tacatgatcc cccatgttgt 1740
gcaaaaaagc ggttagctcc ttcggtcctc cgatcgttgt cagaagtaag ttggccgcag 1800
tgttatcact catggttatg gcagcactgc ataattctct tactgtcatg ccatccgtaa 1860
gatgcttttc tgtgactggt gagtactcaa ccaagtcatt ctgagaatag tgtatgcggc 1920
gaccgagttg ctcttgcccg gcgtcaatac gggataatac cgcgccacat agcagaactt 1980
taaaagtgct catcattgga aaacgttctt cggggcgaaa actctcaagg atcttaccgc 2040
tgttgagatc cagttcgatg taacccactc gtgcacccaa ctgatcttca gcatctttta 2100
ctttcaccag cgtttctggg tgagcaaaaa caggaaggca aaatgccgca aaaaagggaa 2160
taagggcgac acggaaatgt tgaatactca tactcttcct ttttcaatat tattgaagca 2220
tttatcaggg ttattgtctc atgagcggat acatatttga atgtatttag aaaaataaac 2280
aaataggggt tccgcgcaca tttccccgaa aagtgccacc t 2321