CN112430622A

CN112430622A - FokI and dCpf1 fusion protein expression vector and site-directed gene editing method mediated by same

Info

Publication number: CN112430622A
Application number: CN202011155827.1A
Authority: CN
Inventors: 高波; 留汉文; 王赛赛; 王亚丽; 宋成义; 陈才; 王宵燕
Original assignee: Yangzhou University
Current assignee: Yangzhou University
Priority date: 2020-10-26
Filing date: 2020-10-26
Publication date: 2021-03-02

Abstract

The invention belongs to the field of gene editing, and in particular relates to a FokI and dCpf1 fusion protein expression vector and a site-directed gene editing method mediated by the same. The mutation-inactivated dCpf1 protein of the CRISPR/Cpf1 editing system is fused with the FokI nuclease to construct an efficient gRNA-mediated site-directed gene editing system. The method uses the targeting function of dCpf1 and the cleavage function of the FokI nuclease; because cleavage occurs only in dimer form, the specificity of the gene editing system is improved and a lower off-target rate is achieved. Compared with the Cas9 protein, the Cpf1 protein offers greater advantages for gene editing and related operations. The invention thus provides a more convenient and efficient fusion protein-mediated site-directed gene editing method.

Description

FokI and dCpf1 fusion protein expression vector and site-directed gene editing method mediated by same

Technical Field

The invention belongs to the field of gene editing, and particularly relates to a FokI and dCpf1 fusion protein expression vector and a site-directed gene editing method mediated by the same.

Background

In theory, site-specific endonucleases can be used to target a single site in the genome, and are useful for gene targeting in therapeutic and research applications. In many organisms, including mammals, site-specific endonucleases such as CRISPR/Cas9 and Cpf1 perform gene editing by stimulating NHEJ (non-homologous end joining) and HDR (homology-directed repair).

CRISPR/Cas is a defensive immune mechanism that arose during the evolution of bacteria and archaea and is used to resist invasion by viruses and foreign DNA. CRISPR stands for clustered regularly interspaced short palindromic repeats, and Cas stands for CRISPR-associated proteins. The CRISPR/Cas9 system has attracted much attention as a gene editing tool since 2013. The system consists mainly of three parts: tracrRNA, crRNA and the Cas9 protein. The tracrRNA and crRNA form the guide RNA, which binds to the target site in the genome, while the Cas9 protein not only has endonuclease activity but also forms a complex with the guide RNA to recognize the target sequence with high specificity. This is mainly due to the six domains of the Cas9 protein: REC I, REC II, bridge helix, PAM-interacting domain, HNH and RuvC. REC I is the largest domain and is responsible for binding the guide RNA; the arginine-rich bridge helix is critical for Cas9 to begin cleavage after binding the target DNA; HNH and RuvC are the main nuclease domains, each cleaving one strand of the target DNA, and they are highly homologous to the HNH and RuvC domains found in other proteins; the PAM-interacting domain specifically recognizes the PAM sequence and initiates binding to the target DNA; the function of the REC II domain is not yet well understood. It has been shown that in the CRISPR system the guide RNA is a T-shaped single-stranded RNA whose 5' end is complementary to the target-site DNA. Binding of the synthetic RNA to the Cas9 protein changes the conformation of the protein, converting it from an inactive to an active form. The mechanism of this conformational change is not completely understood, but Jinek et al. speculated in 2014 that it may result from steric interactions or weak binding between protein side chains and RNA bases.

To date, the CRISPR/Cas9 system has been applied very widely, for example in bacteria, yeast, Arabidopsis thaliana, nematodes, Drosophila, zebrafish, mice, rats, pigs, cattle, sheep, humans and cell lines of many species. However, the widely used CRISPR/Cas9 system often shows off-target effects, which is a major obstacle to its application.

In 2012, Gasiunas et al. mutated the catalytic amino acids of the HNH and RuvC nuclease domains of the Cas9 protein, producing a Cas9 protein with no cleavage activity, i.e., dCas9 (D10A and H840A). They found that dCas9 has no cleavage activity but retains the ability to bind double-stranded DNA specifically and efficiently. Building on this, John et al. found in 2014 that fusing the cleavage-inactive dCas9 protein to the FokI nuclease improves the specificity of genome editing. Because FokI cleaves only as a dimer, the FokI-dCas9 (fCas9) system requires two fCas9 monomers: a pair of guide RNAs recognizes adjacent target sites in the genome, each FokI-fused dCas9 binds one of the paired guide RNAs, and the dimerized FokI then cleaves the DNA, which is repaired by NHEJ or HDR to achieve gene editing. Gasiunas et al. also found in 2012 that the Cas9 nuclease domains can be mutated individually to form DNA "nickases", which introduce single-strand breaks with the same specificity as the conventional CRISPR/Cas9 nuclease, so that a single Cas9 nickase can be used as a fully functional enzyme; Mali et al. showed in 2013 that a pair of Cas9 nickases can be used as a gene editing tool to cleave both strands of genomic DNA. John et al. subsequently compared the fCas9, Cas9 nickase and CRISPR/Cas9 systems and found that CRISPR/Cas9 exhibited the highest editing activity but also a high off-target rate. A pair of Cas9 nickases can cleave opposing strands at two adjacent target sites, creating an effective double-strand break and inducing a large number of modifications in the target DNA, with off-target effects reduced by the greater constraint of requiring a pair; FokI-dCas9 likewise requires a dimer for cleavage and is therefore more constrained and more specific, so its off-target rate is theoretically lower than that of wild-type Cas9.

John et al. found that fCas9 modified target sites with > 140-fold higher specificity than wild-type Cas9, with efficiency similar to that of Cas9 nickases. The comparison also showed that with Cas9 nickases a single sgRNA is enough for genome editing to occur, whereas with the fCas9 system both sgRNAs must be present, so the fCas9 system has the lowest off-target rate and is a good gene editing tool.

Disclosure of Invention

The present invention relates to the establishment of a FokI and dCpf1 fusion protein-mediated site-directed gene editing method, and certain aspects of the invention disclose compositions and methods for improving the specificity of RNA-programmable endonucleases (e.g., Cpf1). The invention also relates to methods for constructing a FokI-dCpf1 fusion vector, a FokI-Cpf1 fusion vector and a CAG-Cpf1 fusion vector, and achieves site-specific integration in the genome of an organism at a low off-target rate through site-specific gRNA guidance and cleavage by dimeric FokI.

In order to achieve the purpose of the invention, the invention adopts the technical scheme that:

The invention fuses the mutation-inactivated dCpf1 protein of the CRISPR/Cpf1 editing system with the FokI nuclease to construct a high-efficiency gRNA-mediated site-directed gene editing system. The method uses the targeting function of dCpf1 and the cleavage function of the FokI nuclease, and by cleaving in dimer form it improves the specificity of the gene editing system and achieves a lower off-target rate. Compared with the Cas9 protein, the Cpf1 protein has greater advantages in gene editing and other operations. The invention therefore provides a more convenient and efficient fusion protein-mediated site-directed gene editing method.

The purpose of the invention is realized by the following technical scheme:

A fusion protein expression vector, wherein the nucleic acid sequence of the fusion protein expression vector is shown in SEQ ID NO. 1.

Further, the fusion protein expression vector is a plasmid vector pFOKI-dCpf1 formed by fusing the FokI endonuclease with Cpf1 lacking cleavage activity.

Further, the fusion protein expression vector comprises a CAG promoter, a Kozak sequence, two SV40 nuclear localization signal (NLS) sequences, a FokI sequence, a Linker sequence and a dCpf1 sequence;

the sequence of the CAG promoter is the base from the 5' end 1989-3638 in SEQ ID NO. 1;

the Kozak sequence is 3747-3556 bases from the 5' end in SEQ ID NO. 1;

the sequence of FokI is the 4374-;

the Linker sequence is 4401-4988 th base from the 5' end in SEQ ID NO. 1;

the sequence of dCpf1 is base 5004-8687 from the 5' end in SEQ ID NO. 1;

the two NLS sequences are respectively located at the upstream of FokI and the downstream of dCpf1, and are respectively the 4374 th-4394 th base and the 8694 th-8714 th base from the 5' end in SEQ ID NO. 1.
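As a minimal sketch of how the 1-based, inclusive base ranges above map onto a sequence string, the following Python snippet computes feature lengths and extracts a feature from a sequence. Only ranges that the text states in full are listed; the names `FEATURES`, `feature_length` and `extract_feature` are illustrative assumptions and not part of the invention.

```python
# 1-based, inclusive coordinates as stated in the text for SEQ ID NO. 1.
# Only ranges given in full are included here.
FEATURES = {
    "CAG promoter": (1989, 3638),
    "NLS (upstream of FokI)": (4374, 4394),
    "dCpf1": (5004, 8687),
    "NLS (downstream of dCpf1)": (8694, 8714),
}


def feature_length(start: int, end: int) -> int:
    """Length in bases of a 1-based, inclusive range."""
    if start < 1 or end < start:
        raise ValueError(f"invalid range {start}-{end}")
    return end - start + 1


def extract_feature(sequence: str, start: int, end: int) -> str:
    """Return the bases of a 1-based, inclusive range from a sequence string."""
    feature_length(start, end)  # validates the range
    if end > len(sequence):
        raise ValueError("range extends past the end of the sequence")
    # Python slices are 0-based and end-exclusive, hence start - 1.
    return sequence[start - 1:end]


if __name__ == "__main__":
    for name, (start, end) in FEATURES.items():
        print(f"{name}: {start}-{end}, {feature_length(start, end)} bases")
```

With these ranges, the dCpf1 region spans 3684 bases, a multiple of three, and each NLS spans 21 bases.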

Further, the primer sequences of the fusion protein expression vector are as follows:

A method for constructing the fusion protein expression vector comprises: cloning dCpf1 into the LB vector to construct the pLB-dCpf1 vector; digesting pLB-dCpf1 with the BsmBI enzyme and recovering the dCpf1 fragment; obtaining the T2A-NLS and FokI fragments by PCR from a vector stored in the laboratory; then ligating the T2A-NLS, FokI and dCpf1 fragments with the CAG framework by Golden Gate cloning, transforming the product into TOP10 cells, picking single clones, extracting the plasmid and identifying it by electrophoresis.

The fusion protein expression vector is applied to gene editing.

A fusion protein mediated site-directed gene editing method is characterized in that a genomic target site is edited in a site-directed manner under the guidance of gRNA after a nuclease-free dCpf1 is fused with a nuclease FokI.

Further, the FokI-dCpf1 gene editing system includes: a FokI-dCpf1 fusion protein expression plasmid, whose product recognizes the gRNAs and cleaves the DNA, and a gRNA expression plasmid, which directs site-specific binding of the FokI-dCpf1 fusion protein.

A gRNA vector, the nucleic acid sequence of which is shown in SEQ ID NO. 2.

Further, the sequences of gRNA primers are as follows:

A FokI and dCpf1 fusion protein-mediated site-directed gene editing method is characterized in that the nuclease FokI is fused with dCpf1 lacking nuclease activity, and a genomic target site is then edited in a site-directed manner under the guidance of gRNA. The FokI-dCpf1 gene editing system comprises: a FokI-dCpf1 fusion protein expression plasmid, whose product recognizes the gRNAs and cleaves the DNA, and a gRNA expression plasmid, which directs site-specific binding of the FokI-dCpf1 fusion protein.

The fusion protein, and dimers thereof, comprise two domains: 1) a nuclease-inactive Cpf1 domain; 2) a nuclease domain (e.g., the FokI monomer provides the DNA cleavage domain). dCpf1 carries the D832A mutation, which abolishes its cleavage activity; after fusion with FokI, cleavage can occur only when a dimer forms, which substantially improves the specificity of the system.

The FokI-dCpf1 fusion protein contains two nuclear localization signal (NLS) domains, which provide the signal that mediates transport of the fusion protein into the nucleus.

A Linker sequence (GGGGS) lies between the FokI nuclease and the dCpf1 protein.

The gRNA expression plasmid is constructed on the backbone of the original pSQT1313 plasmid and contains two gRNA scaffold sequences separating the gRNAs (e.g., SEQ ID NO. 2). These sequences can be recognized and cut by dCpf1 to release two gRNAs that recognize the target sites, which improves the specificity of gRNA binding and reduces off-target events to some extent.

Any genomic site of interest for which gRNAs are designed can be edited by the FokI-dCpf1 gene editing system described above; edits include, but are not limited to, single-base substitutions, random small-fragment insertion/deletion mutations, gene fragment substitutions, gene knock-ins, gene knockouts and other genetic manipulations.

The invention also provides a method for site-directed integration of the pFokI-dCpf1 vector in the genome of an organism; the method comprises the following steps:

1) a first fusion protein of the invention (a nuclease-inactive dCpf1 domain joined to a FokI DNA cleavage domain) binds to the DNA: the first gRNA binds to one region of the organism's genomic DNA, and dCpf1 binds to that gRNA; 2) a second fusion protein (FokI-dCpf1) binds to the genomic DNA: the second gRNA binds to another region of the genomic DNA, and dCpf1 binds to the second gRNA; 3) the complexes bound in steps 1) and 2) bring the nuclease domains together as a dimer, so that the DNA is cleaved at the site where the fusion proteins are bound.

The invention also provides a method for constructing the gRNA vector used together with the fusion protein vector: a pair of gRNAs is built into a single vector, whose nucleic acid sequence is shown in SEQ ID NO. 2.

Exploiting the fact that Cpf1 allows a wider choice of cleavage positions than Cas9, the invention constructs vectors with different spacer lengths. pSQT1313-IGF2-spacer53 is taken as an example, without limitation to this gRNA; its nucleotide sequence is shown in SEQ ID NO. 2.

Advantageous effects

The invention uses the Cpf1 protein (also known as Cas12a), which is more powerful than Cas9. The Cpf1 protein is a newer member of the CRISPR/Cas system, derived from Francisella novicida. It has dual cleavage activity and can cleave not only DNA but also RNA; in addition to Cas9, these bacteria also use Cpf1 to cleave foreign DNA. Compared with the CRISPR/Cas9 system, the CRISPR/Cpf1 system has five advantages: 1) the Cpf1 protein requires only one RNA molecule (crRNA), whereas Cas9 requires two (tracrRNA and crRNA); 2) the molecular weight of the Cpf1 enzyme is smaller than that of Cas9, so it enters cells more easily and the editing success rate is higher; 3) the different recognition sequence of the Cpf1 system makes its site targeting more efficient; 4) the cleavage position differs from that of Cas9, allowing greater latitude; 5) Cpf1 cleavage creates sticky ends, which favor insertion of new DNA sequences, while Cas9 creates blunt ends. Compared with Cas9, the CRISPR/Cpf1 system is simpler and more accurate, has a lower off-target rate and has better prospects in gene editing. Research on gene editing is becoming increasingly important, and a set of gene editing tools with accurate targeting, good specificity and a low off-target rate is urgently needed. In view of the advantages of Cpf1, the FokI-Cpf1 editing system constructed by the invention can better meet current requirements.

The invention fuses cleavage-inactive Cpf1 (dCpf1) with FokI, aiming to produce a tool that is more efficient, more convenient and faster than fCas9, has a lower off-target rate and is better suited to gene editing in organisms. Using in vivo DNA recombination technology, the invention constructs a fusion expression system of FokI and a cleavage-defective Cas enzyme (Cpf1). Under the guidance of crRNA, the fusion protein expressed by the system binds specifically at the target site, forms a dimer complex and cleaves at the target site. Alternatively, under the combined action of a donor plasmid carrying a foreign gene, the fusion protein expression vector (pFokI-Cpf1) and the gRNA expression vector, the foreign gene is inserted by HDR (homology-directed repair) after the dimer has acted at the specific site, achieving efficient site-specific knock-in of large gene fragments. Because the system removes the cleavage activity of Cpf1 itself and cuts only when a dimer forms, the integration specificity of the FokI-Cpf1 fusion vector is greatly improved, and recognition of a different PAM sequence reduces its off-target rate compared with FokI-Cas9. The site-directed gene editing method can be used for, but is not limited to, single-base substitution, random small-fragment insertion/deletion mutation, gene fragment substitution, gene knock-in and gene knock-out.

The invention is used as a gene editing tool with high efficiency and low off-target rate, and has great application potential in the fields of gene editing, gene therapy, gene function research and the like.

Drawings

FIG. 1: dCpf1 PCR amplification product;

FIG. 2: pLB-dCpf1 plasmid;

FIG. 3: the pCAG-FokI-dCpf1 plasmid;

FIG. 4: plasmid map of pCAG-FokI-dCpf1;

FIG. 5: pSQT1313-IGF2-gRNA-spacer7 plasmid;

FIG. 6: pSQT1313-IGF2-gRNA-spacer16 and spacer40 plasmids;

FIG. 7: pSQT1313-IGF2-gRNA-spacer18 plasmid;

FIG. 8: pSQT1313-IGF2-gRNA-spacer21 plasmid;

FIG. 9: pSQT1313-IGF2-gRNA-spacer53 plasmid;

FIG. 10: pSQT1313-IGF2-gRNA-spacer53 plasmid map;

FIG. 11: PCR amplification of each target site;

FIG. 12: T7 endonuclease digestion of each target site;

FIG. 13: pLB-spacer53 plasmid;

FIG. 14: alignment chart of sequencing result of spacer53 target site.

Detailed Description

Example 1

The experimental methods mentioned in the following examples are conventional methods unless otherwise specified; practice of the invention is not limited thereto.

Example 1: construction and application of the pFokI-dCpf1 fusion protein expression vector

1. pLB-dCpf1 vector construction

dCpf1 was amplified by PCR from the commercial vector WN10151 (Addgene plasmid #53369) (see FIG. 1) and cloned into the LB vector to construct the pLB-dCpf1 vector; the plasmid DNA was checked by 1% agarose gel electrophoresis (see FIG. 2).

2. Construction of pFOKI-dCpf1 fusion protein expression vector

pLB-dCpf1 was digested with the BsmBI enzyme and the dCpf1 fragment was recovered; the commercial vector pSQT1601 (4849bp, 5474bp) was double-digested with Acc65I and NotI, and the 4849 bp CAG framework was recovered. The T2A-NLS and FokI fragments were amplified by PCR from a vector stored in the laboratory (primer sequences in Table 1). The T2A-NLS, FokI and dCpf1 fragments were then ligated with the CAG framework by Golden Gate cloning, reacting for 30 min at 25 ℃ with T7 ligase; the product was transformed into TOP10 cells, single clones were picked, and the extracted plasmids were identified by electrophoresis (see FIG. 3); clones showing the expected band were sent to a company for sequencing. The correct vector was designated pCAG-FokI-dCpf1, and its plasmid map is shown in FIG. 4.

TABLE 1 FokI-dCpf1 vector construction primers

3. Construction of pSQT1313-gRNA vector

The invention uses the IGF2 gene as an example to verify the effect of the editing system, but is not limited to this gene. The gRNAs were designed on the first intron of IGF2 using the CRISPRscan website (https://www.crisprscan.org/), with TTTV selected as the PAM. Six pairs of gRNAs (Table 2) were designed with different spacer lengths (7 bp, 16 bp, 18 bp, 21 bp, 40 bp and 53 bp) and verified. gRNA primers for each spacer length were synthesized (Table 3) and diluted to 100 μM; the upstream and downstream primers were annealed to form double-stranded oligos, which were phosphorylated with T4 PNK. Reaction mixture: gRNA-F, 2 μL; gRNA-R, 2 μL; T4 PNK enzyme, 1 μL; 2× T7 ligase buffer (containing ATP), 5 μL. Reaction program: 30 min at 37 ℃, 5 min at 95 ℃, then ramp down to 25 ℃. After the reaction, the annealed product was diluted 200-fold.
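To illustrate what selecting TTTV as the PAM means in practice, the following Python sketch lists candidate Cpf1 target sites on the forward strand of a DNA string: a TTTV motif (V = A, C or G) immediately followed by a protospacer. It is only a sketch under stated assumptions and not the CRISPRscan algorithm: the 23-nt protospacer length, the function name `find_cpf1_sites` and the example sequence are assumptions, and the reverse strand and any scoring are ignored.

```python
import re


def find_cpf1_sites(sequence: str, protospacer_len: int = 23):
    """List (index, PAM, protospacer) for TTTV PAMs on the forward strand.

    The PAM sits directly 5' of the protospacer. protospacer_len is an
    assumed value; the text does not state the length that was used.
    """
    seq = sequence.upper()
    # A lookahead keeps the match zero-width so overlapping sites are found.
    pattern = re.compile(r"(?=(TTT[ACG])([ACGT]{%d}))" % protospacer_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


if __name__ == "__main__":
    # Hypothetical sequence, not taken from IGF2.
    demo = "GG" + "TTTC" + "ACGTACGTACGTACGTACGTACG" + "AA"
    for index, pam, protospacer in find_cpf1_sites(demo):
        print(index, pam, protospacer)
```

Pairs of sites found this way could then be filtered by the distance between them to obtain the different spacer lengths tested here; that step is omitted because the required orientation of the two sites is not described in the text.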

The pSQT1313 (Addgene plasmid #53370) vector was digested with BsmBI to obtain a framework containing the Cpf1 scaffold (AATTTCTACTAAGTGTAGAT). The IGF2-gRNA annealing products (spacer lengths of 7 bp, 16 bp, 18 bp, 21 bp, 40 bp and 53 bp) were ligated to the pSQT1313 framework, reacting for 1 h at 25 ℃ with T7 ligase; the products were transformed into TOP10 cells, single clones were picked, and the extracted plasmids were identified by electrophoresis (see FIGS. 5-9); clones showing the expected band were sent to a company for sequencing. The correct vectors were named pSQT1313-IGF2-gRNA-spacer followed by the spacer length (e.g., pSQT1313-IGF2-gRNA-spacer53); only the plasmid map for spacer53 is shown as an example (see FIG. 10).
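The idea of placing a pair of gRNAs in one vector, each preceded by the Cpf1 scaffold quoted above, can be sketched as simple string assembly. This is a simplified illustration, not the cloning procedure of the example: the two guide sequences are hypothetical, and splitting at the scaffold only mimics the outcome of Cpf1 processing (two separate guides), not the actual cut position within the repeat.

```python
# Cpf1 scaffold (direct repeat) as quoted in the text.
SCAFFOLD = "AATTTCTACTAAGTGTAGAT"


def build_array(guides):
    """Concatenate scaffold + guide units into one array sequence."""
    return "".join(SCAFFOLD + guide for guide in guides)


def split_array(array):
    """Recover the guide sequences by splitting the array at each scaffold."""
    return [part for part in array.split(SCAFFOLD) if part]


if __name__ == "__main__":
    # Hypothetical guide sequences for illustration only.
    guide_a = "GACCTGAGGTCAACGTCAGGTAC"
    guide_b = "CTGGAACCTTGGCAATCGGATCA"
    array = build_array([guide_a, guide_b])
    print(array)
    print(split_array(array))
```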

TABLE 2 gRNA sequences

TABLE 3 gRNA primer sequences

4. In this example, IGF2 was used as the target gene and mouse myoblast C2C12 cells as the model, and the target sequence of the target gene was successfully edited.

1) Recovery and culture of cryopreserved cells

The pCAG-FokI-dCpf1 and pSQT1313-IGF2-gRNA-spacer plasmids were extracted using an OMEGA endotoxin-free plasmid extraction kit (purchased from OMEGA), and the final concentration of the product was adjusted to 500 ng/μL for cell transfection.

The cryopreservation tube containing mouse myoblast C2C12 cells (stored in the laboratory) was taken out of liquid nitrogen, immediately placed in warm water at 37-40 ℃ and shaken rapidly until the freezing medium had completely thawed, completing rewarming within 1-2 min. The cell suspension was transferred to a sterile centrifuge tube, 5 mL of culture medium was added, and the mixture was gently pipetted until uniform. The cell suspension was centrifuged at 800-; 1 mL of complete medium was added to the centrifuge tube containing the cell pellet and gently pipetted until uniform, and the cell suspension was transferred to a cell culture flask with an appropriate amount of complete medium for culture.

2) Cell transfection

C2C12 mouse myoblasts were divided into 6 groups; each group was transfected with 500 ng of pCAG-FokI-dCpf1 and 500 ng of one pSQT1313-IGF2-gRNA-spacer plasmid (7 bp, 16 bp, 18 bp, 21 bp, 40 bp or 53 bp), with 3 replicates per group.

24 h before transfection, cells were seeded in 6-well plates at 3 × 10^5 cells/well in 2000 μL per well of high-glucose MEM medium (purchased from GIBCO) containing 10% fetal bovine serum (purchased from GIBCO), to reach about 70-80% confluence before transfection. The 2 plasmids were mixed at a 1:1 mass ratio, diluted in 100 μL of Opti-MEM medium (purchased from GIBCO) and gently mixed. 3 μL of FuGENE transfection reagent (from Promega) was added to 100 μL of Opti-MEM medium without serum or antibiotics, gently mixed and incubated for 5 min at room temperature. After 5 min, 100 μL of the transfection reagent dilution was added to 100 μL of each DNA dilution, gently mixed and left at room temperature for 20 min. 200 μL of the mixture was added to the prepared wells and the plate was gently rocked back and forth; the cells were incubated at 37 ℃ with saturated humidity and 5% CO2, the transfection medium was replaced with complete medium after 4 h, and the cells were collected for use after 48 h of culture.
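The quantities in the transfection setup follow from simple arithmetic, sketched below in Python. The numbers are taken from the text (500 ng of each plasmid, stocks at 500 ng/μL, 6 groups with 3 replicates, 3 × 10^5 cells per well, two 100 μL dilutions); the function and variable names are illustrative assumptions.

```python
def stock_volume_ul(mass_ng: float, concentration_ng_per_ul: float) -> float:
    """Volume of plasmid stock needed to deliver a given DNA mass."""
    if concentration_ng_per_ul <= 0:
        raise ValueError("concentration must be positive")
    return mass_ng / concentration_ng_per_ul


GROUPS = 6                # one group per spacer length
REPLICATES = 3            # replicates per group
CELLS_PER_WELL = 300_000  # 3 x 10^5 cells seeded per well

wells = GROUPS * REPLICATES
# Each well receives 500 ng of each plasmid from a 500 ng/uL stock.
volume_per_plasmid_ul = stock_volume_ul(500, 500)
# DNA dilution (100 uL) plus transfection reagent dilution (100 uL).
complex_volume_ul = 100 + 100
total_cells = wells * CELLS_PER_WELL

if __name__ == "__main__":
    print(f"wells: {wells}")
    print(f"stock volume per plasmid per well: {volume_per_plasmid_ul} uL")
    print(f"complex volume added per well: {complex_volume_ul} uL")
    print(f"cells needed for seeding: {total_cells}")
```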

3) T7 endonuclease I (T7EN1) digestion assay

Cells were collected 48 h after transfection and genomic DNA was extracted using the TIANGEN genome extraction kit. The corresponding target sites were amplified by PCR using the primers for each gRNA target site in Table 4; the amplification results are shown in FIG. 11. The PCR products were then purified using a Takara DNA purification kit for the subsequent digestion assay. There were 7 groups in total: Spacer length 7, Spacer length 16, Spacer length 18, Spacer length 21, Spacer length 40, Spacer length 53 and a Positive control. DNA denaturation system: 250 ng of purified target-site DNA and 2 μL of NEB Buffer 2. Program: 5 min at 95 ℃; 1 min at 95 ℃, ramp to 65 ℃ (0.2 ℃/s) for 45 sec, 45 sec at 72 ℃, 19 cycles; 1 min at 94 ℃, 45 sec at 55 ℃, 10 cycles; 10 min at 72 ℃. T7 endonuclease (10 U/μL, NEB) was then added to the denatured DNA and incubated at 37 ℃ for 15 min. The cleavage products were detected by electrophoresis on a 2% agarose gel, and the results are shown in FIG. 12.

TABLE 4 spacer53 gRNA primer sequences for target sites

T7 endonuclease I recognizes and cuts imperfectly paired DNA. When the FokI-dCpf1 system, guided by specific gRNAs, locates and cleaves the target site, non-homologous end joining in the genome produces mismatched DNA sequences, so the amplified target site can be cut by the T7EN1 enzyme; if no editing has occurred, it cannot be cut. As FIG. 12 and FIG. 13 show, two or more bands were produced in five groups (spacer 7, 18, 40, 53 and the Positive control), indicating that the FokI-dCpf1 system guided by the four gRNA pairs with spacers 7, 18, 40 and 53 cleaved the target in C2C12 cells and that the gene editing tool is effective.
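The text reports the T7EN1 result qualitatively (presence of extra bands). If one wanted a rough number from the gel, a commonly used estimate converts the fraction of cleaved product into an indel percentage. The Python sketch below applies that widely used formula as an assumption; it is not a calculation described in this document, and the band intensities shown are hypothetical.

```python
import math


def estimate_indel_percent(uncut_intensity: float, cut_intensities) -> float:
    """Estimate indel frequency (%) from T7EN1 gel band intensities.

    Uses the common approximation 100 * (1 - sqrt(1 - f_cut)), where f_cut
    is the cleaved fraction of the total signal. Applying it here is an
    assumption; the text does not quantify the gel.
    """
    cut = sum(cut_intensities)
    total = uncut_intensity + cut
    if total <= 0:
        raise ValueError("total band intensity must be positive")
    fraction_cut = cut / total
    return 100.0 * (1.0 - math.sqrt(1.0 - fraction_cut))


if __name__ == "__main__":
    # Hypothetical densitometry values, not measured from FIG. 12.
    print(round(estimate_indel_percent(50.0, [25.0, 25.0]), 2))
```

The square root reflects that a heteroduplex forms whenever at least one of the two re-annealed strands carries an edit, so the cleaved fraction overstates the share of edited strands.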

Sanger sequencing showed that the FokI-dCpf1 system mediated by the spacer53 gRNA pair produced a number of mutations at the target site. Specifically: C2C12 cells were co-transfected with 500 ng pCAG-FokI-dCpf1 and 500 ng pSQT1313-IGF2-gRNA-spacer53 and harvested 48 h later. Genomic DNA was extracted with the TIANGEN genome extraction kit. The target site was amplified by PCR using the target site primers given in Table 3, followed by TA cloning: ligation into the LB vector, transformation into TOP10 cells, picking of single clones and electrophoretic identification of the extracted plasmids (FIG. 13). The clones were sequenced by Dagaku Bio Inc.; the sequencing results (FIG. 14) show multiple mutations at the target site in the C2C12 genome. The results indicate that the FokI-dCpf1 system is an effective gene editing tool.
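Comparing each sequenced clone with the unedited reference is what reveals the mutations at the target site. A minimal Python sketch of that comparison is given below, using the standard-library `difflib.SequenceMatcher` as a stand-in for a proper sequence aligner; the function name `list_differences` and the example sequences are hypothetical and not taken from FIG. 14.

```python
from difflib import SequenceMatcher


def list_differences(reference: str, read: str):
    """Return (operation, reference_bases, read_bases) for every difference.

    SequenceMatcher is a generic string comparison tool, not a biological
    aligner, so this is only a rough way to flag edited clones.
    """
    matcher = SequenceMatcher(None, reference.upper(), read.upper(), autojunk=False)
    ref, rd = reference.upper(), read.upper()
    return [
        (tag, ref[i1:i2], rd[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


if __name__ == "__main__":
    # Hypothetical reference and clone reads for illustration only.
    reference = "ACGTTGCAACGGTCAAGT"
    clones = {"clone_1": "ACGTTGCAACGGTCAAGT", "clone_2": "ACGTTGCGGTCAAGT"}
    for name, read in clones.items():
        differences = list_differences(reference, read)
        status = "mutated" if differences else "wild-type"
        print(name, status, differences)
```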

Sequence listing

<110> Yangzhou University

<120> FokI and dCpf1 fusion protein expression vector and its mediated site-directed gene editing method

<160> 2

<170> SIPOSequenceListing 1.0

<210> 1

<211> 9958

<212> DNA

<213> Artificial Sequence (Artificial Sequence)

<400> 1

cattaatgaa tcggccaacg cgcggggaga ggcggtttgc gtattgggcg ctcttccgct 60

tcctcgctca ctgactcgct gcgctcggtc gttcggctgc ggcgagcggt atcagctcac 120

tcaaaggcgg taatacggtt atccacagaa tcaggggata acgcaggaaa gaacatgtga 180

gcaaaaggcc agcaaaaggc caggaaccgt aaaaaggccg cgttgctggc gtttttccat 240

aggctccgcc cccctgacga gcatcacaaa aatcgacgct caagtcagag gtggcgaaac 300

ccgacaggac tataaagata ccaggcgttt ccccctggaa gctccctcgt gcgctctcct 360

gttccgaccc tgccgcttac cggatacctg tccgcctttc tcccttcggg aagcgtggcg 420

ctttctcata gctcacgctg taggtatctc agttcggtgt aggtcgttcg ctccaagctg 480

ggctgtgtgc acgaaccccc cgttcagccc gaccgctgcg ccttatccgg taactatcgt 540

cttgagtcca acccggtaag acacgactta tcgccactgg cagcagccac tggtaacagg 600

attagcagag cgaggtatgt aggcggtgct acagagttct tgaagtggtg gcctaactac 660

ggctacacta gaagaacagt atttggtatc tgcgctctgc tgaagccagt taccttcgga 720

aaaagagttg gtagctcttg atccggcaaa caaaccaccg ctggtagcgg tggttttttt 780

gtttgcaagc agcagattac gcgcagaaaa aaaggatctc aagaagatcc tttgatcttt 840

tctacggggt ctgacgctca gtggaacgaa aactcacgtt aagggatttt ggtcatgaga 900

ttatcaaaaa ggatcttcac ctagatcctt ttaaattaaa aatgaagttt taaatcaatc 960

taaagtatat atgagtaaac ttggtctgac agttaccaat gcttaatcag tgaggcacct 1020

atctcagcga tctgtctatt tcgttcatcc atagttgcct gactccccgt cgtgtagata 1080

actacgatac gggagggctt accatctggc cccagtgctg caatgatacc gcgagaccca 1140

cgctcaccgg ctccagattt atcagcaata aaccagccag ccggaagggc cgagcgcaga 1200

agtggtcctg caactttatc cgcctccatc cagtctatta attgttgccg ggaagctaga 1260

gtaagtagtt cgccagttaa tagtttgcgc aacgttgttg ccattgctac aggcatcgtg 1320

gtgtcacgct cgtcgtttgg tatggcttca ttcagctccg gttcccaacg atcaaggcga 1380

gttacatgat cccccatgtt gtgcaaaaaa gcggttagct ccttcggtcc tccgatcgtt 1440

gtcagaagta agttggccgc agtgttatca ctcatggtta tggcagcact gcataattct 1500

cttactgtca tgccatccgt aagatgcttt tctgtgactg gtgagtactc aaccaagtca 1560

ttctgagaat agtgtatgcg gcgaccgagt tgctcttgcc cggcgtcaat acgggataat 1620

accgcgccac atagcagaac tttaaaagtg ctcatcattg gaaaacgttc ttcggggcga 1680

aaactctcaa ggatcttacc gctgttgaga tccagttcga tgtaacccac tcgtgcaccc 1740

aactgatctt cagcatcttt tactttcacc agcgtttctg ggtgagcaaa aacaggaagg 1800

caaaatgccg caaaaaaggg aataagggcg acacggaaat gttgaatact catactcttc 1860

ctttttcaat attattgaag catttatcag ggttattgtc tcatgagcgg atacatattt 1920

gaatgtattt agaaaaataa acaaataggg gttccgcgca catttccccg aaaagtgcca 1980

cctgggtcga cattgattat tgactagtta ttaatagtaa tcaattacgg ggtcattagt 2040

tcatagccca tatatggagt tccgcgttac ataacttacg gtaaatggcc cgcctggctg 2100

accgcccaac gacccccgcc cattgacgtc aataatgacg tatgttccca tagtaacgcc 2160

aatagggact ttccattgac gtcaatgggt ggagtattta cggtaaactg cccacttggc 2220

agtacatcaa gtgtatcata tgccaagtac gccccctatt gacgtcaatg acggtaaatg 2280

gcccgcctgg cattatgccc agtacatgac cttatgggac tttcctactt ggcagtacat 2340

ctacgtatta gtcatcgcta ttaccatggt cgaggtgagc cccacgttct gcttcactct 2400

ccccatctcc cccccctccc cacccccaat tttgtattta tttatttttt aattattttg 2460

tgcagcgatg ggggcggggg gggggggggg gcgaggggcg gggcggggcg aggcggagag 2520

gtgcggcggc agccaatcag agcggcgcgc tccgaaagtt tccttttatg gcgaggcggc 2580

ggcggcggcg gccctataaa aagcgaagcg cgcggcgggc gggagtcgct gcgcgctgcc 2640

ttcgccccgt gccccgctcc gccgccgcct cgcgccgccc gccccggctc tgactgaccg 2700

cgttactccc acaggtgagc gggcgggacg gcccttctcc tccgggctgt aattagcgct 2760

tggtttaatg acggcttgtt tcttttctgt ggctgcgtga aagccttgag gggctccggg 2820

agggcccttt gtgcgggggg agcggctcgg ggggtgcgtg cgtgtgtgtg tgcgtgggga 2880

gcgccgcgtg cggctccgcg ctgcccggcg gctgtgagcg ctgcgggcgc ggcgcggggc 2940

tttgtgcgct ccgcagtgtg cgcgagggga gcgcggccgg gggcggtgcc ccgcggtgga 3000

gtcgctgcgc gctgccttcg ccccgtgccc cgcgcggggg gggctgcgag gggaacaaag 3060

gctgcgtgcg gggtgtgtgc gtgggggggt gagcaggggg tgtgggcgcg tcggtcgggc 3120

tgcaaccccc cctgcacccc cctccccgag ttgctgagca cggcccggct tcgggtgcgg 3180

ggctccgtac ggggcgtggc gcggggctcg ccgtgccggg cggggggtgg cggcaggtgg 3240

gggtgccggg cggggcgggg ccgcctcggg ccggggaggg ctcgggggag gggcgcggcg 3300

gcccccggag cgccggcggc tgtcgaggcg cggcgagccg cagccattgc cttttatggt 3360

aatcgtgcga gagggcgcag ggacttcctt tgtcccaaat ctgtgcggag ccgaaatctg 3420

ggaggcgccg ccgcaccccc tctagcgggc gcggggcgaa gcggtgcggc gccggcagga 3480

aggaaatggg cggggagggc cttcgtgcgt cgccgcgccg ccgtcccctt ctccctctcc 3540

agcctcgggg ctgtccgcgg ggggacggct gccttcgggg gggacggggc agggcggggt 3600

tcggcttctg gcgtgtgacc ggcggctcta gagcctctgc taaccatgtt catgccttct 3660

tctttttcct acagctcctg ggcaacgtgc tggttattgt gctgtctcat cattttggca 3720

aagaattcta atacgactca ctatagggct taagggtacc gcgggcccgg gatccaccgg 3780

tcgccaccat gggtgatcat tatctggata ttcggctgag gcctgatcca gagttcccac 3840

ctgcgcagct gatgtctgtc ctttttggca aacttcatca ggccctggtt gcccagggcg 3900

gagatcggat aggggtaagc tttccagacc tcgacgaaag ccggagccgc ctgggagaac 3960

gcctgcggat ccacgcttct gccgacgatc tgagagcctt gctggcaagg ccatggcttg 4020

aggggctccg ggatcacctg cagtttggcg aacccgccgt tgttccccac ccaacccctt 4080

atcggcaggt gtctagagtg caggccaaat ctaatccaga acggctgcga cggcgactca 4140

tgcggcgaca tgatcttagc gaggaagagg cccgaaaaag aatccctgat accgtggccc 4200

gcgcccttga cttgcctttt gtcacactgc ggtcccagag tacggggcag catttcagac 4260

ttttcattcg acacgggcca ctgcaagtta ccgccgaaga aggaggcttt acttgttatg 4320

gactctccaa gggaggtttc gtgccctggt ttgagggcag aggaagtctg ttaacatgcg 4380

gtgacgtcga ggagaatcct ggcccaatgc ctaagaagaa gcggaaggtg agcagccaac 4440

ttgtgaagtc tgaactcgag gagaaaaaat cagagttgag acacaagttg aagtacgtgc 4500

cacacgaata catcgagctt atcgagatcg ccagaaacag tacccaggat aggatccttg 4560

agatgaaagt catggagttc tttatgaagg tctacggtta tagaggaaag caccttggcg 4620

gtagcagaaa gcccgatggc gccatctata ctgtcggatc tcctatcgat tatggggtga 4680

tcgtggatac caaagcttac tcaggcgggt acaacttgcc cataggacaa gccgacgaga 4740

tgcagcggta tgtcgaagag aaccagacgc gcaacaagca catcaacccc aatgaatggt 4800

ggaaagtgta cccaagtagt gtgactgagt tcaagttcct gtttgtctcc ggccacttta 4860

agggcaatta taaagctcag ctcactagac tcaatcacat cacaaactgc aacggagctg 4920

tgttgtcagt ggaggagctc ctgattggag gcgagatgat caaagccggc acccttacac 4980

tggaggaggt gcggcggaag ttcaacaatg gagagatcaa cttcggtggc ggtggatcca 5040

tgagcaagct ggagaagttt acaaactgct actccctgtc taagaccctg aggttcaagg 5100

ccatccctgt gggcaagacc caggagaaca tcgacaataa gcggctgctg gtggaggacg 5160

agaagagagc cgaggattat aagggcgtga agaagctgct ggatcgctac tatctgtctt 5220

ttatcaacga cgtgctgcac agcatcaagc tgaagaatct gaacaattac atcagcctgt 5280

tccggaagaa aaccagaacc gagaaggaga ataaggagct ggagaacctg gagatcaatc 5340

tgcggaagga gatcgccaag gccttcaagg gcaacgaggg ctacaagtcc ctgtttaaga 5400

aggatatcat cgagacaatc ctgccagagt tcctggacga taaggacgag atcgccctgg 5460

tgaacagctt caatggcttt accacagcct tcaccggctt ctttgataac agagagaata 5520

tgttttccga ggaggccaag agcacatcca tcgccttcag gtgtatcaac gagaatctga 5580

cccgctacat ctctaatatg gacatcttcg agaaggtgga cgccatcttt gataagcacg 5640

aggtgcagga gatcaaggag aagatcctga acagcgacta tgatgtggag gatttctttg 5700

agggcgagtt ctttaacttt gtgctgacac aggagggcat cgacgtgtat aacgccatca 5760

tcggcggctt cgtgaccgag agcggcgaga agatcaaggg cctgaacgag tacatcaacc 5820

tgtataatca gaaaaccaag cagaagctgc ctaagtttaa gccactgtat aagcaggtgc 5880

tgagcgatcg ggagtctctg agcttctacg gcgagggcta tacatccgat gaggaggtgc 5940

tggaggtgtt tagaaacacc ctgaacaaga acagcgagat cttcagctcc atcaagaagc 6000

tggagaagct gttcaagaat tttgacgagt actctagcgc cggcatcttt gtgaagaacg 6060

gccccgccat cagcacaatc tccaaggata tcttcggcga gtggaacgtg atccgggaca 6120

agtggaatgc cgagtatgac gatatccacc tgaagaagaa ggccgtggtg accgagaagt 6180

acgaggacga tcggagaaag tccttcaaga agatcggctc cttttctctg gagcagctgc 6240

aggagtacgc cgacgccgat ctgtctgtgg tggagaagct gaaggagatc atcatccaga 6300

aggtggatga gatctacaag gtgtatggct cctctgagaa gctgttcgac gccgattttg 6360

tgctggagaa gagcctgaag aagaacgacg ccgtggtggc catcatgaag gacctgctgg 6420

attctgtgaa gagcttcgag aattacatca aggccttctt tggcgagggc aaggagacaa 6480

acagggacga gtccttctat ggcgattttg tgctggccta cgacatcctg ctgaaggtgg 6540

accacatcta cgatgccatc cgcaattatg tgacccagaa gccctactct aaggataagt 6600

tcaagctgta ttttcagaac cctcagttca tgggcggctg ggacaaggat aaggagacag 6660

actatcgggc caccatcctg agatacggct ccaagtacta tctggccatc atggataaga 6720

agtacgccaa gtgcctgcag aagatcgaca aggacgatgt gaacggcaat tacgagaaga 6780

tcaactataa gctgctgccc ggccctaata agatgctgcc aaaggtgttc ttttctaaga 6840

agtggatggc ctactataac cccagcgagg acatccagaa gatctacaag aatggcacat 6900

tcaagaaggg cgatatgttt aacctgaatg actgtcacaa gctgatcgac ttctttaagg 6960

atagcatctc ccggtatcca aagtggtcca atgcctacga tttcaacttt tctgagacag 7020

agaagtataa ggacatcgcc ggcttttaca gagaggtgga ggagcagggc tataaggtga 7080

gcttcgagtc tgccagcaag aaggaggtgg ataagctggt ggaggagggc aagctgtata 7140

tgttccagat ctataacaag gacttttccg ataagtctca cggcacaccc aatctgcaca 7200

ccatgtactt caagctgctg tttgacgaga acaatcacgg acagatcagg ctgagcggag 7260

gagcagagct gttcatgagg cgcgcctccc tgaagaagga ggagctggtg gtgcacccag 7320

ccaactcccc tatcgccaac aagaatccag ataatcccaa gaaaaccaca accctgtcct 7380

acgacgtgta taaggataag aggttttctg aggaccagta cgagctgcac atcccaatcg 7440

ccatcaataa gtgccccaag aacatcttca agatcaatac agaggtgcgc gtgctgctga 7500

agcacgacga taacccctat gtgatcggca tcgcgagggg cgagcgcaat ctgctgtata 7560

tcgtggtggt ggacggcaag ggcaacatcg tggagcagta ttccctgaac gagatcatca 7620

acaacttcaa cggcatcagg atcaagacag attaccactc tctgctggac aagaaggaga 7680

aggagaggtt cgaggcccgc cagaactgga cctccatcga gaatatcaag gagctgaagg 7740

ccggctatat ctctcaggtg gtgcacaaga tctgcgagct ggtggagaag tacgatgccg 7800

tgatcgccct ggaggacctg aactctggct ttaagaatag ccgcgtgaag gtggagaagc 7860

aggtgtatca gaagttcgag aagatgctga tcgataagct gaactacatg gtggacaaga 7920

agtctaatcc ttgtgcaaca ggcggcgccc tgaagggcta tcagatcacc aataagttcg 7980

agagctttaa gtccatgtct acccagaacg gcttcatctt ttacatccct gcctggctga 8040

catccaagat cgatccatct accggctttg tgaacctgct gaaaaccaag tataccagca 8100

tcgccgattc caagaagttc atcagctcct ttgacaggat catgtacgtg cccgaggagg 8160

atctgttcga gtttgccctg gactataaga acttctctcg cacagacgcc gattacatca 8220

agaagtggaa gctgtactcc tacggcaacc ggatcagaat cttccggaat cctaagaaga 8280

acaacgtgtt cgactgggag gaggtgtgcc tgaccagcgc ctataaggag ctgttcaaca 8340

agtacggcat caattatcag cagggcgata tcagagccct gctgtgcgag cagtccgaca 8400

aggccttcta ctctagcttt atggccctga tgagcctgat gctgcagatg cggaacagca 8460

tcacaggccg caccgacgtg gattttctga tcagccctgt gaagaactcc gacggcatct 8520

tctacgatag ccggaactat gaggcccagg agaatgccat cctgccaaag aacgccgacg 8580

ccaatggcgc ctataacatc gccagaaagg tgctgtgggc catcggccag ttcaagaagg 8640

ccgaggacga gaagctggat aaggtgaaga tcgccatctc taacaaggag tggctggagt 8700

acgcccagac cagcgtgaag cacggatccc ccaagaagaa gaggaaagtc tcgagcgact 8760

acaaagacca tgacggtgat tataaagatc atgacatcga ttacaaggat gacgatgaca 8820

agtgaagcgg ccgcactcct caggtgcagg ctgcctatca gaaggtggtg gctggtgtgg 8880

ccaatgccct ggctcacaaa taccactgag atctttttcc ctctgccaaa aattatgggg 8940

acatcatgaa gccccttgag catctgactt ctggctaata aaggaaattt attttcattg 9000

caatagtgtg ttggaatttt ttgtgtctct cactcggaag gacatatggg agggcaaatc 9060

atttaaaaca tcagaatgag tatttggttt agagtttggc aacatatgcc catatgctgg 9120

ctgccatgaa caaaggttgg ctataaagag gtcatcagta tatgaaacag ccccctgctg 9180

tccattcctt attccataga aaagccttga cttgaggtta gatttttttt atattttgtt 9240

ttgtgttatt tttttcttta acatccctaa aattttcctt acatgtttta ctagccagat 9300

ttttcctcct ctcctgacta ctcccagtca tagctgtccc tcttctctta tggagatccc 9360

tcgacctgca gcccaagctt ggcgtaatca tggtcatagc tgtttcctgt gtgaaattgt 9420

tatccgctca caattccaca caacatacga gccggaagca taaagtgtaa agcctggggt 9480

gcctaatgag tgagctaact cacattaatt gcgttgcgct cactgcccgc tttccagtcg 9540

ggaaacctgt cgtgccagcg gatccgcatc tcaattagtc agcaaccata gtcccgcccc 9600

taactccgcc catcccgccc ctaactccgc ccagttccgc ccattctccg ccccatggct 9660

gactaatttt ttttatttat gcagaggccg aggccgcctc ggcctctgag ctattccaga 9720

agtagtgagg aggctttttt ggaggcctag gcttttgcaa aaagctaact tgtttattgc 9780

agcttataat ggttacaaat aaagcaatag catcacaaat ttcacaaata aagcattttt 9840

ttcactgcat tctagttgtg gtttgtccaa actcatcaat gtatcttatc atgtctggat 9900

ccgctgcatt aatgaatcgg ccaacgcgcg gggagaggcg gtttgcgtat tgggcgct 9958

<210> 2

<211> 2321

<212> DNA

<213> Artificial Sequence

<400> 2

gacgtcgcta gctgtacaaa aaagcaggct ttaaaggaac caattcagtc gactggatcc 60

ggtaccaagg tcgggcagga agagggccta tttcccatga ttccttcata tttgcatata 120

cgatacaagg ctgttagaga gataattaga attaatttga ctgtaaacac aaagatatta 180

gtacaaaata cgtgacgtag aaagtaataa tttcttgggt agtttgcagt tttaaaatta 240

tgttttaaaa tggactatca tatgcttacc gtaacttgaa agtatttcga tttcttggct 300

ttatatatct tgtggaaagg acgaaacacc gttcactgcc gtataggcag aatttctact 360

aagtgtagat tctgtagtct attgccctaa ctcaatttct actaagtgta gattatctgc 420

agtaatgttc ctgctgaatt tctactaagt gtagatgttc actgccgtat aggcagagct 480

tgggccgctc gaggtacctc tctacatatg acatgtgagc aaaaggccag caaaaggcca 540

ggaaccgtaa aaaggccgcg ttgctggcgt ttttccatag gctccgcccc cctgacgagc 600

atcacaaaaa tcgacgctca agtcagaggt ggcgaaaccc gacaggacta taaagatacc 660

aggcgtttcc ccctggaagc tccctcgtgc gctctcctgt tccgaccctg ccgcttaccg 720

gatacctgtc cgcctttctc ccttcgggaa gcgtggcgct ttctcatagc tcacgctgta 780

ggtatctcag ttcggtgtag gtcgttcgct ccaagctggg ctgtgtgcac gaaccccccg 840

ttcagcccga ccgctgcgcc ttatccggta actatcgtct tgagtccaac ccggtaagac 900

acgacttatc gccactggca gcagccactg gtaacaggat tagcagagcg aggtatgtag 960

gcggtgctac agagttcttg aagtggtggc ctaactacgg ctacactaga agaacagtat 1020

ttggtatctg cgctctgctg aagccagtta ccttcggaaa aagagttggt agctcttgat 1080

ccggcaaaca aaccaccgct ggtagcggtg gtttttttgt ttgcaagcag cagattacgc 1140

gcagaaaaaa aggatctcaa gaagatcctt tgatcttttc tacggggtct gacgctcagt 1200

ggaacgaaaa ctcacgttaa gggattttgg tcatgagatt atcaaaaagg atcttcacct 1260

agatcctttt aaattaaaaa tgaagtttta aatcaatcta aagtatatat gagtaaactt 1320

ggtctgacag ttaccaatgc ttaatcagtg aggcacctat ctcagcgatc tgtctatttc 1380

gttcatccat agttgcctga ctccccgtcg tgtagataac tacgatacgg gagggcttac 1440

catctggccc cagtgctgca atgataccgc gagacccacg ctcaccggct ccagatttat 1500

cagcaataaa ccagccagcc ggaagggccg agcgcagaag tggtcctgca actttatccg 1560

cctccatcca gtctattaat tgttgccggg aagctagagt aagtagttcg ccagttaata 1620

gtttgcgcaa cgttgttgcc attgctacag gcatcgtggt gtcacgctcg tcgtttggta 1680

tggcttcatt cagctccggt tcccaacgat caaggcgagt tacatgatcc cccatgttgt 1740

gcaaaaaagc ggttagctcc ttcggtcctc cgatcgttgt cagaagtaag ttggccgcag 1800

tgttatcact catggttatg gcagcactgc ataattctct tactgtcatg ccatccgtaa 1860

gatgcttttc tgtgactggt gagtactcaa ccaagtcatt ctgagaatag tgtatgcggc 1920

gaccgagttg ctcttgcccg gcgtcaatac gggataatac cgcgccacat agcagaactt 1980

taaaagtgct catcattgga aaacgttctt cggggcgaaa actctcaagg atcttaccgc 2040

tgttgagatc cagttcgatg taacccactc gtgcacccaa ctgatcttca gcatctttta 2100

ctttcaccag cgtttctggg tgagcaaaaa caggaaggca aaatgccgca aaaaagggaa 2160

taagggcgac acggaaatgt tgaatactca tactcttcct ttttcaatat tattgaagca 2220

tttatcaggg ttattgtctc atgagcggat acatatttga atgtatttag aaaaataaac 2280

aaataggggt tccgcgcaca tttccccgaa aagtgccacc t 2321
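The listing above prints each sequence in groups of ten bases, with a running base count at the end of each line. As a minimal sketch of how such lines could be joined into one contiguous string and checked against those counts, the following Python may be used. The function names `parse_listing_line` and `join_listing` are illustrative assumptions and are not part of the application.

```python
def parse_listing_line(line):
    """Split one listing line into (bases, running_end_position)."""
    *groups, count = line.split()
    bases = "".join(groups).lower()
    if set(bases) - set("acgtn"):
        raise ValueError(f"unexpected symbol in line: {line!r}")
    return bases, int(count)


def join_listing(lines):
    """Concatenate listing lines, verifying the running count on each line."""
    parts = []
    total = 0
    for line in lines:
        if not line.strip():
            continue  # the listing is double-spaced; skip blank lines
        bases, end = parse_listing_line(line)
        total += len(bases)
        if total != end:
            raise ValueError(f"count mismatch: line claims {end}, found {total}")
        parts.append(bases)
    return "".join(parts)


# First two lines of SEQ ID NO: 2 as printed above.
example = [
    "gacgtcgcta gctgtacaaa aaagcaggct ttaaaggaac caattcagtc gactggatcc 60",
    "",
    "ggtaccaagg tcgggcagga agagggccta tttcccatga ttccttcata tttgcatata 120",
]
sequence = join_listing(example)
print(len(sequence))  # 120
```

The joined length of a complete listing can then be compared with the value given under `<211>`.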

Claims

1. A fusion protein expression vector, characterized in that the nucleic acid sequence of the fusion protein expression vector is as shown in SEQ ID NO: 1.

2. The fusion protein expression vector of claim 1, wherein the fusion protein expression vector is the plasmid vector pFOKI-dCpf1, formed by fusing the FokI endonuclease with Cpf1 lacking cleavage activity (dCpf1).

3. The fusion protein expression vector of claim 2, wherein the fusion protein expression vector comprises a CAG promoter, a Kozak sequence, two SV40 nuclear localization signal (NLS) sequences, a FokI sequence, a Linker sequence, and a dCpf1 sequence;

the Kozak sequence is bases 3747-3556 from the 5' end of SEQ ID NO: 1;

the sequence of FokI is the 4374-;

the Linker sequence is bases 4401-4988 from the 5' end of SEQ ID NO: 1;

the dCpf1 sequence is bases 5004-8687 from the 5' end of SEQ ID NO: 1.
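The coordinates in claim 3 count bases from the 5' end, which is read here as 1-based and inclusive. A minimal sketch of extracting such a range from a sequence string is given below. The helper name `feature` and the placeholder sequence are illustrative assumptions; the real input would be the 9958-base SEQ ID NO: 1. A range whose start exceeds its end, as the Kozak coordinates read in this text, is rejected rather than silently reinterpreted.

```python
def feature(seq, start, end):
    """Return bases start..end of seq, counted 1-based and inclusive."""
    if not (1 <= start <= end <= len(seq)):
        raise ValueError(f"invalid range {start}-{end} for length {len(seq)}")
    return seq[start - 1:end]


# Placeholder of the same length as SEQ ID NO: 1 (9958 bases); NOT the real sequence.
placeholder = ("acgt" * 2490)[:9958]

linker_range = feature(placeholder, 4401, 4988)  # range given for the Linker
dcpf1_range = feature(placeholder, 5004, 8687)   # range given for dCpf1
print(len(linker_range), len(dcpf1_range))       # 588 3684
```

Checking the length of each extracted range is a quick way to see whether a stated interval is plausible for the named element.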

4. The fusion protein expression vector of claim 2, wherein the primer sequence of the fusion protein expression vector is as follows:

5. A method for constructing the fusion protein expression vector of claim 1, wherein dCpf1 is cloned into an LB vector to construct the pLB-dCpf1 vector; pLB-dCpf1 is digested with BsmBI and the dCpf1 fragment is recovered; the T2A-NLS and FokI fragments are obtained by PCR from a vector stored in the laboratory; the T2A-NLS, FokI, and dCpf1 fragments and the CAG backbone are then ligated together by Golden Gate cloning and transformed into TOP10 cells; and single clones are picked and, after plasmid extraction, identified by electrophoresis.
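Because the construction in claim 5 relies on BsmBI digestion, it can be useful to check where BsmBI recognition sites (5'-CGTCTC-3') occur in a fragment before assembly. The following is a minimal sketch of such a scan on both strands. It is not the applicants' procedure, and the function names and the test string are illustrative assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_sites(seq, site="CGTCTC"):
    """List (1-based position, strand) for each occurrence of a recognition site.

    '+' means the site reads 5'->3' on the given strand; '-' means its
    reverse complement does. Intended for non-palindromic sites such as BsmBI.
    """
    seq = seq.upper()
    site = site.upper()
    hits = []
    for motif, strand in ((site, "+"), (reverse_complement(site), "-")):
        start = seq.find(motif)
        while start != -1:
            hits.append((start + 1, strand))
            start = seq.find(motif, start + 1)
    return sorted(hits)


# Made-up test string with one site on each strand.
print(find_sites("aaCGTCTCaaGAGACGaa"))  # [(3, '+'), (11, '-')]
```

An unexpected internal site in an insert would be cut during the reaction, so a scan of this kind is commonly done before designing the assembly.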

6. Use of the fusion protein expression vector of any one of claims 1 to 3 in gene editing.

7. A site-directed gene editing method mediated by the fusion protein expression vector of any one of claims 1 to 3, wherein dCpf1 lacking nuclease activity is fused with the nuclease FokI, and a genomic target site is edited in a site-directed manner under the guidance of a gRNA.

8. The fusion protein-mediated site-directed gene editing method according to claim 1, wherein the FokI-dCpf1 gene editing system comprises: a FokI-dCpf1 fusion protein expression plasmid, whose product recognizes the gRNA and performs cleavage, and a gRNA expression plasmid, which directs site-directed binding of the FokI-dCpf1 fusion protein.

9. A gRNA vector, characterized in that the nucleic acid sequence of the vector is as shown in SEQ ID NO: 2.

10. The gRNA vector according to claim 9, characterized in that the sequences of the gRNA primers are as follows: