This application is a divisional application of the Chinese patent application filed on June 15, 2017, with application number 2017104514243, which relates to a fusion protein for generating point mutations in cells and to a preparation method and use thereof.
Detailed Description
The present disclosure relates to fusion proteins of a Cas protein lacking nuclease activity with the cytosine deaminase AID or a mutant thereof. Under the direction of an sgRNA, the fusion protein is recruited to a specific DNA sequence, where AID or its mutant deaminates cytosine to generate uracil. The uracil is then randomly converted into other bases during DNA repair, so that site-directed mutation is achieved with high mutation efficiency.
For further details of Cas/sgRNA beyond those described below, see CN 201380049665.5 and CN 201380072752.2, the entire contents of which are incorporated herein by reference.
Cas proteins
CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) is a system that bacteria use to defend against viral attack or to evade mammalian immune responses. The system has been adapted and optimized as a gene editing tool and is widely applied to in vitro biochemical reactions and to gene editing of cells and individuals.
In general, a complex formed by a Cas protein with endonuclease activity and its specifically recognized sgRNA pairs complementarily with the template strand of the target DNA through the pairing region of the sgRNA, and Cas cleaves the double-stranded DNA at a specific position. It is understood that herein, "Cas protein" is used interchangeably with "Cas enzyme".
This property of Cas/sgRNA, namely the specific binding of the sgRNA to its target, is exploited herein to localize Cas to a desired position, where cytosine is deaminated by the AID or mutant thereof in the fusion protein. Cas proteins suitable for use in the present invention, which lack nuclease activity partially or completely (in particular, lack endonuclease activity partially or completely) but retain helicase activity, may be derived from a variety of Cas proteins and variants thereof well known in the art, including but not limited to Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas6, Cas7, Cas8, Cas9 (also known as Csn1 and Csx12), Cas10, Csy1, Csy2, Csy3, Cse1, Cse2, Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX, Csx3, Csx1, Csx15, Csf1, Csf2, Csf3, Csf4, homologs thereof, or modified versions thereof.
In some embodiments, a Cas9 enzyme lacking nuclease activity and the single-stranded sgRNA that it specifically recognizes are used. The Cas9 enzyme may come from different species, including but not limited to Cas9 from Streptococcus pyogenes (SpCas9), Cas9 from Staphylococcus aureus (SaCas9), Cas9 from Streptococcus thermophilus (St1Cas9), and the like. Various variants of the Cas9 enzyme may be used, provided that the variant specifically recognizes its sgRNA and lacks nuclease activity.
Cas proteins lacking nuclease activity can be prepared by methods well known in the art, including but not limited to deleting the entire catalytic domain of the endonuclease in a Cas protein, or mutating one or several amino acids in that domain. The mutation may be a deletion or substitution of one or several amino acid residues (e.g., 2 or more, 3 or more, 4 or more, 5 or more, 10 or more, up to the entire catalytic domain), or an insertion of one or several new amino acid residues (e.g., 1 or more, 2 or more, 3 or more, 4 or more, 5 or more, 10 or more, or 1-10 or 1-15 residues). Deletion of the above domains or mutation of amino acid residues can be carried out by methods conventional in the art, as can testing whether the mutated Cas protein retains nuclease activity. For example, for Cas9, the two endonuclease catalytic domains RuvC1 and HNH can each be mutated, e.g., by mutating the aspartic acid at amino acid 10 of the enzyme (in the RuvC1 domain) to alanine or another amino acid, and the histidine at amino acid 841 (in the HNH domain) to alanine or another amino acid. These two mutations cause Cas9 to lose endonuclease activity. Preferably, the Cas enzyme is completely free of nuclease activity. In one or more embodiments, the amino acid sequence of the Cas9 enzyme without nuclease activity used herein is shown as amino acid residues 42-1452 of SEQ ID NO: 2. In other embodiments, the Cas enzyme used herein lacks nuclease activity only in part, i.e., the Cas enzyme can still cause DNA single-strand breaks. A representative example of such a Cas enzyme is shown as amino acid residues 42-1419 of SEQ ID NO: 72.
For the Cas/sgRNA complex to function, a protospacer adjacent motif (PAM) must be present in the non-template strand (3' to 5') of the DNA. Different Cas enzymes have different PAMs. For example, the PAM for SpCas9 is typically NGG, the PAM for the SaCas9 enzyme is typically NNGRR, and the PAM for the St1Cas9 enzyme is typically NNAGAA, where N is A, C, T or G and R is G or A.
In certain preferred embodiments, PAM for SaCas9 enzyme is NNGRRT. In certain preferred embodiments, the PAM for SpCas9 is TGG.
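The PAM consensus sequences above use degenerate base codes, so locating candidate PAM sites amounts to pattern matching. The following Python sketch is illustrative only and is not part of the disclosed method. The names `pam_regex` and `find_pams` are hypothetical, and the strand is assumed to be supplied 5' to 3' as a plain string of A, C, G and T.

```python
import re

# Degenerate codes used by the PAMs in the text: N = any base, R = G or A.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]", "R": "[AG]"}

# Typical PAM consensus per enzyme, as stated in the text.
PAMS = {"SpCas9": "NGG", "SaCas9": "NNGRR", "St1Cas9": "NNAGAA"}


def pam_regex(pam):
    """Translate a PAM consensus such as 'NNGRR' into a regex pattern string."""
    return "".join(IUPAC[base] for base in pam.upper())


def find_pams(strand, pam):
    """Return 0-based start positions of every PAM match on the given strand.

    A lookahead is used so that overlapping matches are all reported.
    """
    lookahead = re.compile("(?=(" + pam_regex(pam) + "))")
    return [m.start() for m in lookahead.finditer(strand.upper())]


if __name__ == "__main__":
    print(find_pams("ATGGCGGAGG", PAMS["SpCas9"]))
```

The same functions accept the preferred PAMs named above (for example `find_pams(strand, "NNGRRT")`), since they only rely on the N and R codes.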
sgRNA
An sgRNA typically comprises two parts, a target binding region and a Cas protein recognition region. The target binding region is typically linked to the Cas protein recognition region in the 5' to 3' direction.
The target binding region is typically 15 to 25 bases long, more typically 18 to 22 bases, such as 20 bases. The target binding region binds specifically to the template strand of the DNA, thereby recruiting the fusion protein to a predetermined site. Typically, the region opposite the sgRNA binding region of the DNA template strand is immediately adjacent to the PAM, or is separated from it by a few bases (e.g., within 10, within 8, or within 5). Thus, in designing an sgRNA, the PAM is first determined from the Cas enzyme used; a site that can serve as the PAM is then found on the non-template strand of the DNA; and a fragment 15-25 bases long, more typically 18-22 bases long, located downstream of the PAM site on the non-template strand (3' to 5'), either immediately adjacent to the PAM site or within 10 bases (e.g., 8, 5, etc.) of it, is used as the sequence of the target binding region of the sgRNA.
The Cas protein recognition region of the sgRNA is then determined according to the Cas protein used, as will be appreciated by those skilled in the art.
Thus, the sequence of the target binding region of an sgRNA herein is a fragment 15-25 bases long, more typically 18-22 bases long, of the DNA strand containing the PAM site recognized by the selected Cas enzyme, located downstream of the PAM site either immediately adjacent to it or within 10 bases (e.g., within 8, within 5, etc.) of it; the Cas protein recognition region of the sgRNA is specifically recognized by the selected Cas enzyme.
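The design rule above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure used in the examples: the PAM-containing strand is assumed to be written 5' to 3', so that "downstream of the PAM when reading 3' to 5'" corresponds to the bases lying on the 5' side of the PAM in the string. The function name `candidate_targets` and its parameters `length` and `gap` are hypothetical.

```python
import re

# Degenerate codes used by the PAMs in the text: N = any base, R = G or A.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]", "R": "[AG]"}


def candidate_targets(pam_strand, pam="NGG", length=20, gap=0):
    """List (pam_start, target_sequence) pairs for one PAM-containing strand.

    pam_strand : the strand carrying the PAM, written 5' to 3' (assumption)
    length     : length of the target binding region (15-25 in the text)
    gap        : bases separating the region from the PAM (0-10 in the text)
    """
    if not 15 <= length <= 25:
        raise ValueError("length should be 15-25 bases")
    if not 0 <= gap <= 10:
        raise ValueError("gap should be 0-10 bases")
    seq = pam_strand.upper()
    pattern = "".join(IUPAC[base] for base in pam.upper())
    hits = []
    # Lookahead so that overlapping PAM sites are all considered.
    for match in re.finditer("(?=(" + pattern + "))", seq):
        end = match.start() - gap   # exclusive end of the target region
        start = end - length
        if start >= 0:              # skip PAMs too close to the strand start
            hits.append((match.start(), seq[start:end]))
    return hits


if __name__ == "__main__":
    strand = "ACGTACGTACGTACGTACGTAGGTT"
    print(candidate_targets(strand))
```

Each returned sequence is a DNA string; converting it to the RNA target binding region and joining it to the Cas protein recognition region is a separate step.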
The sgRNA can be prepared by methods conventional in the art, for example by conventional chemical synthesis. The sgRNA can also be transferred into cells via an expression vector and expressed there. Expression vectors for sgRNAs can be constructed using methods well known in the art.
Activation-induced cytosine deaminase (AID)
AID is a cytosine deaminase belonging to the APOBEC family, a family of RNA editing enzymes. It has a nuclear localization signal at its N-terminus and a nuclear export signal at its C-terminus, and it shares its catalytic domain with the APOBEC family. The N-terminal structure is thought to be necessary for somatic hypermutation (SHM). The function of AID is to deaminate cytosine, converting it to uracil; subsequent DNA repair can then change the uracil to another base. It is understood that any cytosine deaminase known in the art, or any fragment or mutant thereof that retains the biological activity of deaminating cytosine to uracil, may be used herein.
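As a rough illustration of why deamination followed by repair yields diverse point mutations, the Python sketch below enumerates the single-base variants of a short DNA window. It rests on simplifying assumptions that are not taken from the text: only one cytosine changes per variant, only cytosines on the strand shown are considered, and repair may install any of the other three bases with no preference. The function name `deamination_variants` is hypothetical.

```python
def deamination_variants(window):
    """Enumerate single-base variants arising from one deaminated cytosine.

    Returns a list of (position, new_base, variant_sequence) tuples, where
    position is 0-based within the window.
    """
    window = window.upper()
    variants = []
    for i, base in enumerate(window):
        if base != "C":
            continue
        # C -> U after deamination; repair is assumed to yield A, G or T.
        for new_base in "AGT":
            variants.append((i, new_base, window[:i] + new_base + window[i + 1:]))
    return variants


if __name__ == "__main__":
    for position, new_base, sequence in deamination_variants("ACG"):
        print(position, new_base, sequence)
```

Under these assumptions a window with n cytosines gives 3n single-base variants; actual outcome frequencies depend on the repair pathway and are not modeled here.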
The structural domains of AID are shown in Fig. 14. Amino acids 9-26 form the nuclear localization signal (NLS) domain, within which amino acids 13-26 in particular are involved in DNA binding; amino acids 56-94 form the catalytic domain; amino acids 109-182 form the APOBEC-like domain; amino acids 193-198 form the nuclear export signal (NES) domain; amino acids 39-42 interact with catenin-like protein 1 (CTNNBL1); and amino acids 113-123 form the hotspot recognition loop.
The full-length sequence of AID (shown as amino acids 1457-1654 of SEQ ID NO: 2) may be used herein, as may fragments of AID. Preferably, the fragment comprises at least the NLS domain, the catalytic domain and the APOBEC-like domain. Thus, in certain embodiments, the fragment comprises at least amino acid residues 9-182 of AID (i.e., amino acid residues 1465-1638 of SEQ ID NO: 2). In other embodiments, the fragment comprises at least amino acid residues 1-182 of AID (i.e., amino acid residues 1457-1638 of SEQ ID NO: 2). For example, in certain embodiments, the AID fragment used herein consists of amino acid residues 1-182, amino acid residues 1-186, or amino acid residues 1-190 of AID. Thus, in certain embodiments, the AID fragment used herein consists of amino acid residues 1457-1638 of SEQ ID NO: 2, amino acid residues 1457-1642 of SEQ ID NO: 2, or amino acid residues 1457-1646 of SEQ ID NO: 2.
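The residue ranges above are given in two numbering schemes: positions within AID itself and positions within SEQ ID NO: 2. Since AID residue 1 corresponds to residue 1457 of SEQ ID NO: 2, the two differ by a constant offset of 1456. The short Python sketch below makes that arithmetic explicit; the names `AID_OFFSET`, `aid_to_fusion` and `fragment_range` are hypothetical and serve only as a worked check of the numbers stated in the text.

```python
# AID residue 1 is residue 1457 of SEQ ID NO: 2, so the offset is 1456.
AID_OFFSET = 1456


def aid_to_fusion(aid_position, offset=AID_OFFSET):
    """Convert an AID residue number to its position in SEQ ID NO: 2."""
    return aid_position + offset


def fragment_range(first, last, offset=AID_OFFSET):
    """Convert an AID fragment (first, last) to SEQ ID NO: 2 coordinates."""
    return (aid_to_fusion(first, offset), aid_to_fusion(last, offset))


if __name__ == "__main__":
    for first, last in [(1, 198), (9, 182), (1, 182), (1, 186), (1, 190)]:
        print((first, last), "->", fragment_range(first, last))
```

The offset applies to SEQ ID NO: 2 only; other fusion proteins described herein place AID at different coordinates and would need their own offset.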
Variants of AID that retain cytosine deaminase activity may also be used herein. For example, such a variant may have 1-10, such as 1-8, 1-5 or 1-3, amino acid variations relative to the wild-type sequence of AID, including deletions, substitutions and other mutations of amino acids. Preferably, these amino acid variations do not occur within the NLS domain, catalytic domain and APOBEC-like domain described above, or, if they do occur within these domains, do not affect the original biological functions of the domains. For example, it is preferred that these variations do not occur at positions 24, 27, 38, 56, 58, 87, 90, 112, 140, etc., of the AID amino acid sequence. In certain embodiments, these variations also do not occur within amino acids 39-42 or amino acids 113-123. Thus, for example, variations may occur within amino acids 1-8, amino acids 28-37, amino acids 43-55, and/or amino acids 183-198. In certain embodiments, the variation occurs at positions 10, 82 and 156. For example, substitution mutations occur at positions 10, 82 and 156, and such substitution mutations may be K10E, T82I and E156G. In these embodiments, the amino acid sequence of the exemplary AID mutant contains, or consists of, the amino acid sequence shown at positions 1448-1629 of SEQ ID NO: 68.
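The positional preferences above can be checked mechanically for substitution labels written in the one-letter style used here (for example K10E or E156G). The Python sketch below is illustrative only: it encodes the positions and ranges listed in the text as data, the function names `parse_substitution` and `touches_protected` are hypothetical, and a result of False says nothing about whether a variant actually retains deaminase activity.

```python
import re

# Positions the text prefers to leave unchanged.
PROTECTED_POSITIONS = {24, 27, 38, 56, 58, 87, 90, 112, 140}
# Ranges that, in certain embodiments, are also left unchanged.
PROTECTED_RANGES = [(39, 42), (113, 123)]


def parse_substitution(label):
    """Split a label such as 'K10E' into (original, position, replacement)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if match is None:
        raise ValueError("not a substitution label: " + repr(label))
    return match.group(1), int(match.group(2)), match.group(3)


def touches_protected(label):
    """Return True if the substitution falls on a listed position or range."""
    _, position, _ = parse_substitution(label)
    if position in PROTECTED_POSITIONS:
        return True
    return any(low <= position <= high for low, high in PROTECTED_RANGES)


if __name__ == "__main__":
    for label in ["K10E", "E156G"]:
        print(label, parse_substitution(label), touches_protected(label))
```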
Fusion proteins
Provided herein are fusion proteins containing a Cas enzyme and AID. In the fusion proteins herein, the Cas enzyme is typically at the N-terminus of the amino acid sequence and AID is at the C-terminus. In certain embodiments, provided herein are fusion proteins formed primarily from a Cas enzyme and AID. It is to be understood that "formed primarily from" and similar expressions used herein do not restrict the fusion protein to the Cas enzyme and AID alone. Rather, the fusion protein may comprise only the Cas enzyme and AID, or may also contain other moieties that do not affect the targeting function of the Cas enzyme or the ability of AID to mutate the target sequence. Such moieties include, but are not limited to, the various linker sequences, nuclear localization sequences, and amino acid sequences described below that are introduced into the fusion protein as a result of gene cloning procedures, and/or in order to construct the fusion protein, facilitate expression of the recombinant protein, obtain a recombinant protein that is automatically secreted out of the host cell, or facilitate detection and/or purification of the recombinant protein.
The Cas enzyme may be fused to AID through a linker. The linker may be a peptide of 3 to 25 residues, for example 3 to 15, 5 to 15, or 10 to 20 residues. Suitable examples of peptide linkers are well known in the art. Typically, the linker contains one or more tandemly repeated motifs, which usually contain Gly and/or Ser. For example, the motifs may be SGGS, GSSGS, GGGS, GGGGS, SSSSG, GSGSA and GGSGG. Preferably, the motifs are contiguous in the linker sequence, with no amino acid residues inserted between the repeats. The linker sequence may consist of 1, 2, 3, 4 or 5 repeats of a motif. In certain embodiments, the linker sequence is a polyglycine linker sequence. The number of glycines in the linker sequence is not particularly limited, and is usually 2 to 20, for example 2 to 15, 2 to 10, or 2 to 8. In addition to glycine and serine, the linker may contain other known amino acid residues, such as alanine (A), leucine (L), threonine (T), glutamic acid (E), phenylalanine (F), arginine (R), glutamine (Q), etc. In certain embodiments, the linker sequence is XTEN, which has the amino acid sequence shown as amino acid residues 183-198 of SEQ ID NO: 66.
As an example, the linker may consist of any of the following amino acid sequences: G(SGGGG)2SGGGLGSTEF (SEQ ID NO: 21), RSTSGLGGGS(GGGGS)2G (SEQ ID NO: 22), QLTSGLGGGS(GGGGS)2G (SEQ ID NO: 23), GGGS (SEQ ID NO: 24), GGGGS (SEQ ID NO: 25), SSSSG (SEQ ID NO: 26), GSGSA (SEQ ID NO: 27), GGSGGGGGGSGGGGSGGGGS (SEQ ID NO: 28), SSSSGSSSSGSSSSG (SEQ ID NO: 29), GSGSAGSGSAGSGSA (SEQ ID NO: 30), GGSGGGGSGGGGSGG (SEQ ID NO: 31), amino acid residues 1420-1456 of SEQ ID NO: 72, etc.
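Several of the linkers above are written in a shorthand in which a parenthesized motif followed by a number denotes that many tandem repeats. The Python sketch below expands that shorthand into the full residue string, which is convenient for checking linker length and composition. The function name `expand_linker` is hypothetical, and the sketch assumes the shorthand contains only single-level parentheses followed by an integer.

```python
import re


def expand_linker(shorthand):
    """Expand '(MOTIF)n' repeats, e.g. 'G(SGGGG)2S' -> 'GSGGGGSGGGGS'."""
    return re.sub(
        r"\(([A-Z]+)\)(\d+)",
        lambda match: match.group(1) * int(match.group(2)),
        shorthand,
    )


if __name__ == "__main__":
    for shorthand in ["G(SGGGG)2SGGGLGSTEF", "RSTSGLGGGS(GGGGS)2G", "GGGS"]:
        full = expand_linker(shorthand)
        print(shorthand, "->", full, len(full), full.count("G"))
```

Sequences without parentheses pass through unchanged, so the function can be applied to every linker in the list.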
It will be appreciated that gene cloning operations often require the design of suitable restriction enzyme cleavage sites, which tends to introduce one or more unrelated residues at the ends of the expressed amino acid sequence without affecting the activity of the sequence of interest. To construct a fusion protein, facilitate expression of a recombinant protein, obtain a recombinant protein that is automatically secreted out of the host cell, or facilitate purification of a recombinant protein, it is often necessary to add some amino acids to the N-terminus, C-terminus, or another suitable region of the recombinant protein, including, for example but not limited to, suitable linker peptides, signal peptides, leader peptides, terminal extensions, and the like. Thus, the amino or carboxy terminus of a fusion protein herein may also contain one or more polypeptide fragments serving as protein tags. Any suitable tag may be used herein. For example, the tag may be FLAG (DYKDDDDK, SEQ ID NO: 32), HA, HA1, c-Myc, poly-His, poly-Arg, Strep-TagII, AU1, EE, T7, 4A6, ε, B, gE, or Ty1. These tags can be used to purify the protein.
The fusion proteins herein may also contain a nuclear localization sequence (NLS). Nuclear localization sequences of various origins and amino acid compositions known in the art may be used. Such nuclear localization sequences include, but are not limited to: the NLS of the SV40 virus large T antigen, with the amino acid sequence PKKKRKV (SEQ ID NO: 33); an NLS from nucleoplasmin, e.g., the nucleoplasmin bipartite NLS with the sequence KRPAATKKAGQAKKKK (SEQ ID NO: 34); an NLS from c-myc, with the amino acid sequence PAAKRVKLD (SEQ ID NO: 35) or RQRRNELKRSP (SEQ ID NO: 36); the hnRNPA1 M9 NLS, with the sequence NQSSNFGPMKGGNFGGRSSGPYGGGGQYFAKPRNQGGY (SEQ ID NO: 37); the sequence RMRIZFKNKGKDTAELRRRRVEVSVELRKAKKDEQILKRRNV (SEQ ID NO: 38) from the IBB domain of importin-alpha; the sequences VSRKRPRP (SEQ ID NO: 39) and PPKKARED (SEQ ID NO: 40); the sequence SALIKKKKKMAP (SEQ ID NO: 41) of mouse c-abl IV; the sequences DRLRR (SEQ ID NO: 42) and PKQKKRK (SEQ ID NO: 43) of hepatitis virus delta antigen; a sequence of the mouse Mx1 protein (SEQ ID NO: 44); a sequence of the human glucocorticoid receptor; and the like. In certain embodiments, the sequence shown at amino acid residues 26-33 of SEQ ID NO: 2 is used herein as the NLS. The NLS can be located at the N-terminus, at the C-terminus, or within the fusion protein sequence, for example at the N-terminus and/or C-terminus of the Cas9 enzyme in the fusion protein, or at the N-terminus and/or C-terminus of AID in the fusion protein.
The accumulation of the fusion protein of the invention in the nucleus can be detected by any suitable technique. For example, a detection tag can be fused to the Cas enzyme so that, when combined with a means of detecting the position of the nucleus (e.g., a nucleus-specific dye such as DAPI), the position of the fusion protein within the cell can be visualized. In certain embodiments, 3×FLAG is used herein as the tag, and the sequence of this peptide may be as shown in amino acid residues 1-23 of SEQ ID NO: 2. It will be appreciated that, if a tag sequence is present, it will typically be at the N-terminus of the fusion protein. The tag sequence may be linked to the NLS directly or through a suitable linker sequence. The NLS sequence may be linked to the Cas enzyme or AID directly or through an appropriate linker sequence.
Thus, in certain embodiments, the fusion proteins herein consist of the Cas enzyme and AID. In other embodiments, the fusion proteins herein consist of the Cas enzyme linked to AID through a linker. In certain embodiments, the fusion proteins herein consist of an NLS, the Cas enzyme, AID, and an optional linker sequence between the Cas enzyme and AID. In certain embodiments, the Cas enzyme in the fusion protein is a Cas9 enzyme as described previously. In certain embodiments, the amino acid sequence of AID in the fusion protein is shown as amino acid residues 1457-1654 of SEQ ID NO: 2. In other embodiments, the amino acid sequence of AID in the fusion protein is shown as amino acid residues 1457-1646 of SEQ ID NO: 4. In other embodiments, the amino acid sequence of AID in the fusion protein is shown as amino acid residues 1448-1629 of SEQ ID NO: 68.
In certain embodiments, the fusion proteins herein have an amino acid sequence as shown in SEQ ID NO: 2, 4, 66, 68, 70 or 72, or as shown in amino acids 26-1654 of SEQ ID NO: 2, amino acids 26-1638 of SEQ ID NO: 4, amino acids 26-1629 of SEQ ID NO: 68, amino acids 26-1629 of SEQ ID NO: 70, or amino acids 26-1638 of SEQ ID NO: 72.
Polynucleotide sequences, hosts and protein expression
Included herein are polynucleotide sequences encoding the fusion proteins herein. The polynucleotides herein may be in DNA form or in RNA form. DNA forms include cDNA, genomic DNA, or synthetic DNA. The DNA may be single-stranded or double-stranded. The DNA may be a coding strand or a non-coding strand.
The nucleotide sequences described herein can generally be obtained by PCR amplification. Specifically, primers can be designed based on the nucleotide sequences disclosed herein, particularly the open reading frame sequences, and the relevant sequences can be amplified using a commercially available cDNA library, or a cDNA library prepared by conventional methods known to those skilled in the art, as a template. When the sequence is long, it is often necessary to perform two or more PCR amplifications and then splice the amplified fragments together in the correct order. In certain embodiments, the polynucleotide sequence encoding the fusion proteins described herein is shown as SEQ ID NO: 1, 3, 65, 67, 69, or 71, or as bases 73-4965 of SEQ ID NO: 1, or as bases 73-4917 of SEQ ID NO: 3, or as bases 76-4890 of SEQ ID NO: 67, or as bases 76-4890 of SEQ ID NO: 70, or as bases 76-4917 of SEQ ID NO: 72.
Also included herein are nucleic acid constructs comprising the polynucleotides. The nucleic acid construct comprises the coding sequences for the fusion proteins described herein, and one or more regulatory sequences operably linked to these sequences. The coding sequence of the fusion protein of the invention can be manipulated in a number of ways to ensure expression of the protein. The nucleic acid construct may be manipulated according to the expression vector or requirements prior to insertion into the vector. Techniques for altering polynucleotide sequences using recombinant DNA methods are known in the art.
The regulatory sequence may be a suitable promoter sequence. The promoter sequence is typically operably linked to the coding sequence of the protein to be expressed. The promoter may be any nucleotide sequence that exhibits transcriptional activity in the host cell of choice including mutant, truncated, and hybrid promoters, and may be obtained from genes encoding extracellular or intracellular polypeptides either homologous or heterologous to the host cell.
The regulatory sequence may also be a suitable transcription terminator sequence, a sequence recognized by a host cell to terminate transcription. The terminator sequence is operably linked to the 3' terminus of the nucleotide sequence encoding the polypeptide. Any terminator which is functional in the host cell of choice may be used in the present invention.
The regulatory sequence may also be a suitable leader sequence, an untranslated region of an mRNA that is important for translation by the host cell. The leader sequence is operably linked to the 5' terminus of the nucleotide sequence encoding the polypeptide. Any leader sequence which is functional in the host cell of choice may be used in the present invention.
In certain embodiments, the nucleic acid construct is a vector. For example, the polynucleotide sequences herein may be inserted into a recombinant expression vector. The term "recombinant expression vector" refers to bacterial plasmids, phages, yeast plasmids, plant cell viruses, mammalian cell viruses such as adenoviruses and retroviruses, or other vectors well known in the art. Any plasmid or vector may be used as long as it can replicate and remain stable in the host. An important feature of expression vectors is that they generally contain an origin of replication, a promoter, a marker gene and translational control elements. The expression vector may also include a ribosome binding site for translation initiation and a transcription terminator. The polynucleotide sequences described herein are operably linked to an appropriate promoter in the expression vector, so that the promoter directs mRNA synthesis. Representative examples of such promoters are the lac or trp promoter of E. coli; the lambda phage PL promoter; eukaryotic promoters including the CMV immediate early promoter, the HSV thymidine kinase promoter, the early and late SV40 promoters, and LTRs from retroviruses; and other known promoters that control gene expression in prokaryotic or eukaryotic cells or their viruses. Marker genes can be used to provide phenotypic traits for the selection of transformed host cells, and include but are not limited to dihydrofolate reductase, neomycin resistance, and green fluorescent protein (GFP) for eukaryotic cell culture, or tetracycline or ampicillin resistance for E. coli. When the polynucleotides described herein are expressed in higher eukaryotic cells, transcription is enhanced if an enhancer sequence is inserted into the vector. Enhancers are cis-acting elements of DNA, usually about 10 to 300 base pairs long, that act on a promoter to increase the transcription of a gene.
It will be clear to a person of ordinary skill in the art how to select appropriate vectors, promoters, enhancers and host cells. Expression vectors comprising the polynucleotide sequences described herein and appropriate transcriptional/translational control signals can be constructed using methods well known to those of skill in the art. These methods include in vitro recombinant DNA techniques, DNA synthesis techniques, in vivo recombinant techniques, and the like.
The vectors described herein may be transformed into suitable host cells to enable expression of the fusion proteins described herein. The host cell may be a prokaryotic cell, such as a bacterial cell; a lower eukaryotic cell, such as a yeast cell or a filamentous fungal cell; or a higher eukaryotic cell, such as a mammalian cell. The host cell may also be a plant cell. Representative examples of host cells are bacterial cells such as E. coli, Streptomyces and Salmonella typhimurium; fungal cells such as yeast and filamentous fungi; plant cells; insect cells such as Drosophila S2 or Sf9; animal cells such as CHO, COS, 293 cells, or Bowes melanoma cells; etc. In addition to cells used to express the fusion proteins, other cells that contain the polynucleotide sequences or vectors described herein together with an sgRNA or an expression vector thereof, such as cells used to produce point-mutated proteins, are also within the scope of the host cells described herein.
Transformation of host cells with recombinant DNA can be performed using conventional techniques well known to those skilled in the art. When the host is a prokaryote such as E. coli, competent cells capable of taking up DNA can be harvested after the exponential growth phase and treated by the CaCl2 method, using procedures well known in the art. Another approach is to use MgCl2. Transformation can also be performed by electroporation, if desired. When the host is a eukaryote, DNA transfection methods such as calcium phosphate co-precipitation, or conventional mechanical methods such as microinjection, electroporation, liposome packaging, etc., may be used.
After transformation of the host cell, the resulting transformant may be cultured in a conventional manner so that it expresses the fusion protein described herein. The medium used in the culture may be selected from various conventional media depending on the host cell used. The recombinant fusion proteins herein can be isolated and purified by various separation methods known in the art. Such methods are well known to those skilled in the art and include, but are not limited to, conventional renaturation treatment, treatment with protein precipitants (salting-out), centrifugation, osmotic lysis, sonication, ultracentrifugation, molecular sieve chromatography (gel filtration), adsorption chromatography, ion exchange chromatography, high performance liquid chromatography (HPLC) and various other liquid chromatography techniques, and combinations of these methods.
Thus, also included herein is a host cell comprising the fusion protein described herein, or a coding sequence or expression vector thereof, and optionally an sgRNA or an expression vector thereof. Such host cells may express the fusion proteins described herein constitutively, or may express them under certain induction conditions. Methods for making host cells express the fusion proteins of the invention constitutively or under induction conditions are well known in the art. For example, in certain embodiments, an inducible promoter is used to construct the expression vector of the present invention, thereby achieving inducible expression of the fusion protein.
Composition and kit
The fusion proteins herein, their coding sequences or expression vectors, and/or the sgRNAs, their coding sequences or expression vectors may be provided in the form of compositions. For example, the composition may contain the fusion protein herein and an sgRNA or an expression vector for the sgRNA, or may contain an expression vector for the fusion protein herein and an sgRNA or an expression vector for the sgRNA. In the composition, the fusion protein or its expression vector and the sgRNA or its expression vector may be provided as a mixture or may be packaged separately. The composition may be in the form of a solution or in lyophilized form.
The composition may be provided in a kit. Thus, provided herein are kits comprising the compositions described herein. Alternatively, provided herein is a kit comprising the fusion protein herein and an sgRNA or an expression vector for the sgRNA, or comprising an expression vector for the fusion protein herein and an sgRNA or an expression vector for the sgRNA. In the kit, the fusion protein or its expression vector and the sgRNA or its expression vector may be packaged separately or provided as a mixture. The kit may also include, for example, reagents for transferring the fusion protein or its expression vector and/or the sgRNA or its expression vector into a cell, as well as instructions for the skilled person to perform the transfer. Alternatively, the kit may further comprise instructions for the skilled artisan to practice the various methods and uses described herein using the components contained in the kit. Other reagents, such as reagents for PCR, may also be included in the kit.
Method and use
In a third aspect, provided herein is a method of generating a point mutation in a cell, the method comprising the step of expressing the fusion protein and sgRNA described herein in the cell. In certain embodiments, the fusion protein of the invention or an expression vector thereof and the sgRNA or an expression vector thereof are transferred into the cell. Where the cell constitutively expresses the fusion protein described herein, the corresponding sgRNA or its expression vector can be transferred into the cell alone. Where the cell expresses the fusion protein inducibly, the cell can additionally be incubated with an inducer or subjected to a corresponding induction measure (e.g., light) after transfer of the sgRNA. The fusion protein or its expression vector and/or the sgRNA or its expression vector may be transferred into cells by conventional transfection methods. For example, in certain embodiments, plasmid DNA-liposome complexes are first prepared, and the complexes and the corresponding sgRNAs are then co-transfected into cells. After cells carrying the point mutation are obtained, they may be cultured under conditions suitable for growth and for expression of the desired protein, and the resulting mutants may be isolated and analyzed by various conventional methods (e.g., high-throughput methods).
Thus, the methods described herein for generating point mutations in cells can also be used to generate libraries of mutants, and then the mutants in the library can be isolated and screened using conventional techniques to obtain mutants having the desired biological function. Accordingly, the present invention also provides a method of constructing a library of mutants, the method comprising the step of expressing the fusion protein and sgRNA described herein in said cells.
One or more sgRNAs can be designed for the same locus to be mutated. When multiple sgRNAs are designed, they have different target binding regions but the same Cas protein recognition region. The one or more sgRNAs can then be transferred into a cell along with the corresponding fusion protein.
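The relationship just described, several target binding regions sharing one Cas protein recognition region, can be sketched in Python as follows. This is a minimal illustration, not a construct from the examples: the function name `assemble_sgrnas` is hypothetical, the target regions are assumed to be given as DNA strings and are transcribed by replacing T with U, and the recognition region is passed in by the caller (the string used in the demonstration is a placeholder, not a real scaffold sequence).

```python
def assemble_sgrnas(target_regions, recognition_region):
    """Join each target binding region (5') to a shared recognition region (3').

    target_regions     : DNA sequences, each 15-25 bases long per the text
    recognition_region : RNA sequence recognized by the chosen Cas enzyme
    """
    sgrnas = []
    for target in target_regions:
        rna_target = target.upper().replace("T", "U")
        if not 15 <= len(rna_target) <= 25:
            raise ValueError("target binding region should be 15-25 bases")
        sgrnas.append(rna_target + recognition_region)
    return sgrnas


if __name__ == "__main__":
    placeholder_region = "NNNN"  # placeholder only, not a real sequence
    targets = ["ACGTACGTACGTACGTACGT", "TTTTACGTACGTACGTAAAA"]
    for sgrna in assemble_sgrnas(targets, placeholder_region):
        print(sgrna)
```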
The cells may be any cells of interest, including prokaryotic and eukaryotic cells, such as plant cells, animal cells, microbial cells, and the like. Particularly preferred are animal cells, such as mammalian cells and rodent cells, including human, equine, bovine, ovine, murine and rabbit cells, and the like. Microbial cells include cells from a variety of microbial species well known in the art, particularly species of value for medical research or production (e.g., production of fuels such as ethanol, production of proteins, or production of oils such as DHA). The cells may also be cells of various organ origins, such as cells from the human liver, kidney, skin, etc. The cells may also be any of the established cell lines that are commercially available, such as 293 cells and COS cells. In certain embodiments, the cells are cells from a healthy individual, and in other embodiments, the cells are cells from diseased tissue of a diseased individual, such as cells from inflamed tissue, tumor cells, induced pluripotent stem cells, and the like. The cell may also be a cell genetically engineered to have a particular function (e.g., to produce a protein of interest) or to produce a phenotype of interest. In other words, the gene or nucleic acid sequence to be mutated can be either an endogenous gene or nucleic acid sequence naturally present in the cell, or an exogenous gene or nucleic acid sequence transferred into the cell. The exogenously transferred gene or nucleic acid sequence may be integrated into the genomic sequence of the cell, or may exist separately from the genome and be stably expressed.
Expression vectors for the fusion proteins and sgRNAs herein can be designed for different cells using known techniques, so that the vectors are suitable for expression in those cells. For example, promoters and other related regulatory sequences that facilitate expression in the cell may be provided in the expression vector. These may be selected and implemented by the skilled person according to the actual circumstances.
The nucleic acid sequence for which point mutations are expected to be generated may be any nucleic acid sequence of interest, such as a gene sequence, in particular various genes or nucleic acid sequences associated with diseases, or associated with the production of various proteins of interest, or associated with biological functions of interest. Such genes or nucleic acid sequences of interest include, but are not limited to, nucleic acid sequences encoding various functional proteins. Herein, functional proteins refer to proteins capable of performing physiological functions of an organism, including catalytic proteins, transport proteins, immune proteins, regulatory proteins, and the like. In certain embodiments, the functional proteins include, but are not limited to, proteins involved in the occurrence, progression and metastasis of diseases, proteins involved in cell differentiation, proliferation and apoptosis, proteins involved in metabolism, proteins associated with development, and various drug targets, among others. For example, the functional protein may be an antibody, an enzyme, a lipoprotein, a hormonal protein, a transport and storage protein, a motor protein, a receptor protein, a membrane protein, or the like. Thus, libraries of mutants can be constructed using the fusion proteins, polynucleotides, nucleic acid constructs, cells, methods, and the like described herein, and further screened for proteins with novel or greater function, such as antibodies, enzymes, or other functional proteins, and the like.
Using the methods described herein, random mutations may be made in a nucleic acid sequence of interest, or mutations may be made at specific sites in a nucleic acid sequence of interest. For the former, the PAM sites recognized by the Cas enzyme can be located on the template strand, and the sgRNAs recognized by the Cas enzyme can be designed using, as their target recognition regions, fragments 15 to 25 bases in length, more typically 18 to 22 bases in length, located immediately downstream of a PAM site or within 10 bases (e.g., within 8, 5 or 3 bases) of the PAM site. For the latter, a site usable as a PAM can be found near the specific site, a Cas enzyme that recognizes this PAM is selected, and the fusion protein of the invention comprising that Cas enzyme and the corresponding sgRNAs are designed and prepared as described herein.
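By way of non-limiting illustration, the search for candidate target regions next to PAM sites can be sketched in Python as follows. The sketch assumes an NGG PAM with the target region lying immediately 5' of the PAM on the scanned strand (a common convention for Cas9 from Streptococcus pyogenes, not a requirement stated in this disclosure); the function names and the example sequence are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq))


def find_targets(seq, length=20):
    """List (start, target, pam) for each NGG PAM with `length` bases 5' of it.

    Assumption: NGG PAM, target immediately 5' of the PAM on the scanned strand.
    """
    seq = seq.upper()
    hits = []
    for i in range(length, len(seq) - 2):
        if seq[i + 1:i + 3] == "GG":  # seq[i] is the N of NGG
            hits.append((i - length, seq[i - length:i], seq[i:i + 3]))
    return hits


# Hypothetical sequence: a 20-base target, the PAM "TGG", then 7 more bases.
example = "ATGCTAGCTAACGTTAGCAT" + "TGG" + "ACGTACG"
hits = find_targets(example)
# The opposite strand is scanned by repeating the search on the reverse complement.
hits_opposite = find_targets(reverse_complement(example))
print(hits, hits_opposite)
```

The `length` argument can be set anywhere in the 15 to 25 base range mentioned above; a different Cas enzyme would require a different PAM test in place of the `"GG"` comparison.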
The methods herein may be in vitro methods or in vivo methods. When performed in vivo, the fusion proteins or their expression vectors and the sgRNAs or their expression vectors may be transferred into a subject, for example into cells of the relevant tissue, by means well known in the art, and functional variants of interest may be screened for by observing phenotypic changes in the animal. It will be appreciated that in in vivo experiments the subject may be any of a variety of non-human animals, particularly the non-human model organisms commonly employed in the art. In vivo experiments should also meet ethical requirements.
The invention will be illustrated by way of specific examples. It should be understood that these examples are merely illustrative and do not limit the scope of the invention. Experimental procedures for which no specific conditions are noted in the examples below are generally carried out under conventional conditions, such as those described in Molecular Cloning: A Laboratory Manual (third edition) by Sambrook & Russell, or according to the manufacturer's recommendations. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art. In addition, any methods and materials similar or equivalent to those described herein can be used in the present invention. The preferred methods and materials described herein are presented for illustrative purposes only.
Example 1 Construction of the pEntr11-dCAS9-AID plasmid and the pEntr11-dCAS9-AIDX plasmid
1. Using cDNA reverse-transcribed from RNA of the A20 cell line (purchased from the Cell Bank of the Type Culture Collection Committee, Chinese Academy of Sciences) as a template, the full-length AID sequence and the AIDX fragment (truncated from amino acid residue 183) were amplified using the primers shown in SEQ ID NOS: 5 and 6 and the primers shown in SEQ ID NOS: 5 and 7, respectively (FIG. 1, A and C);
2. Construction of the pEntr11-dCAS9-TET1CD plasmid:
(1) The dCas9 target gene fragment was amplified by PCR from a dCas9 plasmid (Addgene);
(2) The dCas9 target gene fragment and the pEntr11 plasmid (Invitrogen) were digested with the restriction enzymes BamHI and NcoI, and the fragments were recovered;
(3) The digested dCas9 fragment was ligated with the pEntr11 vector, and the ligation product was transformed into TOP10 competent cells;
(4) Positive clones were selected, and plasmids were extracted and verified by sequencing, completing the construction of the pEntr11-dCAS9 plasmid;
(5) The TET1CD target gene fragment was amplified by PCR;
(6) The pEntr11-dCAS9 plasmid was digested with the restriction enzymes BamHI and XhoI, and the fragments were recovered;
(7) TET1CD was cloned into the pEntr11-dCAS9 plasmid by the Gibson Assembly method, completing the construction of the pEntr11-dCAS9-TET1CD plasmid;
3. The pEntr11-dCAS9-TET1CD plasmid and the AID and AIDX fragments were digested with the restriction enzymes BamHI and XhoI, and the pEntr11-dCAS9 vector and the AID and AIDX fragments were recovered;
4. The digested AID and AIDX fragments were each ligated with the pEntr11-dCAS9 vector, and the ligation products were transformed into TOP10 competent cells;
5. Positive clones were selected, and plasmids were extracted and verified by sequencing, completing the construction of the pEntr11-dCAS9-AID and pEntr11-dCAS9-AIDX plasmids (FIG. 1, B and D).
Example 2 Construction of the MO91-dCAS9-AID plasmid and the MO91-dCAS9-AIDX plasmid
1. The primers shown in SEQ ID NOS: 8 and 9 were used to amplify the dCAS9-AID fragment and the dCAS9-AIDX fragment from the pEntr11-dCAS9-AID plasmid and the pEntr11-dCAS9-AIDX plasmid (FIG. 2, A);
2. The MO91 plasmid (Addgene plasmid #19755) and the AID and AIDX fragments were digested with the restriction enzymes BglII and XhoI, and the vector, the AID fragment and the AIDX fragment were recovered (FIG. 2, B);
3. The digested AID and AIDX fragments were each ligated with the MO91 vector, and the ligation products were transformed into Stbl3 competent cells;
4. Positive clones were selected, and plasmids were extracted and verified by sequencing, completing the construction of the MO91-dCAS9-AID and MO91-dCAS9-AIDX plasmids (FIG. 2, C and D).
Example 3 Construction of the MO91-dCAS9 (3×Flag, NLS)-AID plasmid and the MO91-dCAS9 (3×Flag, NLS)-AIDX plasmid
Using the pCW-Cas9 plasmid (Wuhan vast Biotechnology Co., Ltd.) as a template, primers were designed and the 3×Flag+NLS fragment was amplified by PCR. The fragment was cloned at the N-terminus of dCas9 in the MO91-dCAS9-AID plasmid and in the MO91-dCAS9-AIDX plasmid by the Gibson Assembly method, yielding the MO91-dCAS9 (3×Flag, NLS)-AID plasmid and the MO91-dCAS9 (3×Flag, NLS)-AIDX plasmid (FIG. 3).
Example 4 Establishment of an efficient reporter system indicating AID point mutation efficiency
Point mutations generated at the genome level need to be detected by a simple and intuitive method, and the invention mainly uses flow cytometry to detect the point mutation level indirectly at the protein level. A stop codon (TAG) was artificially inserted into the EGFP gene, so that EGFP is not expressed normally. When the fusion protein acts on the stop codon in the EGFP gene and a point mutation converts it into a sense codon, the mutated EGFP gene is expressed normally. Thus, the higher the EGFP expression level, the higher the point mutation efficiency.
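The logic of the reporter can be illustrated with a short Python sketch that enumerates every single-base substitution of the inserted TAG codon and checks, against the three stop codons of the standard genetic code, which substitutions would allow read-through. The function name is hypothetical and the sketch is illustrative only.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code


def single_base_variants(codon):
    """Return every codon that differs from `codon` at exactly one position."""
    variants = []
    for pos, ref in enumerate(codon):
        for alt in "ACGT":
            if alt != ref:
                variants.append(codon[:pos] + alt + codon[pos + 1:])
    return variants


variants = single_base_variants("TAG")
restoring = [v for v in variants if v not in STOP_CODONS]
still_stop = [v for v in variants if v in STOP_CODONS]
print(len(variants), len(restoring), still_stop)  # 9 8 ['TAA']
```

Of the nine possible single-base substitutions, eight remove the stop codon; only the G to A change in the third position (TAG to TAA) leaves a stop codon in place.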
In this example, an EGFP gene containing a stop codon (sequence shown in FIG. 4) was inserted into the MO405-thy1.1 plasmid (Addgene) and expressed from MSCV. Infection of 293T cells with virus made from this plasmid comprised the following steps:
1. 293T cells were plated so that the cell density reached 90% at the time of virus packaging;
2. After 24 hours, virus packaging was carried out by the same procedure as transfection;
3. The medium was changed 24 hours after virus packaging;
4. 24 hours after virus packaging, the virus was harvested and the first infection was carried out: 1 μg/ml polybrene was added, the cells were centrifuged at 800 g for 90 min, and the medium was changed after 6-8 hours;
5. 48 hours after virus packaging, the second infection was carried out: 1 μg/ml polybrene was added, the cells were centrifuged at 800 g for 90 min, and the medium was changed after 6-8 hours;
6. Once the cells had grown to a sufficient number, they were stained for flow cytometry (PE-Thy1.1), and Thy1.1-positive cells were sorted as reporter cells. The results are shown in FIG. 6. A schematic of the reporter cells is shown in FIG. 5.
Example 5 Preparation of sgRNA
1. A 20-bp target sequence was identified. If the 20-bp target sequence does not begin with G, a G is added to its 5' end to allow efficient transcription from the RNA polymerase III U6 promoter. Note that the target sequence must not contain recognition sites for XhoI or NheI.
2. The sgRNA was cloned into pLX (Addgene #50662) to obtain pLX sgRNA. The following 4 primers are required, of which R1 and F2 are sgRNA-specific:
F1:AAACTCGAGTGTACAAAAAAGCAGGCTTTAAAG(SEQ ID NO:10)
R1:rc(GN19)GGTGTTTCGTCCTTTCC(SEQ ID NO:11)
F2:GN19GTTTTAGAGCTAGAAATAGCAA(SEQ ID NO:12)
R2:AAAGCTAGCTAATGCCAACTTTGTACAAGAAAGCTG(SEQ ID NO:13)
where GN19 = new target sequence and rc(GN19) = reverse complement of the new target sequence.
3. pLX sgRNA was amplified using F1+R1 and F2+R2, respectively;
4. The two amplification products were gel-purified, combined, and subjected to a third PCR using F1+R2;
5. The PCR product obtained in step 4 was digested with NheI and XhoI; and
6. The digested product was ligated and transformed to prepare the sgRNA expression vector.
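The design of the two sgRNA-specific primers in steps 1 and 2 can be sketched in Python as follows. The constant primer tails are taken from SEQ ID NOS: 11 and 12 above; the function names and the example target are hypothetical, and the sketch is offered for illustration only rather than as the procedure actually used.

```python
XHOI, NHEI = "CTCGAG", "GCTAGC"       # recognition sites of XhoI and NheI
R1_TAIL = "GGTGTTTCGTCCTTTCC"         # constant part of R1 (SEQ ID NO: 11)
F2_TAIL = "GTTTTAGAGCTAGAAATAGCAA"    # constant part of F2 (SEQ ID NO: 12)


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq))


def design_primers(target):
    """Build the sgRNA-specific primers R1 and F2 for one target sequence."""
    guide = target.upper()
    if not guide.startswith("G"):
        guide = "G" + guide           # add a 5' G for U6-driven transcription
    for site in (XHOI, NHEI):         # both sites are palindromic, so checking
        if site in guide:             # one strand is sufficient
            raise ValueError(f"target contains restriction site {site}")
    return {
        "guide": guide,
        "R1": reverse_complement(guide) + R1_TAIL,
        "F2": guide + F2_TAIL,
    }


# Hypothetical input: SEQ ID NO: 14 without its first base, used only as an example.
primers = design_primers("CATGCCCGAAGGCTACGTCC")
print(primers["guide"])
print(primers["R1"])
print(primers["F2"])
```

The constant primers F1 and R2 (SEQ ID NOS: 10 and 13) do not depend on the target and are therefore not generated here.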
The base sequences of the target binding regions of the four sgRNAs are shown below:
GCATGCCCGAAGGCTACGTCC(SEQ ID NO:14);
GCAACTAGTATACCCGCGCCG(SEQ ID NO:15);
GCCTCGAACTTCACCTCGGCG(SEQ ID NO:16);
GTCAGCTCGATGCGGTTCACC(SEQ ID NO:17).
Example 6 CRISPR-Cas9 increases AID point mutation efficiency
The reporter cells constructed in Example 4 were cultured to a confluence of 70-90% for transfection. At transfection, plasmid DNA-liposome complexes were first prepared: four times the amount of 2000 reagent was diluted in culture medium, the MO91-dCAS9 (3×Flag, NLS)-AID plasmid or the MO91-dCAS9 (3×Flag, NLS)-AIDX plasmid was diluted separately in culture medium, and each diluted plasmid was then added to the diluted 2000 reagent (1:1) and incubated for 30 minutes. The plasmid DNA-liposome complex was then co-transfected, together with the 4 sgRNAs targeting the EGFP stop codon prepared in Example 5, into the reporter cells constructed in Example 4. As a control, the reporter cells constructed in Example 4 were transfected with the plasmid DNA-liposome complex only. The cells were cultured with 2 μg/ml puromycin and 20 μg/ml blasticidin and selected for 3 days, and EGFP expression levels were analyzed by flow cytometry on day 4 and day 7 after transfection.
The results are shown in FIG. 7: the percentage of EGFP+ cells was 0.14% and 0.30% for AID and AIDX, respectively, and 2.14% and 4.36% for dCAS9-AID + sgRNA and dCAS9-AIDX + sgRNA, respectively.
These results show that fusing AID or AIDX to dCAS9 allows AID to be confined to a specific site under the guidance of the sgRNA, while increasing both the local concentration of AID and its mutation efficiency.
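For orientation, the fold increase implied by the percentages reported for FIG. 7 can be computed with a few lines of Python. The dictionary keys are merely labels chosen for this sketch.

```python
# Percentage of EGFP+ cells as reported for FIG. 7.
egfp_positive = {
    "AID": 0.14,
    "AIDX": 0.30,
    "dCAS9-AID + sgRNA": 2.14,
    "dCAS9-AIDX + sgRNA": 4.36,
}

fold_aid = egfp_positive["dCAS9-AID + sgRNA"] / egfp_positive["AID"]
fold_aidx = egfp_positive["dCAS9-AIDX + sgRNA"] / egfp_positive["AIDX"]
print(f"AID: {fold_aid:.1f}-fold, AIDX: {fold_aidx:.1f}-fold")
```

With these values, targeting by dCAS9 and sgRNA corresponds to roughly a 15-fold increase in EGFP+ cells for both AID and AIDX.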
Example 7 Enhancement of AID point mutation efficiency by CRISPR-Cas9 and its optimization
The sgRNA expression vectors and dCAS9-AID were co-transfected into the reporter cells constructed in Example 4 in the same manner as in Example 6. The sgRNAs were divided into two groups. One group consisted of control sgRNAs targeting AAVS1, whose target binding regions are GATTCCCAGGGCCGGTTAATG (SEQ ID NO: 18), GTCCCCTCCACCCCACAGTG (SEQ ID NO: 19), and GGGGCCACTAGGGACAGGAT (SEQ ID NO: 20), respectively. The other group consisted of the sgRNAs targeting EGFP (SEQ ID NOS: 14-17). A control group in which AID alone was transfected into the reporter cells was also set up. Expression vectors for the control sgRNAs were constructed as described in Example 5.
FACS analysis was performed on day 8 after transfection: the AID-only group had only 0.13% EGFP+ cells, whereas the dCas9-AID + sgRNA group reached 2.1% (FIG. 8, A), a 16-fold increase. To further optimize the efficiency of the dCAS9-AID system, dCAS9 was fused to different AID variants: AID-FL (full length), AID-CD (catalytic domain only), P182X (truncated from amino acid residue 183), R186X (truncated from amino acid residue 187), and R190X (truncated from amino acid residue 191). Each dCAS9-AID expression vector was co-transfected with sgRNAs into the reporter cells, and dCAS9-R186X was the most efficient (FIG. 8, B and C). The experiments of Examples 8 to 13 were therefore carried out using dCAS9-R186X, which is abbreviated as dCAS9-AIDX in those examples.
To demonstrate that the base substitution function of the dCas9-AID system indeed depends on the AID fused to dCas9, Cas9, the functional mutant dCas9-AIDX [R186X (E58Q)], and dCas9-AIDX were each co-transfected with sgRNAs into the reporter cells. Only the dCas9-AIDX + sgRNA group produced EGFP+ cells, while the other groups were all 0 (FIG. 8, C). This confirmed that the whole system has the base substitution function only when AID is fused to dCAS9.
Example 8 CRISPR-Cas9 localizes AID point mutation function to sgRNA targeting sites
To investigate whether CRISPR-Cas9 can localize the point mutation function of AID to sgRNA targeting sites, the stop-codon-containing EGFP was amplified by PCR using genomic DNA of the reporter cells constructed in Example 4 as a template, a library was constructed, and MiSeq sequencing was performed with cMyc as a control gene. The results are shown in FIG. 9. The sequencing results from the reporter cells show that, although MiSeq has high sequencing throughput and low-quality reads were filtered out, the background mutation frequency of the sequencing was 0.25% for EGFP and 0.15% for cMyc. Even with this background-level interference, however, the point mutation frequency of the EGFP gene in the dCAS9-AIDX + sgRNA group was clearly higher than that in the AIDX group, again showing that CRISPR-Cas9 improves AID point mutation efficiency. These high-frequency mutation sites were concentrated mainly at the sgRNA targeting sites, whereas almost no point mutations occurred in the cMyc gene. Thus, once dCas9 and AID are fused, the sgRNA directs dCas9-AID to the sgRNA targeting site, so that AID generates point mutations only at that site without substantially altering other gene loci, and the point mutation frequency is greatly increased.
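The comparison of per-position mutation frequencies against a sequencing background can be illustrated with the following Python sketch. It assumes that all reads are already aligned to the reference and have the same length; the toy reference, the toy reads, and the function name are hypothetical, and only the 0.25% background value is taken from the text.

```python
def mutation_frequency(reference, reads):
    """Fraction of reads differing from the reference at each position.

    Assumes every read is already aligned and as long as the reference.
    """
    counts = [0] * len(reference)
    for read in reads:
        for i, (ref_base, read_base) in enumerate(zip(reference, read)):
            if read_base != ref_base:
                counts[i] += 1
    return [count / len(reads) for count in counts]


# Hypothetical toy data, not taken from the experiments described herein.
reference = "ACGTAC"
reads = ["ACGTAC", "ACTTAC", "ACGTAC", "ATGTAC"]
freqs = mutation_frequency(reference, reads)

background = 0.0025  # 0.25% background reported above for EGFP
above_background = [i for i, f in enumerate(freqs) if f > background]
print(freqs, above_background)
```

In practice the threshold would be compared position by position across the amplicon, so that sites enriched at the sgRNA targeting region stand out from the sequencing background.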
Example 9 dCAS9-AIDX randomly mutates C and G bases to the other three bases
AIDX by itself mutates C to T and G to A. After fusion of dCas9 with AIDX, the mutation directions of C and G became more uniform than in the AIDX group.
AID by itself acts preferentially on the hotspot motif WRCY (W stands for A/T, R stands for A/G, and Y stands for C/T), the most preferred motif being AGCT. This motif preference is clearly lost after dCas9 is fused to AIDX. The inventors therefore propose the following hypothesis: under normal circumstances, AID deaminates cytosine to form uracil; if the resulting U:G mismatch is retained through DNA replication, C to T and G to A mutations occur, whereas if the U base is excised by base excision repair, any of the four bases can subsequently be inserted. Fusion of dCAS9 with AID is therefore likely to inhibit DNA replication and promote base excision repair, making the mutation direction more uniform (FIG. 10, b).
In addition, statistical analysis of the MiSeq data showed that the types of point mutations produced on EGFP by the AIDX and dCas9-AIDX + sgRNA groups were essentially consistent with published reports, with mutations of C and G bases predominating and mutations of A and T making up a minor proportion. G is predominantly mutated to T, and C to A. However, in the dCAS9-AIDX group, the proportion of G mutated to T or C was increased, and the proportion of C mutated to G or A was increased. Thus, dCas9-AIDX can produce more uniform mutation types (FIG. 10, a).
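A mutation spectrum of the kind summarized above can be tallied as in the following Python sketch, which counts each substitution type and classifies it as a transition (purine to purine or pyrimidine to pyrimidine) or a transversion. The list of calls is hypothetical toy data and the function names are assumptions made for illustration.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify(ref, alt):
    """Classify a substitution as a transition or a transversion."""
    if ref == alt:
        raise ValueError("reference and mutated base are identical")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def spectrum(calls):
    """Count substitution types such as 'C>T' in a list of (ref, alt) calls."""
    return Counter(f"{ref}>{alt}" for ref, alt in calls)


# Hypothetical toy calls, not experimental data.
calls = [("C", "T"), ("G", "A"), ("C", "G"), ("G", "T"), ("C", "T")]
types = spectrum(calls)
classes = Counter(classify(ref, alt) for ref, alt in calls)
print(types, classes)
```

A more uniform mutation direction in the sense used above would appear as a flatter distribution of counts across the three possible substitutions of each of C and G.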
Example 10 UGI increases the base substitution frequency of the dCAS9-AIDX system, reveals the footprint of dCAS9-AIDX on the gene, and makes the direction of base mutation more uniform
UGI, a phage protein, is an inhibitor of UNG; it protects the phage genome from repair by host UNG when the phage invades E. coli (FIG. 11, a). Three plasmids, expressing dCAS9-AIDX, a single sgRNA (target binding region GCCTCGAACTTCACCTCGGCG, SEQ ID NO: 16), and UGI (protein sequence: UniProtKB-P14739), respectively, were co-transfected into the reporter cells in order to increase the mutation efficiency obtained with a single sgRNA in the whole system. The results showed that the highest point mutation efficiency was improved 10-fold (FIG. 11, b).
In addition, after UGI was added, the mutation direction of the whole system became more uniform, namely C to T and G to A. The footprint of dCAS9-AIDX was also analyzed, i.e., the mutation frequency produced by the whole system before and after the PAM sequence. FIG. 11 (c) shows statistics based on the data for the 4 sgRNAs designed for EGFP sites, with all positions counted relative to the N of the NGG PAM sequence. The statistics of the two data sets were consistent: mutations occurred within the 20 bp upstream of the PAM, i.e., the protospacer region, with the peak at positions -12/-13 relative to the PAM. UGI can increase the overall mutation frequency of AID, while also increasing the proportion of base substitutions and decreasing the proportion of conversions (FIG. 11, d).
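The PAM-relative coordinate system used for the footprint can be illustrated with the following Python sketch, which lists the C and G bases of a protospacer together with their positions. It assumes, as one reading of the convention above, that the N of NGG is position 0, so that the protospacer base adjacent to the PAM is -1 and the PAM-distal base of a 20-base protospacer is -20; the function name and the example protospacer are hypothetical.

```python
def editable_positions(protospacer):
    """Map each C or G of the protospacer to its PAM-relative coordinate.

    Assumed convention: N of NGG is position 0, so the PAM-proximal
    protospacer base is -1 and the PAM-distal base is -len(protospacer).
    """
    n = len(protospacer)
    return {
        i - n: base
        for i, base in enumerate(protospacer.upper())
        if base in "CG"
    }


# Hypothetical 20-base protospacer with a C and a G at the reported peak.
positions = editable_positions("AAAAAAACGAAAAAAAAAAA")
peak = {p: b for p, b in positions.items() if p in (-13, -12)}
print(positions, peak)
```

Under a different numbering convention (for example, with the N of NGG counted as +1) the same bases would simply shift by one position.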
Example 11 dCAS9-AIDX can act not only on exogenous genes but also on endogenous genes
The above experiments were all performed in reporter cells. In this example, the endogenous gene AAVS1 was used as the target site, for which 3 sgRNAs (SEQ ID NOS: 18-20) were designed, and dCas9-AID and the vectors for the three AAVS1 sgRNAs were co-expressed in 293T cells (as described in Example 7).
The results are shown in FIG. 12. The dCas9-AID system can likewise generate base substitutions for the endogenous gene AAVS1, and this mutation is also concentrated at the sgRNA target site.
Example 12 Gleevec resistance screening of the BCR-ABL gene in K562 cells with dCas9-AIDX
K562 is a leukemia cell line derived from a human patient with chronic myeloid leukemia. These cells carry a chromosome called the Ph chromosome, which is formed by a translocation between the long arms of chromosome 9 and chromosome 22. The ABL gene on chromosome 9 contains a tyrosine kinase active center; it is in a low-activity state under normal conditions and becomes highly active when translocated to the BCR locus. BCR-ABL is a proto-oncogene, and a commonly used drug against it is Gleevec (active ingredient: imatinib mesylate), whose main mechanism of action is to compete with ATP for binding to ABL, thereby keeping ABL in a low-activity state. However, point mutations such as T315I are found in the tyrosine kinase active domain in patient samples; these deprive the domain of its ability to bind Gleevec, resulting in Gleevec resistance. Base substitutions at other sites also lead to Gleevec resistance. The dCas9-AIDX system can be used to screen for Gleevec resistance sites and specific mutation types as a basis for the design of next-generation inhibitors.
First, to obtain K562 cells stably expressing dCas9-AIDX, we transfected 293T cells with the target plasmid MSCV-dCas9-AID-P182X-IRES-Thy1.1 together with the viral packaging plasmid pCL-10A1. 12-24 hours in advance, 1×10^6 293T cells were plated in one well of a six-well plate and cultured overnight in 2 ml of antibiotic-free DMEM containing 10% FBS. The next day, when the cells had reached 80% density, they were transfected with 3 μg of the target plasmid and 1 μg of the viral packaging plasmid using 10 μl of the transfection reagent LIPO2000. 24 hours after transfection, the medium was replaced with 2 ml of antibiotic-containing medium, and virus was collected at 48 hours and 72 hours. The collected virus was immediately centrifuged at 1000 rpm for 5 minutes to remove cell debris; the supernatant, supplemented with 2 μl of 10 mg/ml Polybrene, was used to infect 1×10^5 K562 cells, and the plate was centrifuged at 37 ℃ and 900 g for 90 minutes. 4 hours after infection, the cells were centrifuged and the pellet was cultured in antibiotic-containing medium. After infection on two consecutive days, the K562 cells were cultured for two more days; cells expressing the Thy1.1 surface molecule were then labeled as PE+ by flow cytometry staining (antibody diluted 1:200), and two 96-well plates of single PE-Thy1.1+ K562 cells were obtained by single-cell sorting. After two weeks of culture, RNA was collected from the cell population derived from each single-cell clone and analyzed by RT-qPCR. The cell line with the highest dCas9-AIDX expression was used for the subsequent screening of Gleevec resistance sites and mutation types.
Meanwhile, to screen for Gleevec resistance sites, we designed sgRNAs for the genomic region containing exon 6 of the ABL gene. A total of 16 sgRNAs were designed (target region sequences shown in SEQ ID NOS: 49-64, respectively), 6 of which target the intron regions adjacent to exon 6 and 10 of which directly target exon 6, covering 83% of the exon sequence. Since T315I has been identified as one of the most dominant mutations responsible for Gleevec resistance, we designed one sgRNA covering the single site (944C) of the T315I mutation, which served as a positive control. We also designed 3 sgRNAs against the genomic sequence of the AAVS1 gene, which is unrelated to Gleevec resistance, as negative controls (target region sequences shown in SEQ ID NOS: 18-20). These sgRNA sequences were all chemically synthesized, digested with BamHI and HindIII, and finally cloned into the pSUPER-sgRNA vector carrying the H1 promoter. The 16 exon 6 sgRNA plasmids, or the 3 AAVS1 sgRNA plasmids, were mixed in equal amounts and precipitated by phenol-chloroform extraction and ethanol precipitation so that the final concentration of the mixed plasmids was above 1.5 μg/μl. Subsequently, the K562 cell line stably expressing dCAS9-AIDX was electroporated with the mixed sgRNA pool for ABL exon 6 or for AAVS1, respectively, using the Neon electroporation system from Life Technologies, USA. The K562 cells were cultured in antibiotic-free IMDM medium containing 10% FBS for 12-24 hours before electroporation; on the day of electroporation, two aliquots of 1.2×10^6 K562 cells were transfected with 8 μg of the equally mixed sgRNAs for ABL exon 6 or AAVS1, respectively, at a voltage of 1000 V with a single pulse of 50 ms. Since the pSUPER-sgRNA plasmid vector carries a puromycin resistance gene, cells expressing sgRNA were selected by adding 2 μg/ml puromycin 24 hours after transfection. Puromycin was removed after 48 hours, and the K562 cells were further expanded.
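The statement that the exon-targeting sgRNAs cover a given fraction of the exon can be made concrete with the following Python sketch, which computes the fraction of a region covered by the union of a set of target intervals. All coordinates are hypothetical and chosen only to show the calculation; they are not the coordinates of SEQ ID NOS: 49-64.

```python
def covered_fraction(region_start, region_end, intervals):
    """Fraction of [region_start, region_end) covered by half-open intervals."""
    covered = set()
    for start, end in intervals:
        covered.update(range(max(start, region_start), min(end, region_end)))
    return len(covered) / (region_end - region_start)


# Hypothetical coordinates: a 100-bp exon and three 20-bp target regions,
# two of which overlap.
exon = (0, 100)
targets = [(5, 25), (20, 40), (70, 90)]
fraction = covered_fraction(exon[0], exon[1], targets)
print(f"{fraction:.0%} of the exon is covered")
```

Overlapping target regions are counted once, so the result reflects the union of the targeted bases rather than the sum of the sgRNA lengths.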
On the sixth day after transfection, DNA and RNA were collected from 2×10^5 cells for high-throughput sequencing as the Input control, and the remaining cells were split into two portions and treated with 10 μM Gleevec or an equal volume of DMSO, respectively. Ficoll separation was performed every three days to remove dead cells until the number of cells fell below 2×10^4. Under Gleevec treatment, essentially all control cells transfected with AAVS1 sgRNAs died within about 7-10 days, whereas the experimental cells transfected with ABL exon 6 sgRNAs were able to continue proliferating. Around days 36-40 after transfection, the experimental cells had proliferated to the order of 10^7 (FIG. 14, b). DNA and RNA from the Gleevec-treated and DMSO-treated cells were collected at the same time for high-throughput sequencing analysis. Sequencing showed that 30% of the cells carried the T315I mutation, a known drug resistance mutation found in patients, in addition to a number of previously unreported point mutations (FIG. 14, c and d).
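The relationship between the nucleotide site 944C and the amino acid change T315I can be checked with the following Python sketch. It assumes that position 944 is numbered from the first base of the ABL coding sequence (an assumption, since the numbering scheme is not stated above) and uses the standard genetic code; because the exact codon is not given in the text, all four threonine codons are shown.

```python
# Subset of the standard genetic code: threonine codons and their
# counterparts with T in the second position.
CODON_TABLE = {
    "ACT": "T", "ACC": "T", "ACA": "T", "ACG": "T",
    "ATT": "I", "ATC": "I", "ATA": "I", "ATG": "M",
}


def codon_of(cds_position):
    """Return (codon number, position within codon), both 1-based."""
    return (cds_position - 1) // 3 + 1, (cds_position - 1) % 3 + 1


codon_number, offset = codon_of(944)  # assumed coding-sequence numbering

# Effect of a C to T change at the second codon position of each Thr codon.
outcomes = {
    codon: CODON_TABLE[codon[0] + "T" + codon[2]]
    for codon in ("ACT", "ACC", "ACA", "ACG")
}
print(codon_number, offset, outcomes)
```

Position 944 falls on the second base of codon 315, and a C to T change there converts three of the four threonine codons to isoleucine, which is consistent with the T315I designation.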
Example 13 Application of dCAS9-AIDX to in vitro enhancement of antibody affinity and specificity
Antibodies specifically recognize antigens and can be used as pharmaceutical proteins for the treatment of various diseases. The affinity of an antibody is proportional to the somatic mutations it acquires in the germinal center in vivo; in general, high-affinity antibodies all carry multiple somatic hypermutations. Thus, dCas9-AIDX can be used to mutate antibody genes in order to screen for antibodies with greater affinity or other improved characteristics (e.g., better specificity).
The scheme is as follows: antibody molecules are stably expressed on the surface of 293T cells, the 293T cells are then co-transfected with dCAS9-AIDX and sgRNAs targeting the antibody genes, and the cell surface is then stained; the more strongly a cell is stained, the higher the affinity of its mutated antibody molecule.
This example uses Flp-In™-293 cells from Invitrogen, which stably carry a lacZ-Zeocin™ fusion locus. First, the cDNA sequence of a low-affinity mouse IgG1 antibody (KD = 2.78E-09 M) against hen egg lysozyme (HEL) was synthesized and joined to the coding sequence of the H2Kk protein transmembrane region, so that the H2Kk transmembrane region is added at the end of the antibody, and the resulting DNA sequence was cloned into a pcDNA5/FRT/GOI vector (Life Technologies, USA). The vector was transferred into Flp-In™-293 cells, and the IgG1 coding sequence, which carries the Flp recombination target site, was integrated into the lacZ-Zeocin™ fusion locus by Flp recombinase using the Flp-In™ system contained in the Flp-In™-293 cells. Cells without successful integration express the Zeocin resistance protein, whereas after successful integration the Zeocin resistance protein cannot be expressed because it lacks the start codon ATG, while the hygromycin resistance protein can be expressed. Thus, 293 cells with successfully integrated IgG1 were selected with hygromycin; each of these cells expresses only one copy of the anti-HEL IgG1 gene.
Next, a total of 16 suitable PAM sequences were selected across the 3 CDRs of each of the IgG1 heavy and light chains, and the sgRNAs shown below were designed (SEQ ID NOS: 73-88) such that each CDR of the heavy or light chain is covered by at least 2 sgRNAs:
IgH
CDR1_1:TCCCTCACCTGTTCTGTCAC(SEQ ID NO:73);
CDR1_2:GCTCCAGTAATCACTGGTGA(SEQ ID NO:74);
CDR1_3:GATCCAGCTCCAGTAATCAC(SEQ ID NO:75);
CDR1_4:GTGATTACTGGAGCTGGATC(SEQ ID NO:76);
CDR2_1:ATGGGGTACGTAAGCTACAG(SEQ ID NO:77);
CDR2_2:GAGATTCGACTTTTGAGAGA(SEQ ID NO:78);
CDR3_1:TATTACTGTGCAAACTGGGA(SEQ ID NO:79);
CDR3_2:CAAACTGGGACGGTGATTAC(SEQ ID NO:80);
CDR3_3:GACGGTGATTACTGGGGCCA(SEQ ID NO:81);
IgL
CDR1_1:GTTGTTGCCAATACTTTGGC(SEQ ID NO:82);
CDR1_2:ATAGCGTCAGTCTTTCCTGC(SEQ ID NO:83);
CDR1_3:GTATTGGCAACAACCTACAC(SEQ ID NO:84);
CDR2_1:AGGGGATCCCAGAGATGGAC(SEQ ID NO:85);
CDR2_2:TATGCTTCCCAGTCCATCTC(SEQ ID NO:86);
CDR3_1:TCTGTCAACAGAGTAACAGC(SEQ ID NO:87);
CDR3_2:GTCCCCCCTCCGAACGTGTA(SEQ ID NO:88).
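The stated design constraint, namely 16 sgRNAs in total with at least 2 per CDR, can be verified against the sequences listed above with the following Python sketch. The data structure and variable names are chosen for this illustration only.

```python
# Target sequences of SEQ ID NOS: 73-88, grouped by chain and CDR.
SGRNAS = {
    ("IgH", "CDR1"): ["TCCCTCACCTGTTCTGTCAC", "GCTCCAGTAATCACTGGTGA",
                      "GATCCAGCTCCAGTAATCAC", "GTGATTACTGGAGCTGGATC"],
    ("IgH", "CDR2"): ["ATGGGGTACGTAAGCTACAG", "GAGATTCGACTTTTGAGAGA"],
    ("IgH", "CDR3"): ["TATTACTGTGCAAACTGGGA", "CAAACTGGGACGGTGATTAC",
                      "GACGGTGATTACTGGGGCCA"],
    ("IgL", "CDR1"): ["GTTGTTGCCAATACTTTGGC", "ATAGCGTCAGTCTTTCCTGC",
                      "GTATTGGCAACAACCTACAC"],
    ("IgL", "CDR2"): ["AGGGGATCCCAGAGATGGAC", "TATGCTTCCCAGTCCATCTC"],
    ("IgL", "CDR3"): ["TCTGTCAACAGAGTAACAGC", "GTCCCCCCTCCGAACGTGTA"],
}

total = sum(len(guides) for guides in SGRNAS.values())
per_cdr = {key: len(guides) for key, guides in SGRNAS.items()}
all_20_nt = all(len(g) == 20 for guides in SGRNAS.values() for g in guides)

print(total, min(per_cdr.values()), all_20_nt)
```

The same structure can be reused to mix the guides in equal amounts per CDR or per chain when a sub-library is desired.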
The sgRNA sequences were then cloned into the pSUPER-puro plasmid vector (Addgene). The MO91-dCAS9 (3×Flag, NLS)-AIDX plasmid constructed in Example 3 was co-transfected with either the sgRNA library (i.e., the 16 sgRNAs mixed in equal amounts) or the sgRNAs for the control gene AAVS1 into the IgG1-expressing 293 cells obtained above. The cells were selected with puromycin and blasticidin, surface-stained with PE anti-mouse IgG and Alexa647-HEL on day 7 after transfection, and flow-sorted to isolate cells with unchanged IgG intensity and increased binding to the HEL antigen. After expansion in culture, the mutations in the DNA were first analyzed by high-throughput sequencing, and the results were essentially consistent with the mutations observed on the ABL gene and the GFP gene herein (FIG. 15). dCAS9-AIDX induced base mutations in the anti-HEL IgG1 variable region and reproducibly induced base mutations in the IgG1 CDRs (FIG. 16).
The mutated cells were then surface-stained with PE anti-mouse IgG1 and 647-HEL and examined on a flow cytometer, and a small population of cells was found to have unchanged IgG1 expression and increased HEL binding. This population was flow-sorted and expanded; compared with the cells before mutation, the mutated antibodies had a more than 10-fold higher affinity for HEL (FIG. 17).
Genomic DNA was then extracted from an appropriate number of these cells and sequenced, and the main cause of the increased affinity was found to be a mutation of glycine to aspartic acid at position 52 of the light chain (codon GGT to GAT, FIG. 15).
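The codon change reported above can be examined with the following Python sketch, which locates the altered base and expresses the same change on the complementary strand. Only the two codons named in the text are included; the function name is hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
AMINO_ACIDS = {"GGT": "Gly", "GAT": "Asp"}  # the two codons named in the text


def base_changes(ref_codon, alt_codon):
    """Return (position, ref_base, alt_base) for each differing base (1-based)."""
    return [
        (i + 1, ref, alt)
        for i, (ref, alt) in enumerate(zip(ref_codon, alt_codon))
        if ref != alt
    ]


changes = base_changes("GGT", "GAT")
# The same change as seen on the complementary strand.
opposite_strand = [(COMPLEMENT[ref], COMPLEMENT[alt]) for _, ref, alt in changes]
print(AMINO_ACIDS["GGT"], "->", AMINO_ACIDS["GAT"], changes, opposite_strand)
```

The single G to A change at the second codon position corresponds to a C to T change on the complementary strand, which is the type of change expected when a deaminated cytosine is retained through replication.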
Example 14 preparation of other fusion proteins
1. Construction of plasmids
(1) An XTEN linker sequence was obtained by gene synthesis;
(2) The MO91-dCAS9-AIDX plasmid constructed in Example 2 was digested with restriction enzymes, and the vector, the AIDX fragment and the dCAS9 fragment were recovered;
(3) The digested AIDX fragment, dCAS9 fragment and XTEN linker sequence were ligated with the MO91 vector, and the ligation products were transformed into Stbl3 competent cells;
(4) Positive clones were selected, and plasmids were extracted and verified by sequencing, completing the construction of the MO91-dCAS9-XTEN-AIDX plasmid.
Plasmids MO91-AIDX-XTEN-dCAS9, MO91-dCAS9-XTEN-AIDX (K10E T I E156G) and MO91-nCas9-AIDX were constructed by reference to the above procedure and Examples 1 and 2.
When 3×Flag and/or NLS fragments are to be added, they can be cloned into the above plasmids by the method of Example 3 to obtain plasmids expressing the fusion proteins shown in SEQ ID NOS: 66, 68, 70 and 72, respectively. The AIDX in these fusion proteins is an AID fragment truncated from amino acid residue 183, or a mutant thereof.
2. Expression and purification of recombinant proteins
(1) Constructing a plasmid pET-nCas9-AIDX-6His according to a conventional method, and then transforming an escherichia coli BL21 STAR-competent cell by using the plasmid;
(2) The resulting expression strain was grown overnight at 37℃in LB medium containing 100. Mu.g/ml kanamycin. Cells were diluted 1:100 into 2xYT medium and grown to OD 600 = -0.6 at 37 ℃. Cooling the culture to 4 ℃ within 2 hours, adding IPTG 0.5mM, and inducing protein expression for 16 hours;
(3) Cells were collected by centrifugation at 4000g for 15 min and resuspended in lysis buffer;
(4) Cells were lysed with cell disruption agent (Union) at 800 bar for 5min, and the lysate supernatant was separated after centrifugation for 15 min;
(5) Lysates were incubated with Ni-NTA (1 ml slurry/L bacteria) (DP 101, transGen) at 4℃for 1 hour to capture His-tagged fusion proteins, resin was transferred to a column and extensively washed with cold wash buffer (extent of color change not observed with Coomassie G250);
(6) His-tagged fusion proteins were eluted in elution buffer and concentrated to a total volume of 1ml by ultrafiltration (Amicon-Millipore, 100kDa molecular weight cut-off);
(7) The protein was diluted to 20 ml in buffer A, loaded onto a Hi-Trap SP column (29051324, GE Healthcare) and eluted with a 100 mM to 1 M NaCl gradient;
(8) The eluted fractions containing nCas9-AIDX were concentrated to about 1 ml and purified on a Superdex 200 10/300 GL column (17517501, GE Healthcare);
(9) The eluted protein was concentrated to about 3 mg/ml, flash-frozen in liquid nitrogen and stored at -80 °C.
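The reagent amounts implied by steps (2) and (5) scale with culture volume. The following is a minimal Python sketch of that arithmetic, offered only as an illustration. The function names are hypothetical, and the 1 L culture volume and 1 M (1000 mM) IPTG stock used in the example are assumptions that are not stated in the protocol.

```python
def stock_volume_ml(final_mM: float, culture_ml: float, stock_mM: float) -> float:
    """Volume of stock to add, from C1*V1 = C2*V2 (the added volume is neglected)."""
    return final_mM * culture_ml / stock_mM


def inoculum_ml(culture_ml: float, dilution: float = 100.0) -> float:
    """Overnight culture needed for a 1:dilution inoculation (1:100 in step (2))."""
    return culture_ml / dilution


def resin_slurry_ml(culture_l: float, ml_per_l: float = 1.0) -> float:
    """Ni-NTA slurry volume at 1 ml slurry per liter of culture (step (5))."""
    return culture_l * ml_per_l


if __name__ == "__main__":
    culture_ml = 1000.0  # assumed culture volume, not given in the protocol
    iptg_stock_mM = 1000.0  # assumed 1 M IPTG stock, not given in the protocol
    print("Overnight culture to add (ml):", inoculum_ml(culture_ml))
    print("IPTG stock to add (ml):", stock_volume_ml(0.5, culture_ml, iptg_stock_mM))
    print("Ni-NTA slurry (ml):", resin_slurry_ml(culture_ml / 1000.0))
```

With other culture volumes or stock concentrations, only the two assumed values need to be changed.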
The electrophoresis results for induced expression of nCas9-AIDX in bacteria are shown in FIG. 18.
3. Functional testing of different fusion proteins
The function of the different fusion proteins of this example was tested in the same manner as in example 10. The results are shown in FIGS. 19-21.
Sequence listing
<110> Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences
<120> Fusion proteins producing point mutations in cells, their preparation and use
<130> 162593Z1
<160> 95
<170> PatentIn version 3.3
<210> 1
<211> 4989
<212> DNA
<213> Artificial sequence
<220>
<223> Description of artificial sequence, coding sequence of dCAS9-AID
<400> 1
atggactata aggaccacga cggagactac aaggatcatg atattgatta caaagacgat 60
gacgataaga tggccccaaa gaagaagcgg aaggtcggta tccacggagt cccagcagct 120
accatggaca agaagtattc tatcggactg gccatcggga ctaatagcgt cgggtgggcc 180
gtgatcactg acgagtacaa ggtgccctct aagaagttca aggtgctcgg gaacaccgac 240
cggcattcca tcaagaaaaa tctgatcgga gctctcctct ttgattcagg ggagaccgct 300
gaagcaaccc gcctcaagcg gactgctaga cggcggtaca ccaggaggaa gaaccggatt 360
tgttaccttc aagagatatt ctccaacgaa atggcaaagg tcgacgacag cttcttccat 420
aggctggaag aatcattcct cgtggaagag gataagaagc atgaacggca tcccatcttc 480
ggtaatatcg tcgacgaggt ggcctatcac gagaaatacc caaccatcta ccatcttcgc 540
aaaaagctgg tggactcaac cgacaaggca gacctccggc ttatctacct ggccctggcc 600
cacatgatca agttcagagg ccacttcctg atcgagggcg acctcaatcc tgacaatagc 660
gatgtggata aactgttcat ccagctggtg cagacttaca accagctctt tgaagagaac 720
cccatcaatg caagcggagt cgatgccaag gccattctgt cagcccggct gtcaaagagc 780
cgcagacttg agaatcttat cgctcagctg ccgggtgaaa agaaaaatgg actgttcggg 840
aacctgattg ctctttcact tgggctgact cccaatttca agtctaattt cgacctggca 900
gaggatgcca agctgcaact gtccaaggac acctatgatg acgatctcga caacctcctg 960
gcccagatcg gtgaccaata cgccgacctt ttccttgctg ctaagaatct ttctgacgcc 1020
atcctgctgt ctgacattct ccgcgtgaac actgaaatca ccaaggcccc tctttcagct 1080
tcaatgatta agcggtatga tgagcaccac caggacctga ccctgcttaa ggcactcgtc 1140
cggcagcagc ttccggagaa gtacaaggaa atcttctttg accagtcaaa gaatggatac 1200
gccggctaca tcgacggagg tgcctcccaa gaggaatttt ataagtttat caaacctatc 1260
cttgagaaga tggacggcac cgaagagctc ctcgtgaaac tgaatcggga ggatctgctg 1320
cggaagcagc gcactttcga caatgggagc attccccacc agatccatct tggggagctt 1380
cacgccatcc ttcggcgcca agaggacttc tacccctttc ttaaggacaa cagggagaag 1440
attgagaaaa ttctcacttt ccgcatcccc tactacgtgg gacccctcgc cagaggaaat 1500
agccggtttg cttggatgac cagaaagtca gaagaaacta tcactccctg gaacttcgaa 1560
gaggtggtgg acaagggagc cagcgctcag tcattcatcg aacggatgac taacttcgat 1620
aagaacctcc ccaatgagaa ggtcctgccg aaacattccc tgctctacga gtactttacc 1680
gtgtacaacg agctgaccaa ggtgaaatat gtcaccgaag ggatgaggaa gcccgcattc 1740
ctgtcaggcg aacaaaagaa ggcaattgtg gaccttctgt tcaagaccaa tagaaaggtg 1800
accgtgaagc agctgaagga ggactatttc aagaaaattg aatgcttcga ctctgtggag 1860
attagcgggg tcgaagatcg gttcaacgca agcctgggta cctaccatga tctgcttaag 1920
atcatcaagg acaaggattt tctggacaat gaggagaacg aggacatcct tgaggacatt 1980
gtcctgactc tcactctgtt cgaggaccgg gaaatgatcg aggagaggct taagacctac 2040
gcccatctgt tcgacgataa agtgatgaag caacttaaac ggagaagata taccggatgg 2100
ggacgcctta gccgcaaact catcaacgga atccgggaca aacagagcgg aaagaccatt 2160
cttgatttcc ttaagagcga cggattcgct aatcgcaact tcatgcaact tatccatgat 2220
gattccctga cctttaagga ggacatccag aaggcccaag tgtctggaca aggtgactca 2280
ctgcacgagc atatcgcaaa tctggctggt tcacccgcta ttaagaaggg tattctccag 2340
accgtgaaag tcgtggacga gctggtcaag gtgatgggtc gccataaacc agagaacatt 2400
gtcatcgaga tggccaggga aaaccagact acccagaagg gacagaagaa cagcagggag 2460
cggatgaaaa gaattgagga agggattaag gagctcgggt cacagatcct taaagagcac 2520
ccggtggaaa acacccagct tcagaatgag aagctctatc tgtactacct tcaaaatgga 2580
cgcgatatgt atgtggacca agagcttgat atcaacaggc tctcagacta cgacgtggac 2640
gccatcgtcc ctcagagctt cctcaaagac gactcaattg acaataaggt gctgactcgc 2700
tcagacaaga accggggaaa gtcagataac gtgccctcag aggaagtcgt gaaaaagatg 2760
aagaactatt ggcgccagct tctgaacgca aagctgatca ctcagcggaa gttcgacaat 2820
ctcactaagg ctgagagggg cggactgagc gaactggaca aagcaggatt cattaaacgg 2880
caacttgtgg agactcggca gattactaaa catgtcgccc aaatccttga ctcacgcatg 2940
aataccaagt acgacgaaaa cgacaaactt atccgcgagg tgaaggtgat taccctgaag 3000
tccaagctgg tcagcgattt cagaaaggac tttcaattct acaaagtgcg ggagatcaat 3060
aactatcatc atgctcatga cgcatatctg aatgccgtgg tgggaaccgc cctgatcaag 3120
aagtacccaa agctggaaag cgagttcgtg tacggagact acaaggtcta cgacgtgcgc 3180
aagatgattg ccaaatctga gcaggagatc ggaaaggcca ccgcaaagta cttcttctac 3240
agcaacatca tgaatttctt caagaccgaa atcacccttg caaacggtga gatccggaag 3300
aggccgctca tcgagactaa tggggagact ggcgaaatcg tgtgggacaa gggcagagat 3360
ttcgctaccg tgcgcaaagt gctttctatg cctcaagtga acatcgtgaa gaaaaccgag 3420
gtgcaaaccg gaggcttttc taaggaatca atcctcccca agcgcaactc cgacaagctc 3480
attgcaagga agaaggattg ggaccctaag aagtacggcg gattcgattc accaactgtg 3540
gcttattctg tcctggtcgt ggctaaggtg gaaaaaggaa agtctaagaa gctcaagagc 3600
gtgaaggaac tgctgggtat caccattatg gagcgcagct ccttcgagaa gaacccaatt 3660
gactttctcg aagccaaagg ttacaaggaa gtcaagaagg accttatcat caagctccca 3720
aagtatagcc tgttcgaact ggagaatggg cggaagcgga tgctcgcctc cgctggcgaa 3780
cttcagaagg gtaatgagct ggctctcccc tccaagtacg tgaatttcct ctaccttgca 3840
agccattacg agaagctgaa ggggagcccc gaggacaacg agcaaaagca actgtttgtg 3900
gagcagcata agcattatct ggacgagatc attgagcaga tttccgagtt ttctaaacgc 3960
gtcattctcg ctgatgccaa cctcgataaa gtccttagcg catacaataa gcacagagac 4020
aaaccaattc gggagcaggc tgagaatatc atccacctgt tcaccctcac caatcttggt 4080
gcccctgccg cattcaagta cttcgacacc accatcgacc ggaaacgcta tacctccacc 4140
aaagaagtgc tggacgccac cctcatccac cagagcatca ccggacttta cgaaactcgg 4200
attgacctct cacagctcgg aggggatgag ggagctccca agaaaaagcg caaggtaggt 4260
agttccggat ctccgaaaaa gaaacgcaaa gttggtagtg atgctttaga cgattttgac 4320
ttagatatgc ttggttcaga cgcgttagac gacttcggtg gaggatccat ggacagcctc 4380
ttgatgaacc ggaggaagtt tctttaccaa ttcaaaaatg tccgctgggc taagggtcgg 4440
cgtgagacct acctgtgcta cgtagtgaag aggcgtgaca gtgctacatc cttttcactg 4500
gactttggtt atcttcgcaa taagaacggc tgccacgtgg aattgctctt cctccgctac 4560
atctcggact gggacctaga ccctggccgc tgctaccgcg tcacctggtt cacctcctgg 4620
agcccctgct acgactgtgc ccgacatgtg gccgactttc tgcgagggaa ccccaacctc 4680
agtctgagga tcttcaccgc gcgcctctac ttctgtgagg accgcaaggc tgagcccgag 4740
gggctgcggc ggctgcaccg cgccggggtg caaatagcca tcatgacctt caaagattat 4800
ttttactgct ggaatacttt tgtagaaaac catgaaagaa ctttcaaagc ctgggaaggg 4860
ctgcatgaaa attcagttcg tctctccaga cagcttcggc gcatcctttt gcccctgtat 4920
gaggttgatg acttacgaga cgcatttcgt acttggggac gtgattacaa agacgatgac 4980
gataagtga 4989
<210> 2
<211> 1662
<212> PRT
<213> Artificial sequence
<220>
<223> Description of artificial sequence, amino acid sequence of dCAS9-AID
<400> 2
Met Asp Tyr Lys Asp His Asp Gly Asp Tyr Lys Asp His Asp Ile Asp
1 5 10 15
Tyr Lys Asp Asp Asp Asp Lys Met Ala Pro Lys Lys Lys Arg Lys Val
20 25 30
Gly Ile His Gly Val Pro Ala Ala Thr Met Asp Lys Lys Tyr Ser Ile
35 40 45
Gly Leu Ala Ile Gly Thr Asn Ser Val Gly Trp Ala Val Ile Thr Asp
50 55 60
Glu Tyr Lys Val Pro Ser Lys Lys Phe Lys Val Leu Gly Asn Thr Asp
65 70 75 80
Arg His Ser Ile Lys Lys Asn Leu Ile Gly Ala Leu Leu Phe Asp Ser
85 90 95
Gly Glu Thr Ala Glu Ala Thr Arg Leu Lys Arg Thr Ala Arg Arg Arg
100 105 110
Tyr Thr Arg Arg Lys Asn Arg Ile Cys Tyr Leu Gln Glu Ile Phe Ser
115 120 125
Asn Glu Met Ala Lys Val Asp Asp Ser Phe Phe His Arg Leu Glu Glu
130 135 140
Ser Phe Leu Val Glu Glu Asp Lys Lys His Glu Arg His Pro Ile Phe
145 150 155 160
Gly Asn Ile Val Asp Glu Val Ala Tyr His Glu Lys Tyr Pro Thr Ile
165 170 175
Tyr His Leu Arg Lys Lys Leu Val Asp Ser Thr Asp Lys Ala Asp Leu
180 185 190
Arg Leu Ile Tyr Leu Ala Leu Ala His Met Ile Lys Phe Arg Gly His
195 200 205
Phe Leu Ile Glu Gly Asp Leu Asn Pro Asp Asn Ser Asp Val Asp Lys
210 215 220
Leu Phe Ile Gln Leu Val Gln Thr Tyr Asn Gln Leu Phe Glu Glu Asn
225 230 235 240
Pro Ile Asn Ala Ser Gly Val Asp Ala Lys Ala Ile Leu Ser Ala Arg
245 250 255
Leu Ser Lys Ser Arg Arg Leu Glu Asn Leu Ile Ala Gln Leu Pro Gly
260 265 270
Glu Lys Lys Asn Gly Leu Phe Gly Asn Leu Ile Ala Leu Ser Leu Gly
275 280 285
Leu Thr Pro Asn Phe Lys Ser Asn Phe Asp Leu Ala Glu Asp Ala Lys
290 295 300
Leu Gln Leu Ser Lys Asp Thr Tyr Asp Asp Asp Leu Asp Asn Leu Leu
305 310 315 320
Ala Gln Ile Gly Asp Gln Tyr Ala Asp Leu Phe Leu Ala Ala Lys Asn
325 330 335
Leu Ser Asp Ala Ile Leu Leu Ser Asp Ile Leu Arg Val Asn Thr Glu
340 345 350
Ile Thr Lys Ala Pro Leu Ser Ala Ser Met Ile Lys Arg Tyr Asp Glu
355 360 365
His His Gln Asp Leu Thr Leu Leu Lys Ala Leu Val Arg Gln Gln Leu
370 375 380
Pro Glu Lys Tyr Lys Glu Ile Phe Phe Asp Gln Ser Lys Asn Gly Tyr
385 390 395 400
Ala Gly Tyr Ile Asp Gly Gly Ala Ser Gln Glu Glu Phe Tyr Lys Phe
405 410 415
Ile Lys Pro Ile Leu Glu Lys Met Asp Gly Thr Glu Glu Leu Leu Val
420 425 430
Lys Leu Asn Arg Glu Asp Leu Leu Arg Lys Gln Arg Thr Phe Asp Asn
435 440 445
Gly Ser Ile Pro His Gln Ile His Leu Gly Glu Leu His Ala Ile Leu
450 455 460
Arg Arg Gln Glu Asp Phe Tyr Pro Phe Leu Lys Asp Asn Arg Glu Lys
465 470 475 480
Ile Glu Lys Ile Leu Thr Phe Arg Ile Pro Tyr Tyr Val Gly Pro Leu
485 490 495
Ala Arg Gly Asn Ser Arg Phe Ala Trp Met Thr Arg Lys Ser Glu Glu
500 505 510
Thr Ile Thr Pro Trp Asn Phe Glu Glu Val Val Asp Lys Gly Ala Ser
515 520 525
Ala Gln Ser Phe Ile Glu Arg Met Thr Asn Phe Asp Lys Asn Leu Pro
530 535 540
Asn Glu Lys Val Leu Pro Lys His Ser Leu Leu Tyr Glu Tyr Phe Thr
545 550 555 560
Val Tyr Asn Glu Leu Thr Lys Val Lys Tyr Val Thr Glu Gly Met Arg
565 570 575
Lys Pro Ala Phe Leu Ser Gly Glu Gln Lys Lys Ala Ile Val Asp Leu
580 585 590
Leu Phe Lys Thr Asn Arg Lys Val Thr Val Lys Gln Leu Lys Glu Asp
595 600 605
Tyr Phe Lys Lys Ile Glu Cys Phe Asp Ser Val Glu Ile Ser Gly Val
610 615 620
Glu Asp Arg Phe Asn Ala Ser Leu Gly Thr Tyr His Asp Leu Leu Lys
625 630 635 640
Ile Ile Lys Asp Lys Asp Phe Leu Asp Asn Glu Glu Asn Glu Asp Ile
645 650 655
Leu Glu Asp Ile Val Leu Thr Leu Thr Leu Phe Glu Asp Arg Glu Met
660 665 670
Ile Glu Glu Arg Leu Lys Thr Tyr Ala His Leu Phe Asp Asp Lys Val
675 680 685
Met Lys Gln Leu Lys Arg Arg Arg Tyr Thr Gly Trp Gly Arg Leu Ser
690 695 700
Arg Lys Leu Ile Asn Gly Ile Arg Asp Lys Gln Ser Gly Lys Thr Ile
705 710 715 720
Leu Asp Phe Leu Lys Ser Asp Gly Phe Ala Asn Arg Asn Phe Met Gln
725 730 735
Leu Ile His Asp Asp Ser Leu Thr Phe Lys Glu Asp Ile Gln Lys Ala
740 745 750
Gln Val Ser Gly Gln Gly Asp Ser Leu His Glu His Ile Ala Asn Leu
755 760 765
Ala Gly Ser Pro Ala Ile Lys Lys Gly Ile Leu Gln Thr Val Lys Val
770 775 780
Val Asp Glu Leu Val Lys Val Met Gly Arg His Lys Pro Glu Asn Ile
785 790 795 800
Val Ile Glu Met Ala Arg Glu Asn Gln Thr Thr Gln Lys Gly Gln Lys
805 810 815
Asn Ser Arg Glu Arg Met Lys Arg Ile Glu Glu Gly Ile Lys Glu Leu
820 825 830
Gly Ser Gln Ile Leu Lys Glu His Pro Val Glu Asn Thr Gln Leu Gln
835 840 845
Asn Glu Lys Leu Tyr Leu Tyr Tyr Leu Gln Asn Gly Arg Asp Met Tyr
850 855 860
Val Asp Gln Glu Leu Asp Ile Asn Arg Leu Ser Asp Tyr Asp Val Asp
865 870 875 880
Ala Ile Val Pro Gln Ser Phe Leu Lys Asp Asp Ser Ile Asp Asn Lys
885 890 895
Val Leu Thr Arg Ser Asp Lys Asn Arg Gly Lys Ser Asp Asn Val Pro
900 905 910
Ser Glu Glu Val Val Lys Lys Met Lys Asn Tyr Trp Arg Gln Leu Leu
915 920 925
Asn Ala Lys Leu Ile Thr Gln Arg Lys Phe Asp Asn Leu Thr Lys Ala
930 935 940
Glu Arg Gly Gly Leu Ser Glu Leu Asp Lys Ala Gly Phe Ile Lys Arg
945 950 955 960
Gln Leu Val Glu Thr Arg Gln Ile Thr Lys His Val Ala Gln Ile Leu
965 970 975
Asp Ser Arg Met Asn Thr Lys Tyr Asp Glu Asn Asp Lys Leu Ile Arg
980 985 990
Glu Val Lys Val Ile Thr Leu Lys Ser Lys Leu Val Ser Asp Phe Arg
995 1000 1005
Lys Asp Phe Gln Phe Tyr Lys Val Arg Glu Ile Asn Asn Tyr His
1010 1015 1020
His Ala His Asp Ala Tyr Leu Asn Ala Val Val Gly Thr Ala Leu
1025 1030 1035
Ile Lys Lys Tyr Pro Lys Leu Glu Ser Glu Phe Val Tyr Gly Asp
1040 1045 1050
Tyr Lys Val Tyr Asp Val Arg Lys Met Ile Ala Lys Ser Glu Gln
1055 1060 1065
Glu Ile Gly Lys Ala Thr Ala Lys Tyr Phe Phe Tyr Ser Asn Ile
1070 1075 1080
Met Asn Phe Phe Lys Thr Glu Ile Thr Leu Ala Asn Gly Glu Ile
1085 1090 1095
Arg Lys Arg Pro Leu Ile Glu Thr Asn Gly Glu Thr Gly Glu Ile
1100 1105 1110
Val Trp Asp Lys Gly Arg Asp Phe Ala Thr Val Arg Lys Val Leu
1115 1120 1125
Ser Met Pro Gln Val Asn Ile Val Lys Lys Thr Glu Val Gln Thr
1130 1135 1140
Gly Gly Phe Ser Lys Glu Ser Ile Leu Pro Lys Arg Asn Ser Asp
1145 1150 1155
Lys Leu Ile Ala Arg Lys Lys Asp Trp Asp Pro Lys Lys Tyr Gly
1160 1165 1170
Gly Phe Asp Ser Pro Thr Val Ala Tyr Ser Val Leu Val Val Ala
1175 1180 1185
Lys Val Glu Lys Gly Lys Ser Lys Lys Leu Lys Ser Val Lys Glu
1190 1195 1200
Leu Leu Gly Ile Thr Ile Met Glu Arg Ser Ser Phe Glu Lys Asn
1205 1210 1215
Pro Ile Asp Phe Leu Glu Ala Lys Gly Tyr Lys Glu Val Lys Lys
1220 1225 1230
Asp Leu Ile Ile Lys Leu Pro Lys Tyr Ser Leu Phe Glu Leu Glu
1235 1240 1245
Asn Gly Arg Lys Arg Met Leu Ala Ser Ala Gly Glu Leu Gln Lys
1250 1255 1260
Gly Asn Glu Leu Ala Leu Pro Ser Lys Tyr Val Asn Phe Leu Tyr
1265 1270 1275
Leu Ala Ser His Tyr Glu Lys Leu Lys Gly Ser Pro Glu Asp Asn
1280 1285 1290
Glu Gln Lys Gln Leu Phe Val Glu Gln His Lys His Tyr Leu Asp
1295 1300 1305
Glu Ile Ile Glu Gln Ile Ser Glu Phe Ser Lys Arg Val Ile Leu
1310 1315 1320
Ala Asp Ala Asn Leu Asp Lys Val Leu Ser Ala Tyr Asn Lys His
1325 1330 1335
Arg Asp Lys Pro Ile Arg Glu Gln Ala Glu Asn Ile Ile His Leu
1340 1345 1350
Phe Thr Leu Thr Asn Leu Gly Ala Pro Ala Ala Phe Lys Tyr Phe
1355 1360 1365
Asp Thr Thr Ile Asp Arg Lys Arg Tyr Thr Ser Thr Lys Glu Val
1370 1375 1380
Leu Asp Ala Thr Leu Ile His Gln Ser Ile Thr Gly Leu Tyr Glu
1385 1390 1395
Thr Arg Ile Asp Leu Ser Gln Leu Gly Gly Asp Glu Gly Ala Pro
1400 1405 1410
Lys Lys Lys Arg Lys Val Gly Ser Ser Gly Ser Pro Lys Lys Lys
1415 1420 1425
Arg Lys Val Gly Ser Asp Ala Leu Asp Asp Phe Asp Leu Asp Met
1430 1435 1440
Leu Gly Ser Asp Ala Leu Asp Asp Phe Gly Gly Gly Ser Met Asp
1445 1450 1455
Ser Leu Leu Met Asn Arg Arg Lys Phe Leu Tyr Gln Phe Lys Asn
1460 1465 1470
Val Arg Trp Ala Lys Gly Arg Arg Glu Thr Tyr Leu Cys Tyr Val
1475 1480 1485
Val Lys Arg Arg Asp Ser Ala Thr Ser Phe Ser Leu Asp Phe Gly
1490 1495 1500
Tyr Leu Arg Asn Lys Asn Gly Cys His Val Glu Leu Leu Phe Leu
1505 1510 1515
Arg Tyr Ile Ser Asp Trp Asp Leu Asp Pro Gly Arg Cys Tyr Arg
1520 1525 1530
Val Thr Trp Phe Thr Ser Trp Ser Pro Cys Tyr Asp Cys Ala Arg
1535 1540 1545
His Val Ala Asp Phe Leu Arg Gly Asn Pro Asn Leu Ser Leu Arg
1550 1555 1560
Ile Phe Thr Ala Arg Leu Tyr Phe Cys Glu Asp Arg Lys Ala Glu
1565 1570 1575
Pro Glu Gly Leu Arg Arg Leu His Arg Ala Gly Val Gln Ile Ala
1580 1585 1590
Ile Met Thr Phe Lys Asp Tyr Phe Tyr Cys Trp Asn Thr Phe Val
1595 1600 1605
Glu Asn His Glu Arg Thr Phe Lys Ala Trp Glu Gly Leu His Glu
1610 1615 1620
Asn Ser Val Arg Leu Ser Arg Gln Leu Arg Arg Ile Leu Leu Pro
1625 1630 1635
Leu Tyr Glu Val Asp Asp Leu Arg Asp Ala Phe Arg Thr Trp Gly
1640 1645 1650
Arg Asp Tyr Lys Asp Asp Asp Asp Lys
1655 1660
<210> 3
<211> 4941
<212> DNA
<213> Artificial sequence
<220>
<223> Description of artificial sequence, coding sequence of dCAS9-AIDX
<400> 3
atggactata aggaccacga cggagactac aaggatcatg atattgatta caaagacgat 60
gacgataaga tggccccaaa gaagaagcgg aaggtcggta tccacggagt cccagcagct 120
accatggaca agaagtattc tatcggactg gccatcggga ctaatagcgt cgggtgggcc 180
gtgatcactg acgagtacaa ggtgccctct aagaagttca aggtgctcgg gaacaccgac 240
cggcattcca tcaagaaaaa tctgatcgga gctctcctct ttgattcagg ggagaccgct 300
gaagcaaccc gcctcaagcg gactgctaga cggcggtaca ccaggaggaa gaaccggatt 360
tgttaccttc aagagatatt ctccaacgaa atggcaaagg tcgacgacag cttcttccat 420
aggctggaag aatcattcct cgtggaagag gataagaagc atgaacggca tcccatcttc 480
ggtaatatcg tcgacgaggt ggcctatcac gagaaatacc caaccatcta ccatcttcgc 540
aaaaagctgg tggactcaac cgacaaggca gacctccggc ttatctacct ggccctggcc 600
cacatgatca agttcagagg ccacttcctg atcgagggcg acctcaatcc tgacaatagc 660
gatgtggata aactgttcat ccagctggtg cagacttaca accagctctt tgaagagaac 720
cccatcaatg caagcggagt cgatgccaag gccattctgt cagcccggct gtcaaagagc 780
cgcagacttg agaatcttat cgctcagctg ccgggtgaaa agaaaaatgg actgttcggg 840
aacctgattg ctctttcact tgggctgact cccaatttca agtctaattt cgacctggca 900
gaggatgcca agctgcaact gtccaaggac acctatgatg acgatctcga caacctcctg 960
gcccagatcg gtgaccaata cgccgacctt ttccttgctg ctaagaatct ttctgacgcc 1020
atcctgctgt ctgacattct ccgcgtgaac actgaaatca ccaaggcccc tctttcagct 1080
tcaatgatta agcggtatga tgagcaccac caggacctga ccctgcttaa ggcactcgtc 1140
cggcagcagc ttccggagaa gtacaaggaa atcttctttg accagtcaaa gaatggatac 1200
gccggctaca tcgacggagg tgcctcccaa gaggaatttt ataagtttat caaacctatc 1260
cttgagaaga tggacggcac cgaagagctc ctcgtgaaac tgaatcggga ggatctgctg 1320
cggaagcagc gcactttcga caatgggagc attccccacc agatccatct tggggagctt 1380
cacgccatcc ttcggcgcca agaggacttc tacccctttc ttaaggacaa cagggagaag 1440
attgagaaaa ttctcacttt ccgcatcccc tactacgtgg gacccctcgc cagaggaaat 1500
agccggtttg cttggatgac cagaaagtca gaagaaacta tcactccctg gaacttcgaa 1560
gaggtggtgg acaagggagc cagcgctcag tcattcatcg aacggatgac taacttcgat 1620
aagaacctcc ccaatgagaa ggtcctgccg aaacattccc tgctctacga gtactttacc 1680
gtgtacaacg agctgaccaa ggtgaaatat gtcaccgaag ggatgaggaa gcccgcattc 1740
ctgtcaggcg aacaaaagaa ggcaattgtg gaccttctgt tcaagaccaa tagaaaggtg 1800
accgtgaagc agctgaagga ggactatttc aagaaaattg aatgcttcga ctctgtggag 1860
attagcgggg tcgaagatcg gttcaacgca agcctgggta cctaccatga tctgcttaag 1920
atcatcaagg acaaggattt tctggacaat gaggagaacg aggacatcct tgaggacatt 1980
gtcctgactc tcactctgtt cgaggaccgg gaaatgatcg aggagaggct taagacctac 2040
gcccatctgt tcgacgataa agtgatgaag caacttaaac ggagaagata taccggatgg 2100
ggacgcctta gccgcaaact catcaacgga atccgggaca aacagagcgg aaagaccatt 2160
cttgatttcc ttaagagcga cggattcgct aatcgcaact tcatgcaact tatccatgat 2220
gattccctga cctttaagga ggacatccag aaggcccaag tgtctggaca aggtgactca 2280
ctgcacgagc atatcgcaaa tctggctggt tcacccgcta ttaagaaggg tattctccag 2340
accgtgaaag tcgtggacga gctggtcaag gtgatgggtc gccataaacc agagaacatt 2400
gtcatcgaga tggccaggga aaaccagact acccagaagg gacagaagaa cagcagggag 2460
cggatgaaaa gaattgagga agggattaag gagctcgggt cacagatcct taaagagcac 2520
ccggtggaaa acacccagct tcagaatgag aagctctatc tgtactacct tcaaaatgga 2580
cgcgatatgt atgtggacca agagcttgat atcaacaggc tctcagacta cgacgtggac 2640
gccatcgtcc ctcagagctt cctcaaagac gactcaattg acaataaggt gctgactcgc 2700
tcagacaaga accggggaaa gtcagataac gtgccctcag aggaagtcgt gaaaaagatg 2760
aagaactatt ggcgccagct tctgaacgca aagctgatca ctcagcggaa gttcgacaat 2820
ctcactaagg ctgagagggg cggactgagc gaactggaca aagcaggatt cattaaacgg 2880
caacttgtgg agactcggca gattactaaa catgtcgccc aaatccttga ctcacgcatg 2940
aataccaagt acgacgaaaa cgacaaactt atccgcgagg tgaaggtgat taccctgaag 3000
tccaagctgg tcagcgattt cagaaaggac tttcaattct acaaagtgcg ggagatcaat 3060
aactatcatc atgctcatga cgcatatctg aatgccgtgg tgggaaccgc cctgatcaag 3120
aagtacccaa agctggaaag cgagttcgtg tacggagact acaaggtcta cgacgtgcgc 3180
aagatgattg ccaaatctga gcaggagatc ggaaaggcca ccgcaaagta cttcttctac 3240
agcaacatca tgaatttctt caagaccgaa atcacccttg caaacggtga gatccggaag 3300
aggccgctca tcgagactaa tggggagact ggcgaaatcg tgtgggacaa gggcagagat 3360
ttcgctaccg tgcgcaaagt gctttctatg cctcaagtga acatcgtgaa gaaaaccgag 3420
gtgcaaaccg gaggcttttc taaggaatca atcctcccca agcgcaactc cgacaagctc 3480
attgcaagga agaaggattg ggaccctaag aagtacggcg gattcgattc accaactgtg 3540
gcttattctg tcctggtcgt ggctaaggtg gaaaaaggaa agtctaagaa gctcaagagc 3600
gtgaaggaac tgctgggtat caccattatg gagcgcagct ccttcgagaa gaacccaatt 3660
gactttctcg aagccaaagg ttacaaggaa gtcaagaagg accttatcat caagctccca 3720
aagtatagcc tgttcgaact ggagaatggg cggaagcgga tgctcgcctc cgctggcgaa 3780
cttcagaagg gtaatgagct ggctctcccc tccaagtacg tgaatttcct ctaccttgca 3840
agccattacg agaagctgaa ggggagcccc gaggacaacg agcaaaagca actgtttgtg 3900
gagcagcata agcattatct ggacgagatc attgagcaga tttccgagtt ttctaaacgc 3960
gtcattctcg ctgatgccaa cctcgataaa gtccttagcg catacaataa gcacagagac 4020
aaaccaattc gggagcaggc tgagaatatc atccacctgt tcaccctcac caatcttggt 4080
gcccctgccg cattcaagta cttcgacacc accatcgacc ggaaacgcta tacctccacc 4140
aaagaagtgc tggacgccac cctcatccac cagagcatca ccggacttta cgaaactcgg 4200
attgacctct cacagctcgg aggggatgag ggagctccca agaaaaagcg caaggtaggt 4260
agttccggat ctccgaaaaa gaaacgcaaa gttggtagtg atgctttaga cgattttgac 4320
ttagatatgc ttggttcaga cgcgttagac gacttcggtg gaggatccat ggacagcctc 4380
ttgatgaacc ggaggaagtt tctttaccaa ttcaaaaatg tccgctgggc taagggtcgg 4440
cgtgagacct acctgtgcta cgtagtgaag aggcgtgaca gtgctacatc cttttcactg 4500
gactttggtt atcttcgcaa taagaacggc tgccacgtgg aattgctctt cctccgctac 4560
atctcggact gggacctaga ccctggccgc tgctaccgcg tcacctggtt cacctcctgg 4620
agcccctgct acgactgtgc ccgacatgtg gccgactttc tgcgagggaa ccccaacctc 4680
agtctgagga tcttcaccgc gcgcctctac ttctgtgagg accgcaaggc tgagcccgag 4740
gggctgcggc ggctgcaccg cgccggggtg caaatagcca tcatgacctt caaagattat 4800
ttttactgct ggaatacttt tgtagaaaac catgaaagaa ctttcaaagc ctgggaaggg 4860
ctgcatgaaa attcagttcg tctctccaga cagcttcggc gcatcctttt gcccgattac 4920
aaagacgatg acgataagtg a 4941
<210> 4
<211> 1646
<212> PRT
<213> Artificial sequence
<220>
<223> Description of artificial sequence, amino acid sequence of dCAS9-AIDX
<400> 4
Met Asp Tyr Lys Asp His Asp Gly Asp Tyr Lys Asp His Asp Ile Asp
1 5 10 15
Tyr Lys Asp Asp Asp Asp Lys Met Ala Pro Lys Lys Lys Arg Lys Val
20 25 30
Gly Ile His Gly Val Pro Ala Ala Thr Met Asp Lys Lys Tyr Ser Ile
35 40 45
Gly Leu Ala Ile Gly Thr Asn Ser Val Gly Trp Ala Val Ile Thr Asp
50 55 60
Glu Tyr Lys Val Pro Ser Lys Lys Phe Lys Val Leu Gly Asn Thr Asp
65 70 75 80
Arg His Ser Ile Lys Lys Asn Leu Ile Gly Ala Leu Leu Phe Asp Ser
85 90 95
Gly Glu Thr Ala Glu Ala Thr Arg Leu Lys Arg Thr Ala Arg Arg Arg
100 105 110
Tyr Thr Arg Arg Lys Asn Arg Ile Cys Tyr Leu Gln Glu Ile Phe Ser
115 120 125
Asn Glu Met Ala Lys Val Asp Asp Ser Phe Phe His Arg Leu Glu Glu
130 135 140
Ser Phe Leu Val Glu Glu Asp Lys Lys His Glu Arg His Pro Ile Phe
145 150 155 160
Gly Asn Ile Val Asp Glu Val Ala Tyr His Glu Lys Tyr Pro Thr Ile
165 170 175
Tyr His Leu Arg Lys Lys Leu Val Asp Ser Thr Asp Lys Ala Asp Leu
180 185 190
Arg Leu Ile Tyr Leu Ala Leu Ala His Met Ile Lys Phe Arg Gly His
195 200 205
Phe Leu Ile Glu Gly Asp Leu Asn Pro Asp Asn Ser Asp Val Asp Lys
210 215 220
Leu Phe Ile Gln Leu Val Gln Thr Tyr Asn Gln Leu Phe Glu Glu Asn
225 230 235 240
Pro Ile Asn Ala Ser Gly Val Asp Ala Lys Ala Ile Leu Ser Ala Arg
245 250 255
Leu Ser Lys Ser Arg Arg Leu Glu Asn Leu Ile Ala Gln Leu Pro Gly
260 265 270
Glu Lys Lys Asn Gly Leu Phe Gly Asn Leu Ile Ala Leu Ser Leu Gly
275 280 285
Leu Thr Pro Asn Phe Lys Ser Asn Phe Asp Leu Ala Glu Asp Ala Lys
290 295 300
Leu Gln Leu Ser Lys Asp Thr Tyr Asp Asp Asp Leu Asp Asn Leu Leu
305 310 315 320
Ala Gln Ile Gly Asp Gln Tyr Ala Asp Leu Phe Leu Ala Ala Lys Asn
325 330 335
Leu Ser Asp Ala Ile Leu Leu Ser Asp Ile Leu Arg Val Asn Thr Glu
340 345 350
Ile Thr Lys Ala Pro Leu Ser Ala Ser Met Ile Lys Arg Tyr Asp Glu
355 360 365
His His Gln Asp Leu Thr Leu Leu Lys Ala Leu Val Arg Gln Gln Leu
370 375 380
Pro Glu Lys Tyr Lys Glu Ile Phe Phe Asp Gln Ser Lys Asn Gly Tyr
385 390 395 400
Ala Gly Tyr Ile Asp Gly Gly Ala Ser Gln Glu Glu Phe Tyr Lys Phe
405 410 415
Ile Lys Pro Ile Leu Glu Lys Met Asp Gly Thr Glu Glu Leu Leu Val
420 425 430
Lys Leu Asn Arg Glu Asp Leu Leu Arg Lys Gln Arg Thr Phe Asp Asn
435 440 445
Gly Ser Ile Pro His Gln Ile His Leu Gly Glu Leu His Ala Ile Leu
450 455 460
Arg Arg Gln Glu Asp Phe Tyr Pro Phe Leu Lys Asp Asn Arg Glu Lys
465 470 475 480
Ile Glu Lys Ile Leu Thr Phe Arg Ile Pro Tyr Tyr Val Gly Pro Leu
485 490 495
Ala Arg Gly Asn Ser Arg Phe Ala Trp Met Thr Arg Lys Ser Glu Glu
500 505 510
Thr Ile Thr Pro Trp Asn Phe Glu Glu Val Val Asp Lys Gly Ala Ser
515 520 525
Ala Gln Ser Phe Ile Glu Arg Met Thr Asn Phe Asp Lys Asn Leu Pro
530 535 540
Asn Glu Lys Val Leu Pro Lys His Ser Leu Leu Tyr Glu Tyr Phe Thr
545 550 555 560
Val Tyr Asn Glu Leu Thr Lys Val Lys Tyr Val Thr Glu Gly Met Arg
565 570 575
Lys Pro Ala Phe Leu Ser Gly Glu Gln Lys Lys Ala Ile Val Asp Leu
580 585 590
Leu Phe Lys Thr Asn Arg Lys Val Thr Val Lys Gln Leu Lys Glu Asp
595 600 605
Tyr Phe Lys Lys Ile Glu Cys Phe Asp Ser Val Glu Ile Ser Gly Val
610 615 620
Glu Asp Arg Phe Asn Ala Ser Leu Gly Thr Tyr His Asp Leu Leu Lys
625 630 635 640
Ile Ile Lys Asp Lys Asp Phe Leu Asp Asn Glu Glu Asn Glu Asp Ile
645 650 655
Leu Glu Asp Ile Val Leu Thr Leu Thr Leu Phe Glu Asp Arg Glu Met
660 665 670
Ile Glu Glu Arg Leu Lys Thr Tyr Ala His Leu Phe Asp Asp Lys Val
675 680 685
Met Lys Gln Leu Lys Arg Arg Arg Tyr Thr Gly Trp Gly Arg Leu Ser
690 695 700
Arg Lys Leu Ile Asn Gly Ile Arg Asp Lys Gln Ser Gly Lys Thr Ile
705 710 715 720
Leu Asp Phe Leu Lys Ser Asp Gly Phe Ala Asn Arg Asn Phe Met Gln
725 730 735
Leu Ile His Asp Asp Ser Leu Thr Phe Lys Glu Asp Ile Gln Lys Ala
740 745 750
Gln Val Ser Gly Gln Gly Asp Ser Leu His Glu His Ile Ala Asn Leu
755 760 765
Ala Gly Ser Pro Ala Ile Lys Lys Gly Ile Leu Gln Thr Val Lys Val
770 775 780
Val Asp Glu Leu Val Lys Val Met Gly Arg His Lys Pro Glu Asn Ile
785 790 795 800
Val Ile Glu Met Ala Arg Glu Asn Gln Thr Thr Gln Lys Gly Gln Lys
805 810 815
Asn Ser Arg Glu Arg Met Lys Arg Ile Glu Glu Gly Ile Lys Glu Leu
820 825 830
Gly Ser Gln Ile Leu Lys Glu His Pro Val Glu Asn Thr Gln Leu Gln
835 840 845
Asn Glu Lys Leu Tyr Leu Tyr Tyr Leu Gln Asn Gly Arg Asp Met Tyr
850 855 860
Val Asp Gln Glu Leu Asp Ile Asn Arg Leu Ser Asp Tyr Asp Val Asp
865 870 875 880
Ala Ile Val Pro Gln Ser Phe Leu Lys Asp Asp Ser Ile Asp Asn Lys
885 890 895
Val Leu Thr Arg Ser Asp Lys Asn Arg Gly Lys Ser Asp Asn Val Pro
900 905 910
Ser Glu Glu Val Val Lys Lys Met Lys Asn Tyr Trp Arg Gln Leu Leu
915 920 925
Asn Ala Lys Leu Ile Thr Gln Arg Lys Phe Asp Asn Leu Thr Lys Ala
930 935 940
Glu Arg Gly Gly Leu Ser Glu Leu Asp Lys Ala Gly Phe Ile Lys Arg
945 950 955 960
Gln Leu Val Glu Thr Arg Gln Ile Thr Lys His Val Ala Gln Ile Leu
965 970 975
Asp Ser Arg Met Asn Thr Lys Tyr Asp Glu Asn Asp Lys Leu Ile Arg
980 985 990
Glu Val Lys Val Ile Thr Leu Lys Ser Lys Leu Val Ser Asp Phe Arg
995 1000 1005
Lys Asp Phe Gln Phe Tyr Lys Val Arg Glu Ile Asn Asn Tyr His
1010 1015 1020
His Ala His Asp Ala Tyr Leu Asn Ala Val Val Gly Thr Ala Leu
1025 1030 1035
Ile Lys Lys Tyr Pro Lys Leu Glu Ser Glu Phe Val Tyr Gly Asp
1040 1045 1050
Tyr Lys Val Tyr Asp Val Arg Lys Met Ile Ala Lys Ser Glu Gln
1055 1060 1065
Glu Ile Gly Lys Ala Thr Ala Lys Tyr Phe Phe Tyr Ser Asn Ile
1070 1075 1080
Met Asn Phe Phe Lys Thr Glu Ile Thr Leu Ala Asn Gly Glu Ile
1085 1090 1095
Arg Lys Arg Pro Leu Ile Glu Thr Asn Gly Glu Thr Gly Glu Ile
1100 1105 1110
Val Trp Asp Lys Gly Arg Asp Phe Ala Thr Val Arg Lys Val Leu
1115 1120 1125
Ser Met Pro Gln Val Asn Ile Val Lys Lys Thr Glu Val Gln Thr
1130 1135 1140
Gly Gly Phe Ser Lys Glu Ser Ile Leu Pro Lys Arg Asn Ser Asp
1145 1150 1155
Lys Leu Ile Ala Arg Lys Lys Asp Trp Asp Pro Lys Lys Tyr Gly
1160 1165 1170
Gly Phe Asp Ser Pro Thr Val Ala Tyr Ser Val Leu Val Val Ala
1175 1180 1185
Lys Val Glu Lys Gly Lys Ser Lys Lys Leu Lys Ser Val Lys Glu
1190 1195 1200
Leu Leu Gly Ile Thr Ile Met Glu Arg Ser Ser Phe Glu Lys Asn
1205 1210 1215
Pro Ile Asp Phe Leu Glu Ala Lys Gly Tyr Lys Glu Val Lys Lys
1220 1225 1230
Asp Leu Ile Ile Lys Leu Pro Lys Tyr Ser Leu Phe Glu Leu Glu
1235 1240 1245
Asn Gly Arg Lys Arg Met Leu Ala Ser Ala Gly Glu Leu Gln Lys
1250 1255 1260
Gly Asn Glu Leu Ala Leu Pro Ser Lys Tyr Val Asn Phe Leu Tyr
1265 1270 1275
Leu Ala Ser His Tyr Glu Lys Leu Lys Gly Ser Pro Glu Asp Asn
1280 1285 1290
Glu Gln Lys Gln Leu Phe Val Glu Gln His Lys His Tyr Leu Asp
1295 1300 1305
Glu Ile Ile Glu Gln Ile Ser Glu Phe Ser Lys Arg Val Ile Leu
1310 1315 1320
Ala Asp Ala Asn Leu Asp Lys Val Leu Ser Ala Tyr Asn Lys His
1325 1330 1335
Arg Asp Lys Pro Ile Arg Glu Gln Ala Glu Asn Ile Ile His Leu
1340 1345 1350
Phe Thr Leu Thr Asn Leu Gly Ala Pro Ala Ala Phe Lys Tyr Phe
1355 1360 1365
Asp Thr Thr Ile Asp Arg Lys Arg Tyr Thr Ser Thr Lys Glu Val
1370 1375 1380
Leu Asp Ala Thr Leu Ile His Gln Ser Ile Thr Gly Leu Tyr Glu
1385 1390 1395
Thr Arg Ile Asp Leu Ser Gln Leu Gly Gly Asp Glu Gly Ala Pro
1400 1405 1410
Lys Lys Lys Arg Lys Val Gly Ser Ser Gly Ser Pro Lys Lys Lys
1415 1420 1425
Arg Lys Val Gly Ser Asp Ala Leu Asp Asp Phe Asp Leu Asp Met
1430 1435 1440
Leu Gly Ser Asp Ala Leu Asp Asp Phe Gly Gly Gly Ser Met Asp
1445 1450 1455
Ser Leu Leu Met Asn Arg Arg Lys Phe Leu Tyr Gln Phe Lys Asn
1460 1465 1470
Val Arg Trp Ala Lys Gly Arg Arg Glu Thr Tyr Leu Cys Tyr Val
1475 1480 1485
Val Lys Arg Arg Asp Ser Ala Thr Ser Phe Ser Leu Asp Phe Gly
1490 1495 1500
Tyr Leu Arg Asn Lys Asn Gly Cys His Val Glu Leu Leu Phe Leu
1505 1510 1515
Arg Tyr Ile Ser Asp Trp Asp Leu Asp Pro Gly Arg Cys Tyr Arg
1520 1525 1530
Val Thr Trp Phe Thr Ser Trp Ser Pro Cys Tyr Asp Cys Ala Arg
1535 1540 1545
His Val Ala Asp Phe Leu Arg Gly Asn Pro Asn Leu Ser Leu Arg
1550 1555 1560
Ile Phe Thr Ala Arg Leu Tyr Phe Cys Glu Asp Arg Lys Ala Glu
1565 1570 1575
Pro Glu Gly Leu Arg Arg Leu His Arg Ala Gly Val Gln Ile Ala
1580 1585 1590
Ile Met Thr Phe Lys Asp Tyr Phe Tyr Cys Trp Asn Thr Phe Val
1595 1600 1605
Glu Asn His Glu Arg Thr Phe Lys Ala Trp Glu Gly Leu His Glu
1610 1615 1620
Asn Ser Val Arg Leu Ser Arg Gln Leu Arg Arg Ile Leu Leu Pro
1625 1630 1635
Asp Tyr Lys Asp Asp Asp Asp Lys
1640 1645
<210> 5
<211> 28
<212> DNA
<213> Artificial sequence
<220>
<223> Description of artificial sequence, primer
<400> 5
gcggatccat ggacagcctc ttgatgaa 28
<210> 6
<211> 54
<212> DNA
<213> Artificial sequence
<220>
<223> Description of artificial sequence, primer
<400> 6
actcgagtca cttatcgtca tcgtctttgt aatcacgtcc ccaagtacga aatg 54
<210> 7
<211> 55
<212> DNA
<213> Artificial sequence
<220>
<223> Description of artificial sequence, primer
<400> 7
gactcgagtc acttatcgtc atcgtctttg taatcgggca aaaggatgcg ccgaa 55
<210> 8
<211> 34
<212> DNA
<213> Artificial sequence
<220>
<223> Description of artificial sequence, primer
<400> 8
gcagatctac catggacaag aagtattcta tcgg 34
<210> 9
<211> 35
<212> DNA
<213> Artificial sequence
<220>
<223> Description of artificial sequence, primer
<400> 9
gactcgagtc acttatcgtc atcgtctttg taatc 35
<210> 10
<211> 33
<212> DNA
<213> Artificial sequence
<220>
<223> Description of artificial sequence, primer
<400> 10
aaactcgagt gtacaaaaaa gcaggcttta aag 33
<210> 11
<211> 37
<212> DNA
<213> Artificial sequence
<220>
<223> Description of artificial sequence, primer
<220>
<221> misc_feature
<222> (2)..(20)
<223> N is a, c, g or t
<400> 11
gnnnnnnnnn nnnnnnnnnn ggtgtttcgt cctttcc 37
<210> 12
<211> 42
<212> DNA
<213> Artificial sequence
<220>
<223> Description of artificial sequence, primer
<220>
<221> misc_feature
<222> (2)..(20)
<223> N is a, c, g or t
<400> 12
gnnnnnnnnn nnnnnnnnnn gttttagagc tagaaatagc aa 42
<210> 13
<211> 36
<212> DNA
<213> Artificial sequence
<220>
<223> Description of artificial sequence, primer
<400> 13
aaagctagct aatgccaact ttgtacaaga aagctg 36
<210> 14
<211> 21
<212> DNA
<213> Artificial sequence
<220>
<223> Description of artificial sequence, target binding region of sgRNA
<400> 14
gcatgcccga aggctacgtc c 21
<210> 15
<211> 21
<212> DNA
<213> Artificial sequence
<220>
<223> Description of artificial sequence, target binding region of sgRNA
<400> 15
gcaactagta tacccgcgcc g 21
<210> 16
<211> 21
<212> DNA
<213> Artificial sequence
<220>
<223> Description of artificial sequence, target binding region of sgRNA
<400> 16
gcctcgaact tcacctcggc g 21
<210> 17
<211> 21
<212> DNA
<213> Artificial sequence
<220>
<223> Description of artificial sequence, target binding region of sgRNA
<400> 17
gtcagctcga tgcggttcac c 21
<210> 18
<211> 21
<212> DNA
<213> Artificial sequence
<220>
<223> Description of artificial sequence, target binding region of sgRNA
<400> 18
gattcccagg gccggttaat g 21
<210> 19
<211> 20
<212> DNA
<213> Artificial sequence
<220>
<223> Description of artificial sequence, target binding region of sgRNA
<400> 19
gtcccctcca ccccacagtg 20
<210> 20
<211> 20
<212> DNA
<213> Artificial sequence
<220>
<223> Description of artificial sequence, target binding region of sgRNA
<400> 20
ggggccacta gggacaggat 20
<210> 21
<211> 21
<212> PRT
<213> Artificial sequence
<220>
<223> Description of artificial sequence, linker
<400> 21
Gly Ser Gly Gly Gly Gly Ser Gly Gly Gly Gly Ser Gly Gly Gly Leu
1 5 10 15
Gly Ser Thr Glu Phe
20
<210> 22
<211> 21
<212> PRT
<213> Artificial sequence
<220>
<223> Description of artificial sequence, linker
<400> 22
Arg Ser Thr Ser Gly Leu Gly Gly Gly Ser Gly Gly Gly Gly Ser Gly
1 5 10 15
Gly Gly Gly Ser Gly
20
<210> 23
<211> 21
<212> PRT
<213> Artificial sequence
<220>
<223> Description of artificial sequence, linker
<400> 23
Gln Leu Thr Ser Gly Leu Gly Gly Gly Ser Gly Gly Gly Gly Ser Gly
1 5 10 15
Gly Gly Gly Ser Gly
20
<210> 24
<211> 4
<212> PRT
<213> Artificial sequence
<220>
<223> Description of artificial sequence, linker
<400> 24
Gly Gly Gly Ser
1
<210> 25
<211> 5
<212> PRT
<213> Artificial sequence
<220>
<223> Description of artificial sequence, linker
<400> 25
Gly Gly Gly Gly Ser
1 5
<210> 26
<211> 5
<212> PRT
<213> Artificial sequence
<220>
<223> Description of artificial sequence, linker
<400> 26
Ser Ser Ser Ser Gly
1 5
<210> 27
<211> 5
<212> PRT
<213> Artificial sequence
<220>
<223> Description of artificial sequence, linker
<400> 27
Gly Ser Gly Ser Ala
1 5
<210> 28
<211> 20
<212> PRT
<213> Artificial sequence
<220>
<223> Description of artificial sequence, linker
<400> 28
Gly Gly Ser Gly Gly Gly Gly Gly Gly Ser Gly Gly Gly Gly Ser Gly
1 5 10 15
Gly Gly Gly Ser
20
<210> 29
<211> 15
<212> PRT
<213> Artificial sequence
<220>
<223> Description of artificial sequence, linker
<400> 29
Ser Ser Ser Ser Gly Ser Ser Ser Ser Gly Ser Ser Ser Ser Gly
1 5 10 15
<210> 30
<211> 15
<212> PRT
<213> Artificial sequence
<220>
<223> Description of artificial sequence, linker
<400> 30
Gly Ser Gly Ser Ala Gly Ser Gly Ser Ala Gly Ser Gly Ser Ala
1 5 10 15
<210> 31
<211> 15
<212> PRT
<213> Artificial sequence
<220>
<223> Description of artificial sequence, linker
<400> 31
Gly Gly Ser Gly Gly Gly Gly Ser Gly Gly Gly Gly Ser Gly Gly
1 5 10 15
<210> 32
<211> 8
<212> PRT
<213> Artificial sequence
<220>
<223> Description of artificial sequence, FLAG tag
<400> 32
Asp Tyr Lys Asp Asp Asp Asp Lys
1 5
<210> 33
<211> 7
<212> PRT
<213> Artificial sequence
<220>
<223> Description of artificial sequence, nuclear localization sequence
<400> 33
Pro Lys Lys Lys Arg Lys Val
1 5
<210> 34
<211> 16
<212> PRT
<213> Artificial sequence
<220>
<223> Description of artificial sequence, nuclear localization sequence
<400> 34
Lys Arg Pro Ala Ala Thr Lys Lys Ala Gly Gln Ala Lys Lys Lys Lys
1 5 10 15
<210> 35
<211> 9
<212> PRT
<213> Artificial sequence
<220>
<223> Description of artificial sequence, nuclear localization sequence
<400> 35
Pro Ala Ala Lys Arg Val Lys Leu Asp
1 5
<210> 36
<211> 11
<212> PRT
<213> Artificial sequence
<220>
<223> Description of artificial sequence, nuclear localization sequence
<400> 36
Arg Gln Arg Arg Asn Glu Leu Lys Arg Ser Pro
1 5 10
<210> 37
<211> 38
<212> PRT
<213> Artificial sequence
<220>
<223> Description of artificial sequence, nuclear localization sequence
<400> 37
Asn Gln Ser Ser Asn Phe Gly Pro Met Lys Gly Gly Asn Phe Gly Gly
1 5 10 15
Arg Ser Ser Gly Pro Tyr Gly Gly Gly Gly Gln Tyr Phe Ala Lys Pro
20 25 30
Arg Asn Gln Gly Gly Tyr
35
<210> 38
<211> 42
<212> PRT
<213> Artificial sequence
<220>
<223> Description of artificial sequence, nuclear localization sequence
<400> 38
Arg Met Arg Ile Glx Phe Lys Asn Lys Gly Lys Asp Thr Ala Glu Leu
1 5 10 15
Arg Arg Arg Arg Val Glu Val Ser Val Glu Leu Arg Lys Ala Lys Lys
20 25 30
Asp Glu Gln Ile Leu Lys Arg Arg Asn Val
35 40
<210> 39
<211> 8
<212> PRT
<213> Artificial sequence
<220>
<223> Description of artificial sequence, nuclear localization sequence
<400> 39
Val Ser Arg Lys Arg Pro Arg Pro
1 5
<210> 40
<211> 8
<212> PRT
<213> Artificial sequence
<220>
<223> Description of artificial sequence, nuclear localization sequence
<400> 40
Pro Pro Lys Lys Ala Arg Glu Asp
1 5
<210> 41
<211> 12
<212> PRT
<213> Artificial sequence
<220>
<223> Description of artificial sequence, nuclear localization sequence
<400> 41
Ser Ala Leu Ile Lys Lys Lys Lys Lys Met Ala Pro
1 5 10
<210> 42
<211> 5
<212> PRT
<213> Artificial sequence
<220>
<223> Description of artificial sequence, nuclear localization sequence
<400> 42
Asp Arg Leu Arg Arg
1 5
<210> 43
<211> 7
<212> PRT
<213> Artificial sequence
<220>
<223> Description of artificial sequence, nuclear localization sequence
<400> 43
Pro Lys Gln Lys Lys Arg Lys
1 5
<210> 44
<211> 10
<212> PRT
<213> Artificial sequence
<220>
<223> Description of artificial sequence, nuclear localization sequence
<400> 44
Arg Lys Leu Lys Lys Lys Ile Lys Lys Leu
1 5 10
<210> 45
<211> 10
<212> PRT
<213> Artificial sequence
<220>
<223> Description of artificial sequence, nuclear localization sequence
<400> 45
Arg Glu Lys Lys Lys Phe Leu Lys Arg Arg
1 5 10
<210> 46
<211> 20
<212> PRT
<213> Artificial sequence
<220>
<223> Description of artificial sequence, nuclear localization sequence
<400> 46
Lys Arg Lys Gly Asp Glu Val Asp Gly Val Asp Glu Val Ala Lys Lys
1 5 10 15
Lys Ser Lys Lys
20
<210> 47
<211> 17
<212> PRT
<213> Artificial sequence
<220>
<223> Description of artificial sequence, nuclear localization sequence
<400> 47
Arg Lys Cys Leu Gln Ala Gly Met Asn Leu Glu Ala Arg Lys Thr Lys
1 5 10 15
Lys
<210> 48
<211> 644
<212> DNA
<213> Homo sapiens
<400> 48
acaagttcag cgtgtctggc gagggcgagg gcgatgccac ctacggcaag ctgaccctga 60
agttcatctg caccaccggc aagctgcccg tgccctggcc caccctcgtg accaccctga 120
cctacggcgt gcagtgcttc agccgctacc ccgaccacat gaagcagcac gacttcttca 180
agtccgccat gcccgaaggc tacgtccagg agcgcaccat cttcttcaag gacgacggca 240
actagtatac ccgcgccgag gtgaagttcg agggcgacac cctggtgaac cgcatcgagc 300
tgaagggcat cgacttcaag gaggacggca acatcctggg gcacaagctg gagtacaact 360
acaacagcca caacgtctat atcatggccg acaagcagaa gaacggcatc aaggcgaact 420
tcaagatccg ccacaacatc gaggacggca gcgtgcagct cgccgaccac taccagcaga 480
acacccccat cggcgacggc cccgtgctgc tgcccgacaa ccactacctg agcacccagt 540
ccgccctgag caaagacccc aacgagaagc gcgatcacat ggtcctgctg gagttcgtga 600
ccgccgccgg gatcactctc ggcatggacg agctgtacaa gtaa 644
<210> 49
<211> 20
<212> DNA
<213> Artificial sequence
<220>
<223> Description of artificial sequence, target binding region of sgRNA
<400> 49
tagacagttg tttgttcagt 20
<210> 50
<211> 20
<212> DNA
<213> Artificial sequence
<220>
<223> Description of artificial sequence, target binding region of sgRNA
<400> 50
gtcctcgttg tcttgttggc 20
<210> 51
<211> 20
<212> DNA
<213> Artificial sequence
<220>
<223> Description of artificial sequence, target binding region of sgRNA
<400> 51
gttggcaggg gtctgcaccc 20
<210> 52
<211> 20
<212> DNA
<213> Artificial sequence
<220>
<223> Description of artificial sequence, target binding region of sgRNA
<400> 52
tcactgagtt catgacctac 20
<210> 53
<211> 20
<212> DNA
<213> Artificial sequence
<220>
<223> Description of artificial sequence, target binding region of sgRNA
<400> 53
catgacctac gggaacctcc 20
<210> 54
<211> 20
<212> DNA
<213> Artificial sequence
<220>
<223> Description of artificial sequence, target binding region of sgRNA
<400> 54
cctgagggag tgcaaccggc 20
<210> 55
<211> 20
<212> DNA
<213> Artificial sequence
<220>
<223> Description of artificial sequence, target binding region of sgRNA
<400> 55
ccggcaggag gtgaacgccg 20
<210> 56
<211> 20
<212> DNA
<213> Artificial sequence
<220>
<223> Description of artificial sequence, target binding region of sgRNA
<400> 56
cgccgtggtg ctgctgtaca 20
<210> 57
<211> 20
<212> DNA
<213> Artificial sequence
<220>
<223> Description of artificial sequence, target binding region of sgRNA
<400> 57
ctcgtcagcc atggagtacc 20
<210> 58
<211> 20
<212> DNA
<213> Artificial sequence
<220>
<223> Description of artificial sequence, target binding region of sgRNA
<400> 58
aaaaacttca tccacaggta 20
<210> 59
<211> 20
<212> DNA
<213> Artificial sequence
<220>
<223> Description of artificial sequence, target binding region of sgRNA
<400> 59
agcctgcgcc atggagtcac 20
<210> 60
<211> 20
<212> DNA
<213> Artificial sequence
<220>
<223> Description of artificial sequence, target binding region of sgRNA
<400> 60
ggagtcacag ggcgtggagc 20
<210> 61
<211> 20
<212> DNA
<213> Artificial sequence
<220>
<223> Description of artificial sequence, target binding region of sgRNA
<400> 61
acaacgagga cttcaacacg 20
<210> 62
<211> 20
<212> DNA
<213> Artificial sequence
<220>
<223> Description of artificial sequence, target binding region of sgRNA
<400> 62
tcagtgatga tatagaacgg 20
<210> 63
<211> 20
<212> DNA
<213> Artificial sequence
<220>
<223> Description of artificial sequence, target binding region of sgRNA
<400> 63
tgcactccct caggtagtcc 20
<210> 64
<211> 20
<212> DNA
<213> Artificial sequence
<220>
<223> Description of artificial sequence, target binding region of sgRNA
<400> 64
gccctgtgac tccatggcgc 20
<210> 65
<211> 4731
<212> DNA
<213> Artificial sequence
<220>
<223> Description of artificial sequence, AIDX-XTEN-dCAS9 coding sequence
<400> 65
atggacagcc tcttgatgaa ccggaggaag tttctttacc aattcaaaaa tgtccgctgg 60
gctaagggtc ggcgtgagac ctacctgtgc tacgtagtga agaggcgtga cagtgctaca 120
tccttttcac tggactttgg ttatcttcgc aataagaacg gctgccacgt ggaattgctc 180
ttcctccgct acatctcgga ctgggaccta gaccctggcc gctgctaccg cgtcacctgg 240
ttcacctcct ggagcccctg ctacgactgt gcccgacatg tggccgactt tctgcgaggg 300
aaccccaacc tcagtctgag gatcttcacc gcgcgcctct acttctgtga ggaccgcaag 360
gctgagcccg aggggctgcg gcggctgcac cgcgccgggg tgcaaatagc catcatgacc 420
ttcaaagatt atttttactg ctggaatact tttgtagaaa accatgaaag aactttcaaa 480
gcctgggaag ggctgcatga aaattcagtt cgtctctcca gacagcttcg gcgcatcctt 540
ttgcccagcg gcagcgagac tcccgggacc tcagagtccg ccacacccga aagtgataaa 600
aagtattcta ttggtttagc catcggcact aattccgttg gatgggctgt cataaccgat 660
gaatacaaag taccttcaaa gaaatttaag gtgttgggga acacagaccg tcattcgatt 720
aaaaagaatc ttatcggtgc cctcctattc gatagtggcg aaacggcaga ggcgactcgc 780
ctgaaacgaa ccgctcggag aaggtataca cgtcgcaaga accgaatatg ttacttacaa 840
gaaattttta gcaatgagat ggccaaagtt gacgattctt tctttcaccg tttggaagag 900
tccttccttg tcgaagagga caagaaacat gaacggcacc ccatctttgg aaacatagta 960
gatgaggtgg catatcatga aaagtaccca acgatttatc acctcagaaa aaagctagtt 1020
gactcaactg ataaagcgga cctgaggtta atctacttgg ctcttgccca tatgataaag 1080
ttccgtgggc actttctcat tgagggtgat ctaaatccgg acaactcgga tgtcgacaaa 1140
ctgttcatcc agttagtaca aacctataat cagttgtttg aagagaaccc tataaatgca 1200
agtggcgtgg atgcgaaggc tattcttagc gcccgcctct ctaaatcccg acggctagaa 1260
aacctgatcg cacaattacc cggagagaag aaaaatgggt tgttcggtaa ccttatagcg 1320
ctctcactag gcctgacacc aaattttaag tcgaacttcg acttagctga agatgccaaa 1380
ttgcagctta gtaaggacac gtacgatgac gatctcgaca atctactggc acaaattgga 1440
gatcagtatg cggacttatt tttggctgcc aaaaacctta gcgatgcaat cctcctatct 1500
gacatactga gagttaatac tgagattacc aaggcgccgt tatccgcttc aatgatcaaa 1560
aggtacgatg aacatcacca agacttgaca cttctcaagg ccctagtccg tcagcaactg 1620
cctgagaaat ataaggaaat attctttgat cagtcgaaaa acgggtacgc aggttatatt 1680
gacggcggag cgagtcaaga ggaattctac aagtttatca aacccatatt agagaagatg 1740
gatgggacgg aagagttgct tgtaaaactc aatcgcgaag atctactgcg aaagcagcgg 1800
actttcgaca acggtagcat tccacatcaa atccacttag gcgaattgca tgctatactt 1860
agaaggcagg aggattttta tccgttcctc aaagacaatc gtgaaaagat tgagaaaatc 1920
ctaacctttc gcatacctta ctatgtggga cccctggccc gagggaactc tcggttcgca 1980
tggatgacaa gaaagtccga agaaacgatt actccatgga attttgagga agttgtcgat 2040
aaaggtgcgt cagctcaatc gttcatcgag aggatgacca actttgacaa gaatttaccg 2100
aacgaaaaag tattgcctaa gcacagttta ctttacgagt atttcacagt gtacaatgaa 2160
ctcacgaaag ttaagtatgt cactgagggc atgcgtaaac ccgcctttct aagcggagaa 2220
cagaagaaag caatagtaga tctgttattc aagaccaacc gcaaagtgac agttaagcaa 2280
ttgaaagagg actactttaa gaaaattgaa tgcttcgatt ctgtcgagat ctccggggta 2340
gaagatcgat ttaatgcgtc acttggtacg tatcatgacc tcctaaagat aattaaagat 2400
aaggacttcc tggataacga agagaatgaa gatatcttag aagatatagt gttgactctt 2460
accctctttg aagatcggga aatgattgag gaaagactaa aaacatacgc tcacctgttc 2520
gacgataagg ttatgaaaca gttaaagagg cgtcgctata cgggctgggg acgattgtcg 2580
cggaaactta tcaacgggat aagagacaag caaagtggta aaactattct cgattttcta 2640
aagagcgacg gcttcgccaa taggaacttt atgcagctga tccatgatga ctctttaacc 2700
ttcaaagagg atatacaaaa ggcacaggtt tccggacaag gggactcatt gcacgaacat 2760
attgcgaatc ttgctggttc gccagccatc aaaaagggca tactccagac agtcaaagta 2820
gtggatgagc tagttaaggt catgggacgt cacaaaccgg aaaacattgt aatcgagatg 2880
gcacgcgaaa atcaaacgac tcagaagggg caaaaaaaca gtcgagagcg gatgaagaga 2940
atagaagagg gtattaaaga actgggcagc cagatcttaa aggagcatcc tgtggaaaat 3000
acccaattgc agaacgagaa actttacctc tattacctac aaaatggaag ggacatgtat 3060
gttgatcagg aactggacat aaaccgttta tctgattacg acgtcgatgc cattgtaccc 3120
caatcctttt tgaaggacga ttcaatcgac aataaagtgc ttacacgctc ggataagaac 3180
cgagggaaaa gtgacaatgt tccaagcgag gaagtcgtaa agaaaatgaa gaactattgg 3240
cggcagctcc taaatgcgaa actgataacg caaagaaagt tcgataactt aactaaagct 3300
gagaggggtg gcttgtctga acttgacaag gccggattta ttaaacgtca gctcgtggaa 3360
acccgccaaa tcacaaagca tgttgcacag atactagatt cccgaatgaa tacgaaatac 3420
gacgagaacg ataagctgat tcgggaagtc aaagtaatca ctttaaagtc aaaattggtg 3480
tcggacttca gaaaggattt tcaattctat aaagttaggg agataaataa ctaccaccat 3540
gcgcacgacg cttatcttaa tgccgtcgta gggaccgcac tcattaagaa atacccgaag 3600
ctagaaagtg agtttgtgta tggtgattac aaagtttatg acgtccgtaa gatgatcgcg 3660
aaaagcgaac aggagatagg caaggctaca gccaaatact tcttttattc taacattatg 3720
aatttcttta agacggaaat cactctggca aacggagaga tacgcaaacg acctttaatt 3780
gaaaccaatg gggagacagg tgaaatcgta tgggataagg gccgggactt cgcgacggtg 3840
agaaaagttt tgtccatgcc ccaagtcaac atagtaaaga aaactgaggt gcagaccgga 3900
gggttttcaa aggaatcgat tcttccaaaa aggaatagtg ataagctcat cgctcgtaaa 3960
aaggactggg acccgaaaaa gtacggtggc ttcgatagcc ctacagttgc ctattctgtc 4020
ctagtagtgg caaaagttga gaagggaaaa tccaagaaac tgaagtcagt caaagaatta 4080
ttggggataa cgattatgga gcgctcgtct tttgaaaaga accccatcga cttccttgag 4140
gcgaaaggtt acaaggaagt aaaaaaggat ctcataatta aactaccaaa gtatagtctg 4200
tttgagttag aaaatggccg aaaacggatg ttggctagcg ccggagagct tcaaaagggg 4260
aacgaactcg cactaccgtc taaatacgtg aatttcctgt atttagcgtc ccattacgag 4320
aagttgaaag gttcacctga agataacgaa cagaagcaac tttttgttga gcagcacaaa 4380
cattatctcg acgaaatcat agagcaaatt tcggaattca gtaagagagt catcctagct 4440
gatgccaatc tggacaaagt attaagcgca tacaacaagc acagggataa acccatacgt 4500
gagcaggcgg aaaatattat ccatttgttt actcttacca acctcggcgc tccagccgca 4560
ttcaagtatt ttgacacaac gatagatcgc aaacgataca cttctaccaa ggaggtgcta 4620
gacgcgacac tgattcacca atccatcacg ggattatatg aaactcggat agatttgtca 4680
cagcttgggg gtgactctgg tggttctccc aagaagaaga ggaaagtcta a 4731
<210> 66
<211> 1576
<212> PRT
<213> Artificial sequence
<220>
<223> Description of artificial sequence, AIDX-XTEN-dCAS9 amino acid sequence
<400> 66
Met Asp Ser Leu Leu Met Asn Arg Arg Lys Phe Leu Tyr Gln Phe Lys
1 5 10 15
Asn Val Arg Trp Ala Lys Gly Arg Arg Glu Thr Tyr Leu Cys Tyr Val
20 25 30
Val Lys Arg Arg Asp Ser Ala Thr Ser Phe Ser Leu Asp Phe Gly Tyr
35 40 45
Leu Arg Asn Lys Asn Gly Cys His Val Glu Leu Leu Phe Leu Arg Tyr
50 55 60
Ile Ser Asp Trp Asp Leu Asp Pro Gly Arg Cys Tyr Arg Val Thr Trp
65 70 75 80
Phe Thr Ser Trp Ser Pro Cys Tyr Asp Cys Ala Arg His Val Ala Asp
85 90 95
Phe Leu Arg Gly Asn Pro Asn Leu Ser Leu Arg Ile Phe Thr Ala Arg
100 105 110
Leu Tyr Phe Cys Glu Asp Arg Lys Ala Glu Pro Glu Gly Leu Arg Arg
115 120 125
Leu His Arg Ala Gly Val Gln Ile Ala Ile Met Thr Phe Lys Asp Tyr
130 135 140
Phe Tyr Cys Trp Asn Thr Phe Val Glu Asn His Glu Arg Thr Phe Lys
145 150 155 160
Ala Trp Glu Gly Leu His Glu Asn Ser Val Arg Leu Ser Arg Gln Leu
165 170 175
Arg Arg Ile Leu Leu Pro Ser Gly Ser Glu Thr Pro Gly Thr Ser Glu
180 185 190
Ser Ala Thr Pro Glu Ser Asp Lys Lys Tyr Ser Ile Gly Leu Ala Ile
195 200 205
Gly Thr Asn Ser Val Gly Trp Ala Val Ile Thr Asp Glu Tyr Lys Val
210 215 220
Pro Ser Lys Lys Phe Lys Val Leu Gly Asn Thr Asp Arg His Ser Ile
225 230 235 240
Lys Lys Asn Leu Ile Gly Ala Leu Leu Phe Asp Ser Gly Glu Thr Ala
245 250 255
Glu Ala Thr Arg Leu Lys Arg Thr Ala Arg Arg Arg Tyr Thr Arg Arg
260 265 270
Lys Asn Arg Ile Cys Tyr Leu Gln Glu Ile Phe Ser Asn Glu Met Ala
275 280 285
Lys Val Asp Asp Ser Phe Phe His Arg Leu Glu Glu Ser Phe Leu Val
290 295 300
Glu Glu Asp Lys Lys His Glu Arg His Pro Ile Phe Gly Asn Ile Val
305 310 315 320
Asp Glu Val Ala Tyr His Glu Lys Tyr Pro Thr Ile Tyr His Leu Arg
325 330 335
Lys Lys Leu Val Asp Ser Thr Asp Lys Ala Asp Leu Arg Leu Ile Tyr
340 345 350
Leu Ala Leu Ala His Met Ile Lys Phe Arg Gly His Phe Leu Ile Glu
355 360 365
Gly Asp Leu Asn Pro Asp Asn Ser Asp Val Asp Lys Leu Phe Ile Gln
370 375 380
Leu Val Gln Thr Tyr Asn Gln Leu Phe Glu Glu Asn Pro Ile Asn Ala
385 390 395 400
Ser Gly Val Asp Ala Lys Ala Ile Leu Ser Ala Arg Leu Ser Lys Ser
405 410 415
Arg Arg Leu Glu Asn Leu Ile Ala Gln Leu Pro Gly Glu Lys Lys Asn
420 425 430
Gly Leu Phe Gly Asn Leu Ile Ala Leu Ser Leu Gly Leu Thr Pro Asn
435 440 445
Phe Lys Ser Asn Phe Asp Leu Ala Glu Asp Ala Lys Leu Gln Leu Ser
450 455 460
Lys Asp Thr Tyr Asp Asp Asp Leu Asp Asn Leu Leu Ala Gln Ile Gly
465 470 475 480
Asp Gln Tyr Ala Asp Leu Phe Leu Ala Ala Lys Asn Leu Ser Asp Ala
485 490 495
Ile Leu Leu Ser Asp Ile Leu Arg Val Asn Thr Glu Ile Thr Lys Ala
500 505 510
Pro Leu Ser Ala Ser Met Ile Lys Arg Tyr Asp Glu His His Gln Asp
515 520 525
Leu Thr Leu Leu Lys Ala Leu Val Arg Gln Gln Leu Pro Glu Lys Tyr
530 535 540
Lys Glu Ile Phe Phe Asp Gln Ser Lys Asn Gly Tyr Ala Gly Tyr Ile
545 550 555 560
Asp Gly Gly Ala Ser Gln Glu Glu Phe Tyr Lys Phe Ile Lys Pro Ile
565 570 575
Leu Glu Lys Met Asp Gly Thr Glu Glu Leu Leu Val Lys Leu Asn Arg
580 585 590
Glu Asp Leu Leu Arg Lys Gln Arg Thr Phe Asp Asn Gly Ser Ile Pro
595 600 605
His Gln Ile His Leu Gly Glu Leu His Ala Ile Leu Arg Arg Gln Glu
610 615 620
Asp Phe Tyr Pro Phe Leu Lys Asp Asn Arg Glu Lys Ile Glu Lys Ile
625 630 635 640
Leu Thr Phe Arg Ile Pro Tyr Tyr Val Gly Pro Leu Ala Arg Gly Asn
645 650 655
Ser Arg Phe Ala Trp Met Thr Arg Lys Ser Glu Glu Thr Ile Thr Pro
660 665 670
Trp Asn Phe Glu Glu Val Val Asp Lys Gly Ala Ser Ala Gln Ser Phe
675 680 685
Ile Glu Arg Met Thr Asn Phe Asp Lys Asn Leu Pro Asn Glu Lys Val
690 695 700
Leu Pro Lys His Ser Leu Leu Tyr Glu Tyr Phe Thr Val Tyr Asn Glu
705 710 715 720
Leu Thr Lys Val Lys Tyr Val Thr Glu Gly Met Arg Lys Pro Ala Phe
725 730 735
Leu Ser Gly Glu Gln Lys Lys Ala Ile Val Asp Leu Leu Phe Lys Thr
740 745 750
Asn Arg Lys Val Thr Val Lys Gln Leu Lys Glu Asp Tyr Phe Lys Lys
755 760 765
Ile Glu Cys Phe Asp Ser Val Glu Ile Ser Gly Val Glu Asp Arg Phe
770 775 780
Asn Ala Ser Leu Gly Thr Tyr His Asp Leu Leu Lys Ile Ile Lys Asp
785 790 795 800
Lys Asp Phe Leu Asp Asn Glu Glu Asn Glu Asp Ile Leu Glu Asp Ile
805 810 815
Val Leu Thr Leu Thr Leu Phe Glu Asp Arg Glu Met Ile Glu Glu Arg
820 825 830
Leu Lys Thr Tyr Ala His Leu Phe Asp Asp Lys Val Met Lys Gln Leu
835 840 845
Lys Arg Arg Arg Tyr Thr Gly Trp Gly Arg Leu Ser Arg Lys Leu Ile
850 855 860
Asn Gly Ile Arg Asp Lys Gln Ser Gly Lys Thr Ile Leu Asp Phe Leu
865 870 875 880
Lys Ser Asp Gly Phe Ala Asn Arg Asn Phe Met Gln Leu Ile His Asp
885 890 895
Asp Ser Leu Thr Phe Lys Glu Asp Ile Gln Lys Ala Gln Val Ser Gly
900 905 910
Gln Gly Asp Ser Leu His Glu His Ile Ala Asn Leu Ala Gly Ser Pro
915 920 925
Ala Ile Lys Lys Gly Ile Leu Gln Thr Val Lys Val Val Asp Glu Leu
930 935 940
Val Lys Val Met Gly Arg His Lys Pro Glu Asn Ile Val Ile Glu Met
945 950 955 960
Ala Arg Glu Asn Gln Thr Thr Gln Lys Gly Gln Lys Asn Ser Arg Glu
965 970 975
Arg Met Lys Arg Ile Glu Glu Gly Ile Lys Glu Leu Gly Ser Gln Ile
980 985 990
Leu Lys Glu His Pro Val Glu Asn Thr Gln Leu Gln Asn Glu Lys Leu
995 1000 1005
Tyr Leu Tyr Tyr Leu Gln Asn Gly Arg Asp Met Tyr Val Asp Gln
1010 1015 1020
Glu Leu Asp Ile Asn Arg Leu Ser Asp Tyr Asp Val Asp Ala Ile
1025 1030 1035
Val Pro Gln Ser Phe Leu Lys Asp Asp Ser Ile Asp Asn Lys Val
1040 1045 1050
Leu Thr Arg Ser Asp Lys Asn Arg Gly Lys Ser Asp Asn Val Pro
1055 1060 1065
Ser Glu Glu Val Val Lys Lys Met Lys Asn Tyr Trp Arg Gln Leu
1070 1075 1080
Leu Asn Ala Lys Leu Ile Thr Gln Arg Lys Phe Asp Asn Leu Thr
1085 1090 1095
Lys Ala Glu Arg Gly Gly Leu Ser Glu Leu Asp Lys Ala Gly Phe
1100 1105 1110
Ile Lys Arg Gln Leu Val Glu Thr Arg Gln Ile Thr Lys His Val
1115 1120 1125
Ala Gln Ile Leu Asp Ser Arg Met Asn Thr Lys Tyr Asp Glu Asn
1130 1135 1140
Asp Lys Leu Ile Arg Glu Val Lys Val Ile Thr Leu Lys Ser Lys
1145 1150 1155
Leu Val Ser Asp Phe Arg Lys Asp Phe Gln Phe Tyr Lys Val Arg
1160 1165 1170
Glu Ile Asn Asn Tyr His His Ala His Asp Ala Tyr Leu Asn Ala
1175 1180 1185
Val Val Gly Thr Ala Leu Ile Lys Lys Tyr Pro Lys Leu Glu Ser
1190 1195 1200
Glu Phe Val Tyr Gly Asp Tyr Lys Val Tyr Asp Val Arg Lys Met
1205 1210 1215
Ile Ala Lys Ser Glu Gln Glu Ile Gly Lys Ala Thr Ala Lys Tyr
1220 1225 1230
Phe Phe Tyr Ser Asn Ile Met Asn Phe Phe Lys Thr Glu Ile Thr
1235 1240 1245
Leu Ala Asn Gly Glu Ile Arg Lys Arg Pro Leu Ile Glu Thr Asn
1250 1255 1260
Gly Glu Thr Gly Glu Ile Val Trp Asp Lys Gly Arg Asp Phe Ala
1265 1270 1275
Thr Val Arg Lys Val Leu Ser Met Pro Gln Val Asn Ile Val Lys
1280 1285 1290
Lys Thr Glu Val Gln Thr Gly Gly Phe Ser Lys Glu Ser Ile Leu
1295 1300 1305
Pro Lys Arg Asn Ser Asp Lys Leu Ile Ala Arg Lys Lys Asp Trp
1310 1315 1320
Asp Pro Lys Lys Tyr Gly Gly Phe Asp Ser Pro Thr Val Ala Tyr
1325 1330 1335
Ser Val Leu Val Val Ala Lys Val Glu Lys Gly Lys Ser Lys Lys
1340 1345 1350
Leu Lys Ser Val Lys Glu Leu Leu Gly Ile Thr Ile Met Glu Arg
1355 1360 1365
Ser Ser Phe Glu Lys Asn Pro Ile Asp Phe Leu Glu Ala Lys Gly
1370 1375 1380
Tyr Lys Glu Val Lys Lys Asp Leu Ile Ile Lys Leu Pro Lys Tyr
1385 1390 1395
Ser Leu Phe Glu Leu Glu Asn Gly Arg Lys Arg Met Leu Ala Ser
1400 1405 1410
Ala Gly Glu Leu Gln Lys Gly Asn Glu Leu Ala Leu Pro Ser Lys
1415 1420 1425
Tyr Val Asn Phe Leu Tyr Leu Ala Ser His Tyr Glu Lys Leu Lys
1430 1435 1440
Gly Ser Pro Glu Asp Asn Glu Gln Lys Gln Leu Phe Val Glu Gln
1445 1450 1455
His Lys His Tyr Leu Asp Glu Ile Ile Glu Gln Ile Ser Glu Phe
1460 1465 1470
Ser Lys Arg Val Ile Leu Ala Asp Ala Asn Leu Asp Lys Val Leu
1475 1480 1485
Ser Ala Tyr Asn Lys His Arg Asp Lys Pro Ile Arg Glu Gln Ala
1490 1495 1500
Glu Asn Ile Ile His Leu Phe Thr Leu Thr Asn Leu Gly Ala Pro
1505 1510 1515
Ala Ala Phe Lys Tyr Phe Asp Thr Thr Ile Asp Arg Lys Arg Tyr
1520 1525 1530
Thr Ser Thr Lys Glu Val Leu Asp Ala Thr Leu Ile His Gln Ser
1535 1540 1545
Ile Thr Gly Leu Tyr Glu Thr Arg Ile Asp Leu Ser Gln Leu Gly
1550 1555 1560
Gly Asp Ser Gly Gly Ser Pro Lys Lys Lys Arg Lys Val
1565 1570 1575
<210> 67
<211> 4890
<212> DNA
<213> Artificial sequence
<220>
<223> Description of artificial sequence, coding sequence of dCAS9-XTEN-AIDX (K10E, T82I, E156G)
<400> 67
atggactata aggaccacga cggagactac aaggatcatg atattgatta caaagacgat 60
gacgataaga tggccccaaa gaagaagcgg aaggtcggta tccacggagt cccagcagct 120
accatggaca agaagtattc tatcggactg gccatcggga ctaatagcgt cgggtgggcc 180
gtgatcactg acgagtacaa ggtgccctct aagaagttca aggtgctcgg gaacaccgac 240
cggcattcca tcaagaaaaa tctgatcgga gctctcctct ttgattcagg ggagaccgct 300
gaagcaaccc gcctcaagcg gactgctaga cggcggtaca ccaggaggaa gaaccggatt 360
tgttaccttc aagagatatt ctccaacgaa atggcaaagg tcgacgacag cttcttccat 420
aggctggaag aatcattcct cgtggaagag gataagaagc atgaacggca tcccatcttc 480
ggtaatatcg tcgacgaggt ggcctatcac gagaaatacc caaccatcta ccatcttcgc 540
aaaaagctgg tggactcaac cgacaaggca gacctccggc ttatctacct ggccctggcc 600
cacatgatca agttcagagg ccacttcctg atcgagggcg acctcaatcc tgacaatagc 660
gatgtggata aactgttcat ccagctggtg cagacttaca accagctctt tgaagagaac 720
cccatcaatg caagcggagt cgatgccaag gccattctgt cagcccggct gtcaaagagc 780
cgcagacttg agaatcttat cgctcagctg ccgggtgaaa agaaaaatgg actgttcggg 840
aacctgattg ctctttcact tgggctgact cccaatttca agtctaattt cgacctggca 900
gaggatgcca agctgcaact gtccaaggac acctatgatg acgatctcga caacctcctg 960
gcccagatcg gtgaccaata cgccgacctt ttccttgctg ctaagaatct ttctgacgcc 1020
atcctgctgt ctgacattct ccgcgtgaac actgaaatca ccaaggcccc tctttcagct 1080
tcaatgatta agcggtatga tgagcaccac caggacctga ccctgcttaa ggcactcgtc 1140
cggcagcagc ttccggagaa gtacaaggaa atcttctttg accagtcaaa gaatggatac 1200
gccggctaca tcgacggagg tgcctcccaa gaggaatttt ataagtttat caaacctatc 1260
cttgagaaga tggacggcac cgaagagctc ctcgtgaaac tgaatcggga ggatctgctg 1320
cggaagcagc gcactttcga caatgggagc attccccacc agatccatct tggggagctt 1380
cacgccatcc ttcggcgcca agaggacttc tacccctttc ttaaggacaa cagggagaag 1440
attgagaaaa ttctcacttt ccgcatcccc tactacgtgg gacccctcgc cagaggaaat 1500
agccggtttg cttggatgac cagaaagtca gaagaaacta tcactccctg gaacttcgaa 1560
gaggtggtgg acaagggagc cagcgctcag tcattcatcg aacggatgac taacttcgat 1620
aagaacctcc ccaatgagaa ggtcctgccg aaacattccc tgctctacga gtactttacc 1680
gtgtacaacg agctgaccaa ggtgaaatat gtcaccgaag ggatgaggaa gcccgcattc 1740
ctgtcaggcg aacaaaagaa ggcaattgtg gaccttctgt tcaagaccaa tagaaaggtg 1800
accgtgaagc agctgaagga ggactatttc aagaaaattg aatgcttcga ctctgtggag 1860
attagcgggg tcgaagatcg gttcaacgca agcctgggta cctaccatga tctgcttaag 1920
atcatcaagg acaaggattt tctggacaat gaggagaaag aggacatcct tgaggacatt 1980
gtcctgactc tcactctgtt cgaggaccgg gaaatgatcg aggagaggct taagacctac 2040
gcccatctgt tcgacgataa agtgatgaag caacttaaac ggagaagata taccggatgg 2100
ggacgcctta gccgcaaact catcaacgga atccgggaca aacagagcgg aaagaccatt 2160
cttgatttcc ttaagagcga cggattcgct aatcgcaact tcatgcaact tatccatgat 2220
gattccctga cctttaagga ggacatccag aaggcccaag tgtctggaca aggtgactca 2280
ctgcacgagc atatcgcaaa tctggctggt tcacccgcta ttaagaaggg tattctccag 2340
accgtgaaag tcgtggacga gctggtcaag gtgatgggtc gccataaacc agagaacatt 2400
gtcatcgaga tggccaggga aaaccagact acccagaagg gacagaagaa cagcagggag 2460
cggatgaaaa gaattgagga agggattaag gagctcgggt cacagatcct taaagagcac 2520
ccggtggaaa acacccagct tcagaatgag aagctctatc tgtactacct tcaaaatgga 2580
cgcgatatgt atgtggacca agagcttgat atcaacaggc tctcagacta cgacgtggac 2640
gccatcgtcc ctcagagctt cctcaaagac gactcaattg acaataaggt gctgactcgc 2700
tcagacaaga accggggaaa gtcagataac gtgccctcag aggaagtcgt gaaaaagatg 2760
aagaactatt ggcgccagct tctgaacgca aagctgatca ctcagcggaa gttcgacaat 2820
ctcactaagg ctgagagggg cggactgagc gaactggaca aagcaggatt cattaaacgg 2880
caacttgtgg agactcggca gattactaaa catgtagccc aaatccttga ctcacgcatg 2940
aataccaagt acgacgaaaa cgacaaactt atccgcgagg tgaaggtgat taccctgaag 3000
tccaagctgg tcagcgattt cagaaaggac tttcaattct acaaagtgcg ggagatcaat 3060
aactatcatc atgctcatga cgcatatctg aatgccgtgg tgggaaccgc cctgatcaag 3120
aagtacccaa agctggaaag cgagttcgtg tacggagact acaaggtcta cgacgtgcgc 3180
aagatgattg ccaaatctga gcaggagatc ggaaaggcca ccgcaaagta cttcttctac 3240
agcaacatca tgaatttctt caagaccgaa atcacccttg caaacggtga gatccggaag 3300
aggccgctca tcgagactaa tggggagact ggcgaaatcg tgtgggacaa gggcagagat 3360
ttcgctaccg tgcgcaaagt gctttctatg cctcaagtga acatcgtgaa gaaaaccgag 3420
gtgcaaaccg gaggcttttc taaggaatca atcctcccca agcgcaactc cgacaagctc 3480
attgcaagga agaaggattg ggaccctaag aagtacggcg gattcgattc accaactgtg 3540
gcttattctg tcctggtcgt ggctaaggtg gaaaaaggaa agtctaagaa gctcaagagc 3600
gtgaaggaac tgctgggtat caccattatg gagcgcagct ccttcgagaa gaacccaatt 3660
gactttctcg aagccaaagg ttacaaggaa gtcaagaagg accttatcat caagctccca 3720
aagtatagcc tgttcgaact ggagaatggg cggaagcgga tgctcgcctc cgctggcgaa 3780
cttcagaagg gtaatgagct ggctctcccc tccaagtacg tgaatttcct ctaccttgca 3840
agccattacg agaagctgaa ggggagcccc gaggacaacg agcaaaagca actgtttgtg 3900
gagcagcata agcattatct ggacgagatc attgagcaga tttccgagtt ttctaaacgc 3960
gtcattctcg ctgatgccaa cctcgataaa gtccttagcg catacaataa gcacagagac 4020
aaaccaattc gggagcaggc tgagaatatc atccacctgt tcaccctcac caatcttggt 4080
gcccctgccg cattcaagta cttcgacacc accatcgacc ggaaacgcta tacctccacc 4140
aaagaagtgc tggacgccac cctcatccac cagagcatca ccggacttta cgaaactcgg 4200
attgacctct cacagctcgg aggggatgag ggagctccca agaaaaagcg caaggtaggt 4260
agttccggat ctccgaaaaa gaaacgcaaa gttagcggca gcgagactcc cgggacctca 4320
gagtccgcca cacccgaaag tatggacagc ctcttgatga accggaggga gtttctttac 4380
caattcaaaa atgtccgctg ggctaagggt cggcgtgaga cctacctgtg ctacgtagtg 4440
aagaggcgtg acagtgctac atccttttca ctggactttg gttatcttcg caataagaac 4500
ggctgccacg tggaattgct cttcctccgc tacatctcgg actgggacct agaccctggc 4560
cgctgctacc gcgtcacctg gttcatctcc tggagcccct gctacgactg tgcccgacat 4620
gtggccgact ttctgcgagg gaaccccaac ctcagtctga ggatcttcac cgcgcgcctc 4680
tacttctgtg aggaccgcaa ggctgagccc gaggggctgc ggcggctgca ccgcgccggg 4740
gtgcaaatag ccatcatgac cttcaaagat tatttttact gctggaatac ttttgtagaa 4800
aaccatggaa gaactttcaa agcctgggaa gggctgcatg aaaattcagt tcgtctctcc 4860
agacagcttc ggcgcatcct tttgccctga 4890
<210> 68
<211> 1629
<212> PRT
<213> Artificial sequence
<220>
<223> Description of artificial sequence, amino acid sequence of dCAS9-XTEN-AIDX (K10E, T82I, E156G)
<400> 68
Met Asp Tyr Lys Asp His Asp Gly Asp Tyr Lys Asp His Asp Ile Asp
1 5 10 15
Tyr Lys Asp Asp Asp Asp Lys Met Ala Pro Lys Lys Lys Arg Lys Val
20 25 30
Gly Ile His Gly Val Pro Ala Ala Thr Met Asp Lys Lys Tyr Ser Ile
35 40 45
Gly Leu Ala Ile Gly Thr Asn Ser Val Gly Trp Ala Val Ile Thr Asp
50 55 60
Glu Tyr Lys Val Pro Ser Lys Lys Phe Lys Val Leu Gly Asn Thr Asp
65 70 75 80
Arg His Ser Ile Lys Lys Asn Leu Ile Gly Ala Leu Leu Phe Asp Ser
85 90 95
Gly Glu Thr Ala Glu Ala Thr Arg Leu Lys Arg Thr Ala Arg Arg Arg
100 105 110
Tyr Thr Arg Arg Lys Asn Arg Ile Cys Tyr Leu Gln Glu Ile Phe Ser
115 120 125
Asn Glu Met Ala Lys Val Asp Asp Ser Phe Phe His Arg Leu Glu Glu
130 135 140
Ser Phe Leu Val Glu Glu Asp Lys Lys His Glu Arg His Pro Ile Phe
145 150 155 160
Gly Asn Ile Val Asp Glu Val Ala Tyr His Glu Lys Tyr Pro Thr Ile
165 170 175
Tyr His Leu Arg Lys Lys Leu Val Asp Ser Thr Asp Lys Ala Asp Leu
180 185 190
Arg Leu Ile Tyr Leu Ala Leu Ala His Met Ile Lys Phe Arg Gly His
195 200 205
Phe Leu Ile Glu Gly Asp Leu Asn Pro Asp Asn Ser Asp Val Asp Lys
210 215 220
Leu Phe Ile Gln Leu Val Gln Thr Tyr Asn Gln Leu Phe Glu Glu Asn
225 230 235 240
Pro Ile Asn Ala Ser Gly Val Asp Ala Lys Ala Ile Leu Ser Ala Arg
245 250 255
Leu Ser Lys Ser Arg Arg Leu Glu Asn Leu Ile Ala Gln Leu Pro Gly
260 265 270
Glu Lys Lys Asn Gly Leu Phe Gly Asn Leu Ile Ala Leu Ser Leu Gly
275 280 285
Leu Thr Pro Asn Phe Lys Ser Asn Phe Asp Leu Ala Glu Asp Ala Lys
290 295 300
Leu Gln Leu Ser Lys Asp Thr Tyr Asp Asp Asp Leu Asp Asn Leu Leu
305 310 315 320
Ala Gln Ile Gly Asp Gln Tyr Ala Asp Leu Phe Leu Ala Ala Lys Asn
325 330 335
Leu Ser Asp Ala Ile Leu Leu Ser Asp Ile Leu Arg Val Asn Thr Glu
340 345 350
Ile Thr Lys Ala Pro Leu Ser Ala Ser Met Ile Lys Arg Tyr Asp Glu
355 360 365
His His Gln Asp Leu Thr Leu Leu Lys Ala Leu Val Arg Gln Gln Leu
370 375 380
Pro Glu Lys Tyr Lys Glu Ile Phe Phe Asp Gln Ser Lys Asn Gly Tyr
385 390 395 400
Ala Gly Tyr Ile Asp Gly Gly Ala Ser Gln Glu Glu Phe Tyr Lys Phe
405 410 415
Ile Lys Pro Ile Leu Glu Lys Met Asp Gly Thr Glu Glu Leu Leu Val
420 425 430
Lys Leu Asn Arg Glu Asp Leu Leu Arg Lys Gln Arg Thr Phe Asp Asn
435 440 445
Gly Ser Ile Pro His Gln Ile His Leu Gly Glu Leu His Ala Ile Leu
450 455 460
Arg Arg Gln Glu Asp Phe Tyr Pro Phe Leu Lys Asp Asn Arg Glu Lys
465 470 475 480
Ile Glu Lys Ile Leu Thr Phe Arg Ile Pro Tyr Tyr Val Gly Pro Leu
485 490 495
Ala Arg Gly Asn Ser Arg Phe Ala Trp Met Thr Arg Lys Ser Glu Glu
500 505 510
Thr Ile Thr Pro Trp Asn Phe Glu Glu Val Val Asp Lys Gly Ala Ser
515 520 525
Ala Gln Ser Phe Ile Glu Arg Met Thr Asn Phe Asp Lys Asn Leu Pro
530 535 540
Asn Glu Lys Val Leu Pro Lys His Ser Leu Leu Tyr Glu Tyr Phe Thr
545 550 555 560
Val Tyr Asn Glu Leu Thr Lys Val Lys Tyr Val Thr Glu Gly Met Arg
565 570 575
Lys Pro Ala Phe Leu Ser Gly Glu Gln Lys Lys Ala Ile Val Asp Leu
580 585 590
Leu Phe Lys Thr Asn Arg Lys Val Thr Val Lys Gln Leu Lys Glu Asp
595 600 605
Tyr Phe Lys Lys Ile Glu Cys Phe Asp Ser Val Glu Ile Ser Gly Val
610 615 620
Glu Asp Arg Phe Asn Ala Ser Leu Gly Thr Tyr His Asp Leu Leu Lys
625 630 635 640
Ile Ile Lys Asp Lys Asp Phe Leu Asp Asn Glu Glu Lys Glu Asp Ile
645 650 655
Leu Glu Asp Ile Val Leu Thr Leu Thr Leu Phe Glu Asp Arg Glu Met
660 665 670
Ile Glu Glu Arg Leu Lys Thr Tyr Ala His Leu Phe Asp Asp Lys Val
675 680 685
Met Lys Gln Leu Lys Arg Arg Arg Tyr Thr Gly Trp Gly Arg Leu Ser
690 695 700
Arg Lys Leu Ile Asn Gly Ile Arg Asp Lys Gln Ser Gly Lys Thr Ile
705 710 715 720
Leu Asp Phe Leu Lys Ser Asp Gly Phe Ala Asn Arg Asn Phe Met Gln
725 730 735
Leu Ile His Asp Asp Ser Leu Thr Phe Lys Glu Asp Ile Gln Lys Ala
740 745 750
Gln Val Ser Gly Gln Gly Asp Ser Leu His Glu His Ile Ala Asn Leu
755 760 765
Ala Gly Ser Pro Ala Ile Lys Lys Gly Ile Leu Gln Thr Val Lys Val
770 775 780
Val Asp Glu Leu Val Lys Val Met Gly Arg His Lys Pro Glu Asn Ile
785 790 795 800
Val Ile Glu Met Ala Arg Glu Asn Gln Thr Thr Gln Lys Gly Gln Lys
805 810 815
Asn Ser Arg Glu Arg Met Lys Arg Ile Glu Glu Gly Ile Lys Glu Leu
820 825 830
Gly Ser Gln Ile Leu Lys Glu His Pro Val Glu Asn Thr Gln Leu Gln
835 840 845
Asn Glu Lys Leu Tyr Leu Tyr Tyr Leu Gln Asn Gly Arg Asp Met Tyr
850 855 860
Val Asp Gln Glu Leu Asp Ile Asn Arg Leu Ser Asp Tyr Asp Val Asp
865 870 875 880
Ala Ile Val Pro Gln Ser Phe Leu Lys Asp Asp Ser Ile Asp Asn Lys
885 890 895
Val Leu Thr Arg Ser Asp Lys Asn Arg Gly Lys Ser Asp Asn Val Pro
900 905 910
Ser Glu Glu Val Val Lys Lys Met Lys Asn Tyr Trp Arg Gln Leu Leu
915 920 925
Asn Ala Lys Leu Ile Thr Gln Arg Lys Phe Asp Asn Leu Thr Lys Ala
930 935 940
Glu Arg Gly Gly Leu Ser Glu Leu Asp Lys Ala Gly Phe Ile Lys Arg
945 950 955 960
Gln Leu Val Glu Thr Arg Gln Ile Thr Lys His Val Ala Gln Ile Leu
965 970 975
Asp Ser Arg Met Asn Thr Lys Tyr Asp Glu Asn Asp Lys Leu Ile Arg
980 985 990
Glu Val Lys Val Ile Thr Leu Lys Ser Lys Leu Val Ser Asp Phe Arg
995 1000 1005
Lys Asp Phe Gln Phe Tyr Lys Val Arg Glu Ile Asn Asn Tyr His
1010 1015 1020
His Ala His Asp Ala Tyr Leu Asn Ala Val Val Gly Thr Ala Leu
1025 1030 1035
Ile Lys Lys Tyr Pro Lys Leu Glu Ser Glu Phe Val Tyr Gly Asp
1040 1045 1050
Tyr Lys Val Tyr Asp Val Arg Lys Met Ile Ala Lys Ser Glu Gln
1055 1060 1065
Glu Ile Gly Lys Ala Thr Ala Lys Tyr Phe Phe Tyr Ser Asn Ile
1070 1075 1080
Met Asn Phe Phe Lys Thr Glu Ile Thr Leu Ala Asn Gly Glu Ile
1085 1090 1095
Arg Lys Arg Pro Leu Ile Glu Thr Asn Gly Glu Thr Gly Glu Ile
1100 1105 1110
Val Trp Asp Lys Gly Arg Asp Phe Ala Thr Val Arg Lys Val Leu
1115 1120 1125
Ser Met Pro Gln Val Asn Ile Val Lys Lys Thr Glu Val Gln Thr
1130 1135 1140
Gly Gly Phe Ser Lys Glu Ser Ile Leu Pro Lys Arg Asn Ser Asp
1145 1150 1155
Lys Leu Ile Ala Arg Lys Lys Asp Trp Asp Pro Lys Lys Tyr Gly
1160 1165 1170
Gly Phe Asp Ser Pro Thr Val Ala Tyr Ser Val Leu Val Val Ala
1175 1180 1185
Lys Val Glu Lys Gly Lys Ser Lys Lys Leu Lys Ser Val Lys Glu
1190 1195 1200
Leu Leu Gly Ile Thr Ile Met Glu Arg Ser Ser Phe Glu Lys Asn
1205 1210 1215
Pro Ile Asp Phe Leu Glu Ala Lys Gly Tyr Lys Glu Val Lys Lys
1220 1225 1230
Asp Leu Ile Ile Lys Leu Pro Lys Tyr Ser Leu Phe Glu Leu Glu
1235 1240 1245
Asn Gly Arg Lys Arg Met Leu Ala Ser Ala Gly Glu Leu Gln Lys
1250 1255 1260
Gly Asn Glu Leu Ala Leu Pro Ser Lys Tyr Val Asn Phe Leu Tyr
1265 1270 1275
Leu Ala Ser His Tyr Glu Lys Leu Lys Gly Ser Pro Glu Asp Asn
1280 1285 1290
Glu Gln Lys Gln Leu Phe Val Glu Gln His Lys His Tyr Leu Asp
1295 1300 1305
Glu Ile Ile Glu Gln Ile Ser Glu Phe Ser Lys Arg Val Ile Leu
1310 1315 1320
Ala Asp Ala Asn Leu Asp Lys Val Leu Ser Ala Tyr Asn Lys His
1325 1330 1335
Arg Asp Lys Pro Ile Arg Glu Gln Ala Glu Asn Ile Ile His Leu
1340 1345 1350
Phe Thr Leu Thr Asn Leu Gly Ala Pro Ala Ala Phe Lys Tyr Phe
1355 1360 1365
Asp Thr Thr Ile Asp Arg Lys Arg Tyr Thr Ser Thr Lys Glu Val
1370 1375 1380
Leu Asp Ala Thr Leu Ile His Gln Ser Ile Thr Gly Leu Tyr Glu
1385 1390 1395
Thr Arg Ile Asp Leu Ser Gln Leu Gly Gly Asp Glu Gly Ala Pro
1400 1405 1410
Lys Lys Lys Arg Lys Val Gly Ser Ser Gly Ser Pro Lys Lys Lys
1415 1420 1425
Arg Lys Val Ser Gly Ser Glu Thr Pro Gly Thr Ser Glu Ser Ala
1430 1435 1440
Thr Pro Glu Ser Met Asp Ser Leu Leu Met Asn Arg Arg Glu Phe
1445 1450 1455
Leu Tyr Gln Phe Lys Asn Val Arg Trp Ala Lys Gly Arg Arg Glu
1460 1465 1470
Thr Tyr Leu Cys Tyr Val Val Lys Arg Arg Asp Ser Ala Thr Ser
1475 1480 1485
Phe Ser Leu Asp Phe Gly Tyr Leu Arg Asn Lys Asn Gly Cys His
1490 1495 1500
Val Glu Leu Leu Phe Leu Arg Tyr Ile Ser Asp Trp Asp Leu Asp
1505 1510 1515
Pro Gly Arg Cys Tyr Arg Val Thr Trp Phe Ile Ser Trp Ser Pro
1520 1525 1530
Cys Tyr Asp Cys Ala Arg His Val Ala Asp Phe Leu Arg Gly Asn
1535 1540 1545
Pro Asn Leu Ser Leu Arg Ile Phe Thr Ala Arg Leu Tyr Phe Cys
1550 1555 1560
Glu Asp Arg Lys Ala Glu Pro Glu Gly Leu Arg Arg Leu His Arg
1565 1570 1575
Ala Gly Val Gln Ile Ala Ile Met Thr Phe Lys Asp Tyr Phe Tyr
1580 1585 1590
Cys Trp Asn Thr Phe Val Glu Asn His Gly Arg Thr Phe Lys Ala
1595 1600 1605
Trp Glu Gly Leu His Glu Asn Ser Val Arg Leu Ser Arg Gln Leu
1610 1615 1620
Arg Arg Ile Leu Leu Pro
1625
<210> 69
<211> 4890
<212> DNA
<213> Artificial sequence
<220>
<223> Description of artificial sequence: coding sequence of dCAS9-XTEN-AIDX
<400> 69
atggactata aggaccacga cggagactac aaggatcatg atattgatta caaagacgat 60
gacgataaga tggccccaaa gaagaagcgg aaggtcggta tccacggagt cccagcagct 120
accatggaca agaagtattc tatcggactg gccatcggga ctaatagcgt cgggtgggcc 180
gtgatcactg acgagtacaa ggtgccctct aagaagttca aggtgctcgg gaacaccgac 240
cggcattcca tcaagaaaaa tctgatcgga gctctcctct ttgattcagg ggagaccgct 300
gaagcaaccc gcctcaagcg gactgctaga cggcggtaca ccaggaggaa gaaccggatt 360
tgttaccttc aagagatatt ctccaacgaa atggcaaagg tcgacgacag cttcttccat 420
aggctggaag aatcattcct cgtggaagag gataagaagc atgaacggca tcccatcttc 480
ggtaatatcg tcgacgaggt ggcctatcac gagaaatacc caaccatcta ccatcttcgc 540
aaaaagctgg tggactcaac cgacaaggca gacctccggc ttatctacct ggccctggcc 600
cacatgatca agttcagagg ccacttcctg atcgagggcg acctcaatcc tgacaatagc 660
gatgtggata aactgttcat ccagctggtg cagacttaca accagctctt tgaagagaac 720
cccatcaatg caagcggagt cgatgccaag gccattctgt cagcccggct gtcaaagagc 780
cgcagacttg agaatcttat cgctcagctg ccgggtgaaa agaaaaatgg actgttcggg 840
aacctgattg ctctttcact tgggctgact cccaatttca agtctaattt cgacctggca 900
gaggatgcca agctgcaact gtccaaggac acctatgatg acgatctcga caacctcctg 960
gcccagatcg gtgaccaata cgccgacctt ttccttgctg ctaagaatct ttctgacgcc 1020
atcctgctgt ctgacattct ccgcgtgaac actgaaatca ccaaggcccc tctttcagct 1080
tcaatgatta agcggtatga tgagcaccac caggacctga ccctgcttaa ggcactcgtc 1140
cggcagcagc ttccggagaa gtacaaggaa atcttctttg accagtcaaa gaatggatac 1200
gccggctaca tcgacggagg tgcctcccaa gaggaatttt ataagtttat caaacctatc 1260
cttgagaaga tggacggcac cgaagagctc ctcgtgaaac tgaatcggga ggatctgctg 1320
cggaagcagc gcactttcga caatgggagc attccccacc agatccatct tggggagctt 1380
cacgccatcc ttcggcgcca agaggacttc tacccctttc ttaaggacaa cagggagaag 1440
attgagaaaa ttctcacttt ccgcatcccc tactacgtgg gacccctcgc cagaggaaat 1500
agccggtttg cttggatgac cagaaagtca gaagaaacta tcactccctg gaacttcgaa 1560
gaggtggtgg acaagggagc cagcgctcag tcattcatcg aacggatgac taacttcgat 1620
aagaacctcc ccaatgagaa ggtcctgccg aaacattccc tgctctacga gtactttacc 1680
gtgtacaacg agctgaccaa ggtgaaatat gtcaccgaag ggatgaggaa gcccgcattc 1740
ctgtcaggcg aacaaaagaa ggcaattgtg gaccttctgt tcaagaccaa tagaaaggtg 1800
accgtgaagc agctgaagga ggactatttc aagaaaattg aatgcttcga ctctgtggag 1860
attagcgggg tcgaagatcg gttcaacgca agcctgggta cctaccatga tctgcttaag 1920
atcatcaagg acaaggattt tctggacaat gaggagaaag aggacatcct tgaggacatt 1980
gtcctgactc tcactctgtt cgaggaccgg gaaatgatcg aggagaggct taagacctac 2040
gcccatctgt tcgacgataa agtgatgaag caacttaaac ggagaagata taccggatgg 2100
ggacgcctta gccgcaaact catcaacgga atccgggaca aacagagcgg aaagaccatt 2160
cttgatttcc ttaagagcga cggattcgct aatcgcaact tcatgcaact tatccatgat 2220
gattccctga cctttaagga ggacatccag aaggcccaag tgtctggaca aggtgactca 2280
ctgcacgagc atatcgcaaa tctggctggt tcacccgcta ttaagaaggg tattctccag 2340
accgtgaaag tcgtggacga gctggtcaag gtgatgggtc gccataaacc agagaacatt 2400
gtcatcgaga tggccaggga aaaccagact acccagaagg gacagaagaa cagcagggag 2460
cggatgaaaa gaattgagga agggattaag gagctcgggt cacagatcct taaagagcac 2520
ccggtggaaa acacccagct tcagaatgag aagctctatc tgtactacct tcaaaatgga 2580
cgcgatatgt atgtggacca agagcttgat atcaacaggc tctcagacta cgacgtggac 2640
gccatcgtcc ctcagagctt cctcaaagac gactcaattg acaataaggt gctgactcgc 2700
tcagacaaga accggggaaa gtcagataac gtgccctcag aggaagtcgt gaaaaagatg 2760
aagaactatt ggcgccagct tctgaacgca aagctgatca ctcagcggaa gttcgacaat 2820
ctcactaagg ctgagagggg cggactgagc gaactggaca aagcaggatt cattaaacgg 2880
caacttgtgg agactcggca gattactaaa catgtagccc aaatccttga ctcacgcatg 2940
aataccaagt acgacgaaaa cgacaaactt atccgcgagg tgaaggtgat taccctgaag 3000
tccaagctgg tcagcgattt cagaaaggac tttcaattct acaaagtgcg ggagatcaat 3060
aactatcatc atgctcatga cgcatatctg aatgccgtgg tgggaaccgc cctgatcaag 3120
aagtacccaa agctggaaag cgagttcgtg tacggagact acaaggtcta cgacgtgcgc 3180
aagatgattg ccaaatctga gcaggagatc ggaaaggcca ccgcaaagta cttcttctac 3240
agcaacatca tgaatttctt caagaccgaa atcacccttg caaacggtga gatccggaag 3300
aggccgctca tcgagactaa tggggagact ggcgaaatcg tgtgggacaa gggcagagat 3360
ttcgctaccg tgcgcaaagt gctttctatg cctcaagtga acatcgtgaa gaaaaccgag 3420
gtgcaaaccg gaggcttttc taaggaatca atcctcccca agcgcaactc cgacaagctc 3480
attgcaagga agaaggattg ggaccctaag aagtacggcg gattcgattc accaactgtg 3540
gcttattctg tcctggtcgt ggctaaggtg gaaaaaggaa agtctaagaa gctcaagagc 3600
gtgaaggaac tgctgggtat caccattatg gagcgcagct ccttcgagaa gaacccaatt 3660
gactttctcg aagccaaagg ttacaaggaa gtcaagaagg accttatcat caagctccca 3720
aagtatagcc tgttcgaact ggagaatggg cggaagcgga tgctcgcctc cgctggcgaa 3780
cttcagaagg gtaatgagct ggctctcccc tccaagtacg tgaatttcct ctaccttgca 3840
agccattacg agaagctgaa ggggagcccc gaggacaacg agcaaaagca actgtttgtg 3900
gagcagcata agcattatct ggacgagatc attgagcaga tttccgagtt ttctaaacgc 3960
gtcattctcg ctgatgccaa cctcgataaa gtccttagcg catacaataa gcacagagac 4020
aaaccaattc gggagcaggc tgagaatatc atccacctgt tcaccctcac caatcttggt 4080
gcccctgccg cattcaagta cttcgacacc accatcgacc ggaaacgcta tacctccacc 4140
aaagaagtgc tggacgccac cctcatccac cagagcatca ccggacttta cgaaactcgg 4200
attgacctct cacagctcgg aggggatgag ggagctccca agaaaaagcg caaggtaggt 4260
agttccggat ctccgaaaaa gaaacgcaaa gttagcggca gcgagactcc cgggacctca 4320
gagtccgcca cacccgaaag tatggacagc ctcttgatga accggaggaa gtttctttac 4380
caattcaaaa atgtccgctg ggctaagggt cggcgtgaga cctacctgtg ctacgtagtg 4440
aagaggcgtg acagtgctac atccttttca ctggactttg gttatcttcg caataagaac 4500
ggctgccacg tggaattgct cttcctccgc tacatctcgg actgggacct agaccctggc 4560
cgctgctacc gcgtcacctg gttcacctcc tggagcccct gctacgactg tgcccgacat 4620
gtggccgact ttctgcgagg gaaccccaac ctcagtctga ggatcttcac cgcgcgcctc 4680
tacttctgtg aggaccgcaa ggctgagccc gaggggctgc ggcggctgca ccgcgccggg 4740
gtgcaaatag ccatcatgac cttcaaagat tatttttact gctggaatac ttttgtagaa 4800
aaccatgaaa gaactttcaa agcctgggaa gggctgcatg aaaattcagt tcgtctctcc 4860
agacagcttc ggcgcatcct tttgccctga 4890
<210> 70
<211> 1629
<212> PRT
<213> Artificial sequence
<220>
<223> Description of artificial sequence: amino acid sequence of dCAS9-XTEN-AIDX
<400> 70
Met Asp Tyr Lys Asp His Asp Gly Asp Tyr Lys Asp His Asp Ile Asp
1 5 10 15
Tyr Lys Asp Asp Asp Asp Lys Met Ala Pro Lys Lys Lys Arg Lys Val
20 25 30
Gly Ile His Gly Val Pro Ala Ala Thr Met Asp Lys Lys Tyr Ser Ile
35 40 45
Gly Leu Ala Ile Gly Thr Asn Ser Val Gly Trp Ala Val Ile Thr Asp
50 55 60
Glu Tyr Lys Val Pro Ser Lys Lys Phe Lys Val Leu Gly Asn Thr Asp
65 70 75 80
Arg His Ser Ile Lys Lys Asn Leu Ile Gly Ala Leu Leu Phe Asp Ser
85 90 95
Gly Glu Thr Ala Glu Ala Thr Arg Leu Lys Arg Thr Ala Arg Arg Arg
100 105 110
Tyr Thr Arg Arg Lys Asn Arg Ile Cys Tyr Leu Gln Glu Ile Phe Ser
115 120 125
Asn Glu Met Ala Lys Val Asp Asp Ser Phe Phe His Arg Leu Glu Glu
130 135 140
Ser Phe Leu Val Glu Glu Asp Lys Lys His Glu Arg His Pro Ile Phe
145 150 155 160
Gly Asn Ile Val Asp Glu Val Ala Tyr His Glu Lys Tyr Pro Thr Ile
165 170 175
Tyr His Leu Arg Lys Lys Leu Val Asp Ser Thr Asp Lys Ala Asp Leu
180 185 190
Arg Leu Ile Tyr Leu Ala Leu Ala His Met Ile Lys Phe Arg Gly His
195 200 205
Phe Leu Ile Glu Gly Asp Leu Asn Pro Asp Asn Ser Asp Val Asp Lys
210 215 220
Leu Phe Ile Gln Leu Val Gln Thr Tyr Asn Gln Leu Phe Glu Glu Asn
225 230 235 240
Pro Ile Asn Ala Ser Gly Val Asp Ala Lys Ala Ile Leu Ser Ala Arg
245 250 255
Leu Ser Lys Ser Arg Arg Leu Glu Asn Leu Ile Ala Gln Leu Pro Gly
260 265 270
Glu Lys Lys Asn Gly Leu Phe Gly Asn Leu Ile Ala Leu Ser Leu Gly
275 280 285
Leu Thr Pro Asn Phe Lys Ser Asn Phe Asp Leu Ala Glu Asp Ala Lys
290 295 300
Leu Gln Leu Ser Lys Asp Thr Tyr Asp Asp Asp Leu Asp Asn Leu Leu
305 310 315 320
Ala Gln Ile Gly Asp Gln Tyr Ala Asp Leu Phe Leu Ala Ala Lys Asn
325 330 335
Leu Ser Asp Ala Ile Leu Leu Ser Asp Ile Leu Arg Val Asn Thr Glu
340 345 350
Ile Thr Lys Ala Pro Leu Ser Ala Ser Met Ile Lys Arg Tyr Asp Glu
355 360 365
His His Gln Asp Leu Thr Leu Leu Lys Ala Leu Val Arg Gln Gln Leu
370 375 380
Pro Glu Lys Tyr Lys Glu Ile Phe Phe Asp Gln Ser Lys Asn Gly Tyr
385 390 395 400
Ala Gly Tyr Ile Asp Gly Gly Ala Ser Gln Glu Glu Phe Tyr Lys Phe
405 410 415
Ile Lys Pro Ile Leu Glu Lys Met Asp Gly Thr Glu Glu Leu Leu Val
420 425 430
Lys Leu Asn Arg Glu Asp Leu Leu Arg Lys Gln Arg Thr Phe Asp Asn
435 440 445
Gly Ser Ile Pro His Gln Ile His Leu Gly Glu Leu His Ala Ile Leu
450 455 460
Arg Arg Gln Glu Asp Phe Tyr Pro Phe Leu Lys Asp Asn Arg Glu Lys
465 470 475 480
Ile Glu Lys Ile Leu Thr Phe Arg Ile Pro Tyr Tyr Val Gly Pro Leu
485 490 495
Ala Arg Gly Asn Ser Arg Phe Ala Trp Met Thr Arg Lys Ser Glu Glu
500 505 510
Thr Ile Thr Pro Trp Asn Phe Glu Glu Val Val Asp Lys Gly Ala Ser
515 520 525
Ala Gln Ser Phe Ile Glu Arg Met Thr Asn Phe Asp Lys Asn Leu Pro
530 535 540
Asn Glu Lys Val Leu Pro Lys His Ser Leu Leu Tyr Glu Tyr Phe Thr
545 550 555 560
Val Tyr Asn Glu Leu Thr Lys Val Lys Tyr Val Thr Glu Gly Met Arg
565 570 575
Lys Pro Ala Phe Leu Ser Gly Glu Gln Lys Lys Ala Ile Val Asp Leu
580 585 590
Leu Phe Lys Thr Asn Arg Lys Val Thr Val Lys Gln Leu Lys Glu Asp
595 600 605
Tyr Phe Lys Lys Ile Glu Cys Phe Asp Ser Val Glu Ile Ser Gly Val
610 615 620
Glu Asp Arg Phe Asn Ala Ser Leu Gly Thr Tyr His Asp Leu Leu Lys
625 630 635 640
Ile Ile Lys Asp Lys Asp Phe Leu Asp Asn Glu Glu Lys Glu Asp Ile
645 650 655
Leu Glu Asp Ile Val Leu Thr Leu Thr Leu Phe Glu Asp Arg Glu Met
660 665 670
Ile Glu Glu Arg Leu Lys Thr Tyr Ala His Leu Phe Asp Asp Lys Val
675 680 685
Met Lys Gln Leu Lys Arg Arg Arg Tyr Thr Gly Trp Gly Arg Leu Ser
690 695 700
Arg Lys Leu Ile Asn Gly Ile Arg Asp Lys Gln Ser Gly Lys Thr Ile
705 710 715 720
Leu Asp Phe Leu Lys Ser Asp Gly Phe Ala Asn Arg Asn Phe Met Gln
725 730 735
Leu Ile His Asp Asp Ser Leu Thr Phe Lys Glu Asp Ile Gln Lys Ala
740 745 750
Gln Val Ser Gly Gln Gly Asp Ser Leu His Glu His Ile Ala Asn Leu
755 760 765
Ala Gly Ser Pro Ala Ile Lys Lys Gly Ile Leu Gln Thr Val Lys Val
770 775 780
Val Asp Glu Leu Val Lys Val Met Gly Arg His Lys Pro Glu Asn Ile
785 790 795 800
Val Ile Glu Met Ala Arg Glu Asn Gln Thr Thr Gln Lys Gly Gln Lys
805 810 815
Asn Ser Arg Glu Arg Met Lys Arg Ile Glu Glu Gly Ile Lys Glu Leu
820 825 830
Gly Ser Gln Ile Leu Lys Glu His Pro Val Glu Asn Thr Gln Leu Gln
835 840 845
Asn Glu Lys Leu Tyr Leu Tyr Tyr Leu Gln Asn Gly Arg Asp Met Tyr
850 855 860
Val Asp Gln Glu Leu Asp Ile Asn Arg Leu Ser Asp Tyr Asp Val Asp
865 870 875 880
Ala Ile Val Pro Gln Ser Phe Leu Lys Asp Asp Ser Ile Asp Asn Lys
885 890 895
Val Leu Thr Arg Ser Asp Lys Asn Arg Gly Lys Ser Asp Asn Val Pro
900 905 910
Ser Glu Glu Val Val Lys Lys Met Lys Asn Tyr Trp Arg Gln Leu Leu
915 920 925
Asn Ala Lys Leu Ile Thr Gln Arg Lys Phe Asp Asn Leu Thr Lys Ala
930 935 940
Glu Arg Gly Gly Leu Ser Glu Leu Asp Lys Ala Gly Phe Ile Lys Arg
945 950 955 960
Gln Leu Val Glu Thr Arg Gln Ile Thr Lys His Val Ala Gln Ile Leu
965 970 975
Asp Ser Arg Met Asn Thr Lys Tyr Asp Glu Asn Asp Lys Leu Ile Arg
980 985 990
Glu Val Lys Val Ile Thr Leu Lys Ser Lys Leu Val Ser Asp Phe Arg
995 1000 1005
Lys Asp Phe Gln Phe Tyr Lys Val Arg Glu Ile Asn Asn Tyr His
1010 1015 1020
His Ala His Asp Ala Tyr Leu Asn Ala Val Val Gly Thr Ala Leu
1025 1030 1035
Ile Lys Lys Tyr Pro Lys Leu Glu Ser Glu Phe Val Tyr Gly Asp
1040 1045 1050
Tyr Lys Val Tyr Asp Val Arg Lys Met Ile Ala Lys Ser Glu Gln
1055 1060 1065
Glu Ile Gly Lys Ala Thr Ala Lys Tyr Phe Phe Tyr Ser Asn Ile
1070 1075 1080
Met Asn Phe Phe Lys Thr Glu Ile Thr Leu Ala Asn Gly Glu Ile
1085 1090 1095
Arg Lys Arg Pro Leu Ile Glu Thr Asn Gly Glu Thr Gly Glu Ile
1100 1105 1110
Val Trp Asp Lys Gly Arg Asp Phe Ala Thr Val Arg Lys Val Leu
1115 1120 1125
Ser Met Pro Gln Val Asn Ile Val Lys Lys Thr Glu Val Gln Thr
1130 1135 1140
Gly Gly Phe Ser Lys Glu Ser Ile Leu Pro Lys Arg Asn Ser Asp
1145 1150 1155
Lys Leu Ile Ala Arg Lys Lys Asp Trp Asp Pro Lys Lys Tyr Gly
1160 1165 1170
Gly Phe Asp Ser Pro Thr Val Ala Tyr Ser Val Leu Val Val Ala
1175 1180 1185
Lys Val Glu Lys Gly Lys Ser Lys Lys Leu Lys Ser Val Lys Glu
1190 1195 1200
Leu Leu Gly Ile Thr Ile Met Glu Arg Ser Ser Phe Glu Lys Asn
1205 1210 1215
Pro Ile Asp Phe Leu Glu Ala Lys Gly Tyr Lys Glu Val Lys Lys
1220 1225 1230
Asp Leu Ile Ile Lys Leu Pro Lys Tyr Ser Leu Phe Glu Leu Glu
1235 1240 1245
Asn Gly Arg Lys Arg Met Leu Ala Ser Ala Gly Glu Leu Gln Lys
1250 1255 1260
Gly Asn Glu Leu Ala Leu Pro Ser Lys Tyr Val Asn Phe Leu Tyr
1265 1270 1275
Leu Ala Ser His Tyr Glu Lys Leu Lys Gly Ser Pro Glu Asp Asn
1280 1285 1290
Glu Gln Lys Gln Leu Phe Val Glu Gln His Lys His Tyr Leu Asp
1295 1300 1305
Glu Ile Ile Glu Gln Ile Ser Glu Phe Ser Lys Arg Val Ile Leu
1310 1315 1320
Ala Asp Ala Asn Leu Asp Lys Val Leu Ser Ala Tyr Asn Lys His
1325 1330 1335
Arg Asp Lys Pro Ile Arg Glu Gln Ala Glu Asn Ile Ile His Leu
1340 1345 1350
Phe Thr Leu Thr Asn Leu Gly Ala Pro Ala Ala Phe Lys Tyr Phe
1355 1360 1365
Asp Thr Thr Ile Asp Arg Lys Arg Tyr Thr Ser Thr Lys Glu Val
1370 1375 1380
Leu Asp Ala Thr Leu Ile His Gln Ser Ile Thr Gly Leu Tyr Glu
1385 1390 1395
Thr Arg Ile Asp Leu Ser Gln Leu Gly Gly Asp Glu Gly Ala Pro
1400 1405 1410
Lys Lys Lys Arg Lys Val Gly Ser Ser Gly Ser Pro Lys Lys Lys
1415 1420 1425
Arg Lys Val Ser Gly Ser Glu Thr Pro Gly Thr Ser Glu Ser Ala
1430 1435 1440
Thr Pro Glu Ser Met Asp Ser Leu Leu Met Asn Arg Arg Lys Phe
1445 1450 1455
Leu Tyr Gln Phe Lys Asn Val Arg Trp Ala Lys Gly Arg Arg Glu
1460 1465 1470
Thr Tyr Leu Cys Tyr Val Val Lys Arg Arg Asp Ser Ala Thr Ser
1475 1480 1485
Phe Ser Leu Asp Phe Gly Tyr Leu Arg Asn Lys Asn Gly Cys His
1490 1495 1500
Val Glu Leu Leu Phe Leu Arg Tyr Ile Ser Asp Trp Asp Leu Asp
1505 1510 1515
Pro Gly Arg Cys Tyr Arg Val Thr Trp Phe Thr Ser Trp Ser Pro
1520 1525 1530
Cys Tyr Asp Cys Ala Arg His Val Ala Asp Phe Leu Arg Gly Asn
1535 1540 1545
Pro Asn Leu Ser Leu Arg Ile Phe Thr Ala Arg Leu Tyr Phe Cys
1550 1555 1560
Glu Asp Arg Lys Ala Glu Pro Glu Gly Leu Arg Arg Leu His Arg
1565 1570 1575
Ala Gly Val Gln Ile Ala Ile Met Thr Phe Lys Asp Tyr Phe Tyr
1580 1585 1590
Cys Trp Asn Thr Phe Val Glu Asn His Glu Arg Thr Phe Lys Ala
1595 1600 1605
Trp Glu Gly Leu His Glu Asn Ser Val Arg Leu Ser Arg Gln Leu
1610 1615 1620
Arg Arg Ile Leu Leu Pro
1625
<210> 71
<211> 4917
<212> DNA
<213> Artificial sequence
<220>
<223> Description of artificial sequence: coding sequence of nCas-AIDX
<400> 71
atggactata aggaccacga cggagactac aaggatcatg atattgatta caaagacgat 60
gacgataaga tggccccaaa gaagaagcgg aaggtcggta tccacggagt cccagcagct 120
accatggaca agaagtattc tatcggactg gccatcggga ctaatagcgt cgggtgggcc 180
gtgatcactg acgagtacaa ggtgccctct aagaagttca aggtgctcgg gaacaccgac 240
cggcattcca tcaagaaaaa tctgatcgga gctctcctct ttgattcagg ggagaccgct 300
gaagcaaccc gcctcaagcg gactgctaga cggcggtaca ccaggaggaa gaaccggatt 360
tgttaccttc aagagatatt ctccaacgaa atggcaaagg tcgacgacag cttcttccat 420
aggctggaag aatcattcct cgtggaagag gataagaagc atgaacggca tcccatcttc 480
ggtaatatcg tcgacgaggt ggcctatcac gagaaatacc caaccatcta ccatcttcgc 540
aaaaagctgg tggactcaac cgacaaggca gacctccggc ttatctacct ggccctggcc 600
cacatgatca agttcagagg ccacttcctg atcgagggcg acctcaatcc tgacaatagc 660
gatgtggata aactgttcat ccagctggtg cagacttaca accagctctt tgaagagaac 720
cccatcaatg caagcggagt cgatgccaag gccattctgt cagcccggct gtcaaagagc 780
cgcagacttg agaatcttat cgctcagctg ccgggtgaaa agaaaaatgg actgttcggg 840
aacctgattg ctctttcact tgggctgact cccaatttca agtctaattt cgacctggca 900
gaggatgcca agctgcaact gtccaaggac acctatgatg acgatctcga caacctcctg 960
gcccagatcg gtgaccaata cgccgacctt ttccttgctg ctaagaatct ttctgacgcc 1020
atcctgctgt ctgacattct ccgcgtgaac actgaaatca ccaaggcccc tctttcagct 1080
tcaatgatta agcggtatga tgagcaccac caggacctga ccctgcttaa ggcactcgtc 1140
cggcagcagc ttccggagaa gtacaaggaa atcttctttg accagtcaaa gaatggatac 1200
gccggctaca tcgacggagg tgcctcccaa gaggaatttt ataagtttat caaacctatc 1260
cttgagaaga tggacggcac cgaagagctc ctcgtgaaac tgaatcggga ggatctgctg 1320
cggaagcagc gcactttcga caatgggagc attccccacc agatccatct tggggagctt 1380
cacgccatcc ttcggcgcca agaggacttc tacccctttc ttaaggacaa cagggagaag 1440
attgagaaaa ttctcacttt ccgcatcccc tactacgtgg gacccctcgc cagaggaaat 1500
agccggtttg cttggatgac cagaaagtca gaagaaacta tcactccctg gaacttcgaa 1560
gaggtggtgg acaagggagc cagcgctcag tcattcatcg aacggatgac taacttcgat 1620
aagaacctcc ccaatgagaa ggtcctgccg aaacattccc tgctctacga gtactttacc 1680
gtgtacaacg agctgaccaa ggtgaaatat gtcaccgaag ggatgaggaa gcccgcattc 1740
ctgtcaggcg aacaaaagaa ggcaattgtg gaccttctgt tcaagaccaa tagaaaggtg 1800
accgtgaagc agctgaagga ggactatttc aagaaaattg aatgcttcga ctctgtggag 1860
attagcgggg tcgaagatcg gttcaacgca agcctgggta cctaccatga tctgcttaag 1920
atcatcaagg acaaggattt tctggacaat gaggagaaag aggacatcct tgaggacatt 1980
gtcctgactc tcactctgtt cgaggaccgg gaaatgatcg aggagaggct taagacctac 2040
gcccatctgt tcgacgataa agtgatgaag caacttaaac ggagaagata taccggatgg 2100
ggacgcctta gccgcaaact catcaacgga atccgggaca aacagagcgg aaagaccatt 2160
cttgatttcc ttaagagcga cggattcgct aatcgcaact tcatgcaact tatccatgat 2220
gattccctga cctttaagga ggacatccag aaggcccaag tgtctggaca aggtgactca 2280
ctgcacgagc atatcgcaaa tctggctggt tcacccgcta ttaagaaggg tattctccag 2340
accgtgaaag tcgtggacga gctggtcaag gtgatgggtc gccataaacc agagaacatt 2400
gtcatcgaga tggccaggga aaaccagact acccagaagg gacagaagaa cagcagggag 2460
cggatgaaaa gaattgagga agggattaag gagctcgggt cacagatcct taaagagcac 2520
ccggtggaaa acacccagct tcagaatgag aagctctatc tgtactacct tcaaaatgga 2580
cgcgatatgt atgtggacca agagcttgat atcaacaggc tctcagacta cgacgtggac 2640
catatcgtcc ctcagagctt cctcaaagac gactcaattg acaataaggt gctgactcgc 2700
tcagacaaga accggggaaa gtcagataac gtgccctcag aggaagtcgt gaaaaagatg 2760
aagaactatt ggcgccagct tctgaacgca aagctgatca ctcagcggaa gttcgacaat 2820
ctcactaagg ctgagagggg cggactgagc gaactggaca aagcaggatt cattaaacgg 2880
caacttgtgg agactcggca gattactaaa catgtagccc aaatccttga ctcacgcatg 2940
aataccaagt acgacgaaaa cgacaaactt atccgcgagg tgaaggtgat taccctgaag 3000
tccaagctgg tcagcgattt cagaaaggac tttcaattct acaaagtgcg ggagatcaat 3060
aactatcatc atgctcatga cgcatatctg aatgccgtgg tgggaaccgc cctgatcaag 3120
aagtacccaa agctggaaag cgagttcgtg tacggagact acaaggtcta cgacgtgcgc 3180
aagatgattg ccaaatctga gcaggagatc ggaaaggcca ccgcaaagta cttcttctac 3240
agcaacatca tgaatttctt caagaccgaa atcacccttg caaacggtga gatccggaag 3300
aggccgctca tcgagactaa tggggagact ggcgaaatcg tgtgggacaa gggcagagat 3360
ttcgctaccg tgcgcaaagt gctttctatg cctcaagtga acatcgtgaa gaaaaccgag 3420
gtgcaaaccg gaggcttttc taaggaatca atcctcccca agcgcaactc cgacaagctc 3480
attgcaagga agaaggattg ggaccctaag aagtacggcg gattcgattc accaactgtg 3540
gcttattctg tcctggtcgt ggctaaggtg gaaaaaggaa agtctaagaa gctcaagagc 3600
gtgaaggaac tgctgggtat caccattatg gagcgcagct ccttcgagaa gaacccaatt 3660
gactttctcg aagccaaagg ttacaaggaa gtcaagaagg accttatcat caagctccca 3720
aagtatagcc tgttcgaact ggagaatggg cggaagcgga tgctcgcctc cgctggcgaa 3780
cttcagaagg gtaatgagct ggctctcccc tccaagtacg tgaatttcct ctaccttgca 3840
agccattacg agaagctgaa ggggagcccc gaggacaacg agcaaaagca actgtttgtg 3900
gagcagcata agcattatct ggacgagatc attgagcaga tttccgagtt ttctaaacgc 3960
gtcattctcg ctgatgccaa cctcgataaa gtccttagcg catacaataa gcacagagac 4020
aaaccaattc gggagcaggc tgagaatatc atccacctgt tcaccctcac caatcttggt 4080
gcccctgccg cattcaagta cttcgacacc accatcgacc ggaaacgcta tacctccacc 4140
aaagaagtgc tggacgccac cctcatccac cagagcatca ccggacttta cgaaactcgg 4200
attgacctct cacagctcgg aggggatgag ggagctccca agaaaaagcg caaggtaggt 4260
agttccggat ctccgaaaaa gaaacgcaaa gttggtagtg atgctttaga cgattttgac 4320
ttagatatgc ttggttcaga cgcgttagac gacttcggtg gaggatccat ggacagcctc 4380
ttgatgaacc ggaggaagtt tctttaccaa ttcaaaaatg tccgctgggc taagggtcgg 4440
cgtgagacct acctgtgcta cgtagtgaag aggcgtgaca gtgctacatc cttttcactg 4500
gactttggtt atcttcgcaa taagaacggc tgccacgtgg aattgctctt cctccgctac 4560
atctcggact gggacctaga ccctggccgc tgctaccgcg tcacctggtt cacctcctgg 4620
agcccctgct acgactgtgc ccgacatgtg gccgactttc tgcgagggaa ccccaacctc 4680
agtctgagga tcttcaccgc gcgcctctac ttctgtgagg accgcaaggc tgagcccgag 4740
gggctgcggc ggctgcaccg cgccggggtg caaatagcca tcatgacctt caaagattat 4800
ttttactgct ggaatacttt tgtagaaaac catgaaagaa ctttcaaagc ctgggaaggg 4860
ctgcatgaaa attcagttcg tctctccaga cagcttcggc gcatcctttt gccctga 4917
<210> 72
<211> 1638
<212> PRT
<213> Artificial sequence
<220>
<223> Description of artificial sequence: amino acid sequence of nCas9-AIDX
<400> 72
Met Asp Tyr Lys Asp His Asp Gly Asp Tyr Lys Asp His Asp Ile Asp
1 5 10 15
Tyr Lys Asp Asp Asp Asp Lys Met Ala Pro Lys Lys Lys Arg Lys Val
20 25 30
Gly Ile His Gly Val Pro Ala Ala Thr Met Asp Lys Lys Tyr Ser Ile
35 40 45
Gly Leu Ala Ile Gly Thr Asn Ser Val Gly Trp Ala Val Ile Thr Asp
50 55 60
Glu Tyr Lys Val Pro Ser Lys Lys Phe Lys Val Leu Gly Asn Thr Asp
65 70 75 80
Arg His Ser Ile Lys Lys Asn Leu Ile Gly Ala Leu Leu Phe Asp Ser
85 90 95
Gly Glu Thr Ala Glu Ala Thr Arg Leu Lys Arg Thr Ala Arg Arg Arg
100 105 110
Tyr Thr Arg Arg Lys Asn Arg Ile Cys Tyr Leu Gln Glu Ile Phe Ser
115 120 125
Asn Glu Met Ala Lys Val Asp Asp Ser Phe Phe His Arg Leu Glu Glu
130 135 140
Ser Phe Leu Val Glu Glu Asp Lys Lys His Glu Arg His Pro Ile Phe
145 150 155 160
Gly Asn Ile Val Asp Glu Val Ala Tyr His Glu Lys Tyr Pro Thr Ile
165 170 175
Tyr His Leu Arg Lys Lys Leu Val Asp Ser Thr Asp Lys Ala Asp Leu
180 185 190
Arg Leu Ile Tyr Leu Ala Leu Ala His Met Ile Lys Phe Arg Gly His
195 200 205
Phe Leu Ile Glu Gly Asp Leu Asn Pro Asp Asn Ser Asp Val Asp Lys
210 215 220
Leu Phe Ile Gln Leu Val Gln Thr Tyr Asn Gln Leu Phe Glu Glu Asn
225 230 235 240
Pro Ile Asn Ala Ser Gly Val Asp Ala Lys Ala Ile Leu Ser Ala Arg
245 250 255
Leu Ser Lys Ser Arg Arg Leu Glu Asn Leu Ile Ala Gln Leu Pro Gly
260 265 270
Glu Lys Lys Asn Gly Leu Phe Gly Asn Leu Ile Ala Leu Ser Leu Gly
275 280 285
Leu Thr Pro Asn Phe Lys Ser Asn Phe Asp Leu Ala Glu Asp Ala Lys
290 295 300
Leu Gln Leu Ser Lys Asp Thr Tyr Asp Asp Asp Leu Asp Asn Leu Leu
305 310 315 320
Ala Gln Ile Gly Asp Gln Tyr Ala Asp Leu Phe Leu Ala Ala Lys Asn
325 330 335
Leu Ser Asp Ala Ile Leu Leu Ser Asp Ile Leu Arg Val Asn Thr Glu
340 345 350
Ile Thr Lys Ala Pro Leu Ser Ala Ser Met Ile Lys Arg Tyr Asp Glu
355 360 365
His His Gln Asp Leu Thr Leu Leu Lys Ala Leu Val Arg Gln Gln Leu
370 375 380
Pro Glu Lys Tyr Lys Glu Ile Phe Phe Asp Gln Ser Lys Asn Gly Tyr
385 390 395 400
Ala Gly Tyr Ile Asp Gly Gly Ala Ser Gln Glu Glu Phe Tyr Lys Phe
405 410 415
Ile Lys Pro Ile Leu Glu Lys Met Asp Gly Thr Glu Glu Leu Leu Val
420 425 430
Lys Leu Asn Arg Glu Asp Leu Leu Arg Lys Gln Arg Thr Phe Asp Asn
435 440 445
Gly Ser Ile Pro His Gln Ile His Leu Gly Glu Leu His Ala Ile Leu
450 455 460
Arg Arg Gln Glu Asp Phe Tyr Pro Phe Leu Lys Asp Asn Arg Glu Lys
465 470 475 480
Ile Glu Lys Ile Leu Thr Phe Arg Ile Pro Tyr Tyr Val Gly Pro Leu
485 490 495
Ala Arg Gly Asn Ser Arg Phe Ala Trp Met Thr Arg Lys Ser Glu Glu
500 505 510
Thr Ile Thr Pro Trp Asn Phe Glu Glu Val Val Asp Lys Gly Ala Ser
515 520 525
Ala Gln Ser Phe Ile Glu Arg Met Thr Asn Phe Asp Lys Asn Leu Pro
530 535 540
Asn Glu Lys Val Leu Pro Lys His Ser Leu Leu Tyr Glu Tyr Phe Thr
545 550 555 560
Val Tyr Asn Glu Leu Thr Lys Val Lys Tyr Val Thr Glu Gly Met Arg
565 570 575
Lys Pro Ala Phe Leu Ser Gly Glu Gln Lys Lys Ala Ile Val Asp Leu
580 585 590
Leu Phe Lys Thr Asn Arg Lys Val Thr Val Lys Gln Leu Lys Glu Asp
595 600 605
Tyr Phe Lys Lys Ile Glu Cys Phe Asp Ser Val Glu Ile Ser Gly Val
610 615 620
Glu Asp Arg Phe Asn Ala Ser Leu Gly Thr Tyr His Asp Leu Leu Lys
625 630 635 640
Ile Ile Lys Asp Lys Asp Phe Leu Asp Asn Glu Glu Lys Glu Asp Ile
645 650 655
Leu Glu Asp Ile Val Leu Thr Leu Thr Leu Phe Glu Asp Arg Glu Met
660 665 670
Ile Glu Glu Arg Leu Lys Thr Tyr Ala His Leu Phe Asp Asp Lys Val
675 680 685
Met Lys Gln Leu Lys Arg Arg Arg Tyr Thr Gly Trp Gly Arg Leu Ser
690 695 700
Arg Lys Leu Ile Asn Gly Ile Arg Asp Lys Gln Ser Gly Lys Thr Ile
705 710 715 720
Leu Asp Phe Leu Lys Ser Asp Gly Phe Ala Asn Arg Asn Phe Met Gln
725 730 735
Leu Ile His Asp Asp Ser Leu Thr Phe Lys Glu Asp Ile Gln Lys Ala
740 745 750
Gln Val Ser Gly Gln Gly Asp Ser Leu His Glu His Ile Ala Asn Leu
755 760 765
Ala Gly Ser Pro Ala Ile Lys Lys Gly Ile Leu Gln Thr Val Lys Val
770 775 780
Val Asp Glu Leu Val Lys Val Met Gly Arg His Lys Pro Glu Asn Ile
785 790 795 800
Val Ile Glu Met Ala Arg Glu Asn Gln Thr Thr Gln Lys Gly Gln Lys
805 810 815
Asn Ser Arg Glu Arg Met Lys Arg Ile Glu Glu Gly Ile Lys Glu Leu
820 825 830
Gly Ser Gln Ile Leu Lys Glu His Pro Val Glu Asn Thr Gln Leu Gln
835 840 845
Asn Glu Lys Leu Tyr Leu Tyr Tyr Leu Gln Asn Gly Arg Asp Met Tyr
850 855 860
Val Asp Gln Glu Leu Asp Ile Asn Arg Leu Ser Asp Tyr Asp Val Asp
865 870 875 880
His Ile Val Pro Gln Ser Phe Leu Lys Asp Asp Ser Ile Asp Asn Lys
885 890 895
Val Leu Thr Arg Ser Asp Lys Asn Arg Gly Lys Ser Asp Asn Val Pro
900 905 910
Ser Glu Glu Val Val Lys Lys Met Lys Asn Tyr Trp Arg Gln Leu Leu
915 920 925
Asn Ala Lys Leu Ile Thr Gln Arg Lys Phe Asp Asn Leu Thr Lys Ala
930 935 940
Glu Arg Gly Gly Leu Ser Glu Leu Asp Lys Ala Gly Phe Ile Lys Arg
945 950 955 960
Gln Leu Val Glu Thr Arg Gln Ile Thr Lys His Val Ala Gln Ile Leu
965 970 975
Asp Ser Arg Met Asn Thr Lys Tyr Asp Glu Asn Asp Lys Leu Ile Arg
980 985 990
Glu Val Lys Val Ile Thr Leu Lys Ser Lys Leu Val Ser Asp Phe Arg
995 1000 1005
Lys Asp Phe Gln Phe Tyr Lys Val Arg Glu Ile Asn Asn Tyr His
1010 1015 1020
His Ala His Asp Ala Tyr Leu Asn Ala Val Val Gly Thr Ala Leu
1025 1030 1035
Ile Lys Lys Tyr Pro Lys Leu Glu Ser Glu Phe Val Tyr Gly Asp
1040 1045 1050
Tyr Lys Val Tyr Asp Val Arg Lys Met Ile Ala Lys Ser Glu Gln
1055 1060 1065
Glu Ile Gly Lys Ala Thr Ala Lys Tyr Phe Phe Tyr Ser Asn Ile
1070 1075 1080
Met Asn Phe Phe Lys Thr Glu Ile Thr Leu Ala Asn Gly Glu Ile
1085 1090 1095
Arg Lys Arg Pro Leu Ile Glu Thr Asn Gly Glu Thr Gly Glu Ile
1100 1105 1110
Val Trp Asp Lys Gly Arg Asp Phe Ala Thr Val Arg Lys Val Leu
1115 1120 1125
Ser Met Pro Gln Val Asn Ile Val Lys Lys Thr Glu Val Gln Thr
1130 1135 1140
Gly Gly Phe Ser Lys Glu Ser Ile Leu Pro Lys Arg Asn Ser Asp
1145 1150 1155
Lys Leu Ile Ala Arg Lys Lys Asp Trp Asp Pro Lys Lys Tyr Gly
1160 1165 1170
Gly Phe Asp Ser Pro Thr Val Ala Tyr Ser Val Leu Val Val Ala
1175 1180 1185
Lys Val Glu Lys Gly Lys Ser Lys Lys Leu Lys Ser Val Lys Glu
1190 1195 1200
Leu Leu Gly Ile Thr Ile Met Glu Arg Ser Ser Phe Glu Lys Asn
1205 1210 1215
Pro Ile Asp Phe Leu Glu Ala Lys Gly Tyr Lys Glu Val Lys Lys
1220 1225 1230
Asp Leu Ile Ile Lys Leu Pro Lys Tyr Ser Leu Phe Glu Leu Glu
1235 1240 1245
Asn Gly Arg Lys Arg Met Leu Ala Ser Ala Gly Glu Leu Gln Lys
1250 1255 1260
Gly Asn Glu Leu Ala Leu Pro Ser Lys Tyr Val Asn Phe Leu Tyr
1265 1270 1275
Leu Ala Ser His Tyr Glu Lys Leu Lys Gly Ser Pro Glu Asp Asn
1280 1285 1290
Glu Gln Lys Gln Leu Phe Val Glu Gln His Lys His Tyr Leu Asp
1295 1300 1305
Glu Ile Ile Glu Gln Ile Ser Glu Phe Ser Lys Arg Val Ile Leu
1310 1315 1320
Ala Asp Ala Asn Leu Asp Lys Val Leu Ser Ala Tyr Asn Lys His
1325 1330 1335
Arg Asp Lys Pro Ile Arg Glu Gln Ala Glu Asn Ile Ile His Leu
1340 1345 1350
Phe Thr Leu Thr Asn Leu Gly Ala Pro Ala Ala Phe Lys Tyr Phe
1355 1360 1365
Asp Thr Thr Ile Asp Arg Lys Arg Tyr Thr Ser Thr Lys Glu Val
1370 1375 1380
Leu Asp Ala Thr Leu Ile His Gln Ser Ile Thr Gly Leu Tyr Glu
1385 1390 1395
Thr Arg Ile Asp Leu Ser Gln Leu Gly Gly Asp Glu Gly Ala Pro
1400 1405 1410
Lys Lys Lys Arg Lys Val Gly Ser Ser Gly Ser Pro Lys Lys Lys
1415 1420 1425
Arg Lys Val Gly Ser Asp Ala Leu Asp Asp Phe Asp Leu Asp Met
1430 1435 1440
Leu Gly Ser Asp Ala Leu Asp Asp Phe Gly Gly Gly Ser Met Asp
1445 1450 1455
Ser Leu Leu Met Asn Arg Arg Lys Phe Leu Tyr Gln Phe Lys Asn
1460 1465 1470
Val Arg Trp Ala Lys Gly Arg Arg Glu Thr Tyr Leu Cys Tyr Val
1475 1480 1485
Val Lys Arg Arg Asp Ser Ala Thr Ser Phe Ser Leu Asp Phe Gly
1490 1495 1500
Tyr Leu Arg Asn Lys Asn Gly Cys His Val Glu Leu Leu Phe Leu
1505 1510 1515
Arg Tyr Ile Ser Asp Trp Asp Leu Asp Pro Gly Arg Cys Tyr Arg
1520 1525 1530
Val Thr Trp Phe Thr Ser Trp Ser Pro Cys Tyr Asp Cys Ala Arg
1535 1540 1545
His Val Ala Asp Phe Leu Arg Gly Asn Pro Asn Leu Ser Leu Arg
1550 1555 1560
Ile Phe Thr Ala Arg Leu Tyr Phe Cys Glu Asp Arg Lys Ala Glu
1565 1570 1575
Pro Glu Gly Leu Arg Arg Leu His Arg Ala Gly Val Gln Ile Ala
1580 1585 1590
Ile Met Thr Phe Lys Asp Tyr Phe Tyr Cys Trp Asn Thr Phe Val
1595 1600 1605
Glu Asn His Glu Arg Thr Phe Lys Ala Trp Glu Gly Leu His Glu
1610 1615 1620
Asn Ser Val Arg Leu Ser Arg Gln Leu Arg Arg Ile Leu Leu Pro
1625 1630 1635
<210> 73
<211> 20
<212> DNA
<213> Artificial sequence
<220>
<223> Description of artificial sequence, target binding region of sgRNA
<400> 73
tccctcacct gttctgtcac 20
<210> 74
<211> 20
<212> DNA
<213> Artificial sequence
<220>
<223> Description of artificial sequence, target binding region of sgRNA
<400> 74
gctccagtaa tcactggtga 20
<210> 75
<211> 20
<212> DNA
<213> Artificial sequence
<220>
<223> Description of artificial sequence, target binding region of sgRNA
<400> 75
gatccagctc cagtaatcac 20
<210> 76
<211> 20
<212> DNA
<213> Artificial sequence
<220>
<223> Description of artificial sequence, target binding region of sgRNA
<400> 76
gtgattactg gagctggatc 20
<210> 77
<211> 20
<212> DNA
<213> Artificial sequence
<220>
<223> Description of artificial sequence, target binding region of sgRNA
<400> 77
atggggtacg taagctacag 20
<210> 78
<211> 20
<212> DNA
<213> Artificial sequence
<220>
<223> Description of artificial sequence, target binding region of sgRNA
<400> 78
gagattcgac ttttgagaga 20
<210> 79
<211> 20
<212> DNA
<213> Artificial sequence
<220>
<223> Description of artificial sequence, target binding region of sgRNA
<400> 79
tattactgtg caaactggga 20
<210> 80
<211> 20
<212> DNA
<213> Artificial sequence
<220>
<223> Description of artificial sequence, target binding region of sgRNA
<400> 80
caaactggga cggtgattac 20
<210> 81
<211> 20
<212> DNA
<213> Artificial sequence
<220>
<223> Description of artificial sequence, target binding region of sgRNA
<400> 81
gacggtgatt actggggcca 20
<210> 82
<211> 20
<212> DNA
<213> Artificial sequence
<220>
<223> Description of artificial sequence, target binding region of sgRNA
<400> 82
gttgttgcca atactttggc 20
<210> 83
<211> 20
<212> DNA
<213> Artificial sequence
<220>
<223> Description of artificial sequence, target binding region of sgRNA
<400> 83
atagcgtcag tctttcctgc 20
<210> 84
<211> 20
<212> DNA
<213> Artificial sequence
<220>
<223> Description of artificial sequence, target binding region of sgRNA
<400> 84
gtattggcaa caacctacac 20
<210> 85
<211> 20
<212> DNA
<213> Artificial sequence
<220>
<223> Description of artificial sequence, target binding region of sgRNA
<400> 85
aggggatccc agagatggac 20
<210> 86
<211> 20
<212> DNA
<213> Artificial sequence
<220>
<223> Description of artificial sequence, target binding region of sgRNA
<400> 86
tatgcttccc agtccatctc 20
<210> 87
<211> 20
<212> DNA
<213> Artificial sequence
<220>
<223> Description of artificial sequence, target binding region of sgRNA
<400> 87
tctgtcaaca gagtaacagc 20
<210> 88
<211> 20
<212> DNA
<213> Artificial sequence
<220>
<223> Description of artificial sequence, target binding region of sgRNA
<400> 88
gtcccccctc cgaacgtgta 20
<210> 89
<211> 4
<212> PRT
<213> Artificial sequence
<220>
<223> Description of artificial sequence, repetitive motif of linker
<400> 89
Ser Gly Gly Ser
1
<210> 90
<211> 5
<212> PRT
<213> Artificial sequence
<220>
<223> Description of artificial sequence, repetitive motif of linker
<400> 90
Gly Ser Ser Gly Ser
1 5
<210> 91
<211> 4
<212> PRT
<213> Artificial sequence
<220>
<223> Description of artificial sequence, repetitive motif of linker
<400> 91
Gly Gly Gly Ser
1
<210> 92
<211> 5
<212> PRT
<213> Artificial sequence
<220>
<223> Description of artificial sequence, repetitive motif of linker
<400> 92
Gly Gly Gly Gly Ser
1 5
<210> 93
<211> 5
<212> PRT
<213> Artificial sequence
<220>
<223> Description of artificial sequence, repetitive motif of linker
<400> 93
Ser Ser Ser Ser Gly
1 5
<210> 94
<211> 5
<212> PRT
<213> Artificial sequence
<220>
<223> Description of artificial sequence, repetitive motif of linker
<400> 94
Gly Ser Gly Ser Ala
1 5
<210> 95
<211> 5
<212> PRT
<213> Artificial sequence
<220>
<223> Description of artificial sequence, repetitive motif of linker
<400> 95
Gly Gly Ser Gly Gly
1 5