
CN113930413B - Novel CRISPR-Cas12j.23 enzyme and system - Google Patents


Info

Publication number
CN113930413B
Authority
CN
China
Legal status
Active
Application number
CN202010605399.1A
Other languages
Chinese (zh)
Other versions
CN113930413A (en)
Inventor
赖锦盛
宋伟彬
吕梦璐
刘宇新
赵海铭
张继红
李英男
王莹莹
Current Assignee
China Agricultural University
Original Assignee
China Agricultural University
Application filed by China Agricultural University
Priority to CN202010605399.1A
Publication of CN113930413A
Application granted
Publication of CN113930413B


Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/16 Hydrolases (3) acting on ester bonds (3.1)
    • C12N9/22 Ribonucleases [RNase]; Deoxyribonucleases [DNase]
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/11 DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/113 Non-coding nucleic acids modulating the expression of genes, e.g. antisense oligonucleotides; Antisense DNA or RNA; Triplex-forming oligonucleotides; Catalytic nucleic acids, e.g. ribozymes; Nucleic acids used in co-suppression or gene silencing
    • C12N15/87 Introduction of foreign genetic material using processes not otherwise provided for, e.g. co-transformation
    • C12N15/90 Stable introduction of foreign DNA into chromosome
    • C12N15/902 Stable introduction of foreign DNA into chromosome using homologous recombination
    • C12N2310/00 Structure or type of the nucleic acid
    • C12N2310/10 Type of nucleic acid
    • C12N2310/20 Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPR]

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Engineering & Computer Science (AREA)
  • Chemical & Material Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Molecular Biology (AREA)
  • Organic Chemistry (AREA)
  • Biomedical Technology (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Biotechnology (AREA)
  • Microbiology (AREA)
  • Biochemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • Plant Pathology (AREA)
  • Biophysics (AREA)
  • Physics & Mathematics (AREA)
  • Mycology (AREA)
  • Medicinal Chemistry (AREA)
  • Micro-Organisms Or Cultivation Processes Thereof (AREA)
  • Peptides Or Proteins (AREA)

Abstract

The present invention relates to the field of nucleic acid editing, and in particular to clustered regularly interspaced short palindromic repeats (CRISPR) technology. Specifically, the invention relates to Cas effector proteins, fusion proteins comprising such proteins, and nucleic acid molecules encoding them. The invention also relates to complexes and compositions for nucleic acid editing (e.g., gene or genome editing) that comprise the proteins or fusion proteins of the invention, or nucleic acid molecules encoding them. The invention further relates to methods for nucleic acid editing (e.g., gene or genome editing) that use the proteins or fusion proteins of the invention.

Description

Novel CRISPR-Cas12j.23 enzymes and systems
Technical Field
The invention relates to the field of nucleic acid editing, and in particular to clustered regularly interspaced short palindromic repeats (CRISPR) technology. In particular, the present invention relates to Cas effector proteins, fusion proteins comprising such proteins, and nucleic acid molecules encoding them. The invention also relates to complexes and compositions for nucleic acid editing (e.g., gene or genome editing) comprising the proteins or fusion proteins of the invention, or nucleic acid molecules encoding them. The invention further relates to methods for nucleic acid editing (e.g., gene or genome editing) using the proteins or fusion proteins of the invention.
Background
CRISPR/Cas technology is a widely used gene-editing technology. A guide RNA directs the Cas protein to bind specifically to a target sequence in the genome, where the DNA is cleaved to create a double-strand break; site-directed editing is then achieved through the cell's own non-homologous end joining or homologous recombination repair.
The CRISPR/Cas9 system is the most commonly used Type II CRISPR system; it recognizes a 3'-NGG PAM motif and cleaves the target sequence to leave blunt ends. Type V CRISPR/Cas systems, such as Cpf1, C2C1, CasX and CasY, are a class of CRISPR systems discovered within the past two years; they recognize a 5'-TTN motif and cut the target sequence to leave cohesive ends. However, each of the existing CRISPR/Cas systems has its own advantages and disadvantages. For example, Cas9, C2C1 and CasX each require two RNAs to form the guide, whereas Cpf1 requires only a single guide RNA and can be used for multiplex gene editing. CasX is about 980 amino acids in size, whereas Cas9, C2C1, CasY and Cpf1 are typically around 1,300 amino acids. In addition, the PAM sequences of Cas9, Cpf1, CasX and CasY are all relatively complex and diverse, while C2C1 recognizes a stringent 5'-TTN PAM, so its target sites are easier to predict than those of other systems, which reduces potential off-target effects.
In summary, because each currently available CRISPR/Cas system is limited by its own drawbacks, developing a new, more robust CRISPR/Cas system with good all-round performance is of great importance for biotechnology.
Disclosure of Invention
Through extensive and repeated experimentation, the inventors of the present application unexpectedly discovered a novel RNA-guided endonuclease. Based on this finding, the inventors developed a new CRISPR/Cas system and a gene-editing method based on that system.
Cas effector proteins
Thus, in a first aspect, the present invention provides a protein having the amino acid sequence shown in SEQ ID NO. 1 or an ortholog, homolog, variant or functional fragment thereof, wherein said ortholog, homolog, variant or functional fragment substantially retains the biological function of the sequence from which it is derived.
In the present invention, the biological functions of the above sequences include, but are not limited to, guide RNA-binding activity, endonuclease activity, and the activity of binding to and cleaving a specific site in a target sequence under the guidance of a guide RNA.
In certain embodiments, the ortholog, homolog, variant has at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity as compared to the sequence from which it is derived.
In certain embodiments, the ortholog, homolog or variant has at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to the sequence set forth in SEQ ID NO. 1 and substantially retains the biological function of the sequence from which it is derived (e.g., guide RNA-binding activity, endonuclease activity, and the activity of binding to and cleaving a specific site in a target sequence under the guidance of a guide RNA).
In certain embodiments, the protein is an effector protein in a CRISPR/Cas system.
In certain embodiments, the protein of the invention comprises or consists of a sequence selected from the group consisting of:
(i) the sequence shown in SEQ ID NO. 1;
(ii) a sequence having one or more amino acid substitutions, deletions or additions (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 amino acid substitutions, deletions or additions) compared with the sequence set forth in SEQ ID NO. 1; or
(iii) a sequence having at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to the sequence set forth in SEQ ID NO. 1.
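The identity thresholds above can be made concrete with a small calculation. The patent text does not state which alignment algorithm or identity convention defines "sequence identity", so the sketch below assumes the two sequences are already aligned to equal length (gaps written as `-`) and counts identical, non-gap positions over the alignment length. The 700-residue reference is a hypothetical stand-in, not SEQ ID NO. 1.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Assumed convention: identical non-gap positions / alignment length * 100.
    Other conventions (e.g. dividing by the shorter ungapped length) give
    slightly different numbers.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    identical = sum(
        1
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a == b and a != "-"
    )
    return 100.0 * identical / len(aligned_a)


def max_substitutions(length: int, threshold_percent: int) -> int:
    """Largest number of substitutions (no insertions/deletions) that keeps
    identity >= threshold_percent for a sequence of the given length."""
    # (length - k) / length * 100 >= threshold  <=>  k <= length * (100 - threshold) / 100
    return (length * (100 - threshold_percent)) // 100


if __name__ == "__main__":
    reference = "ACDEFGHIKL" * 70          # hypothetical 700-residue sequence
    variant = "W" * 10 + reference[10:]    # 10 substitutions at the N-terminus
    print(f"{percent_identity(reference, variant):.2f}%")         # 98.57%
    print(max_substitutions(700, 95), max_substitutions(700, 99))  # 35 7
```

Under this convention, a variant with up to 10 substitutions (item (ii)) also meets the 95% threshold of item (iii) whenever the reference is at least 200 residues long.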
In certain embodiments, the proteins of the invention have the amino acid sequence shown in SEQ ID NO. 1.
Derived proteins
The proteins of the invention may be derivatized, e.g., linked to another molecule (e.g., another polypeptide or protein). In general, derivatization (e.g., labeling) of a protein does not adversely affect the desired activity of the protein (e.g., binding to a guide RNA, endonuclease activity, binding to a specific site of a target sequence under the guidance of a guide RNA, and cleavage activity). Thus, the proteins of the invention are also intended to include such derivatized forms. For example, the proteins of the invention may be functionally linked (by chemical coupling, gene fusion, non-covalent linkage or otherwise) to one or more other molecular groups, such as another protein or polypeptide, detection reagents, pharmaceutical reagents, and the like.
In particular, the proteins of the invention may be linked to other functional units. For example, a protein of the invention may be linked to a nuclear localization signal (NLS) sequence to improve its ability to enter the nucleus; to a targeting moiety so that it can be directed to a particular target; to a detectable label to facilitate its detection; or to an epitope tag to facilitate its expression, detection, tracking and/or purification.
Conjugate(s)
Accordingly, in a second aspect, the present invention provides a conjugate comprising a protein as described above and a modifying moiety.
In certain embodiments, the modifying moiety is selected from an additional protein or polypeptide, a detectable label, or any combination thereof.
In certain embodiments, the additional protein or polypeptide is selected from the group consisting of an epitope tag, a reporter gene sequence, a nuclear localization signal (NLS) sequence, a targeting moiety, a transcriptional activation domain (e.g., VP64), a transcriptional repression domain (e.g., a KRAB domain or SID domain), a nuclease domain (e.g., FokI), a domain having an activity selected from nucleotide deaminase activity, methylase activity, demethylase activity, transcriptional activation activity, transcriptional repression activity, transcription release factor activity, histone modification activity, nuclease activity, single-stranded RNA cleavage activity, double-stranded RNA cleavage activity, single-stranded DNA cleavage activity, double-stranded DNA cleavage activity and nucleic acid binding activity, and any combination thereof.
In certain embodiments, the conjugates of the invention comprise one or more NLS sequences, e.g., the NLS of the SV40 large T antigen. In certain exemplary embodiments, the NLS sequence is set forth in SEQ ID NO. 7. In certain embodiments, the NLS sequence is located at or near a terminus (e.g., the N-terminus or C-terminus) of the protein of the invention. In certain exemplary embodiments, the NLS sequence is located at or near the C-terminus of the protein of the invention.
In certain embodiments, the conjugates of the invention comprise an epitope tag. Such epitope tags are well known to those skilled in the art; examples include, but are not limited to, His, V5, FLAG, HA, Myc, VSV-G and Trx, and those skilled in the art know how to select an appropriate epitope tag for the desired purpose (e.g., purification, detection, or labeling).
In certain embodiments, the conjugates of the invention comprise a reporter gene sequence. Such reporter genes are well known to those skilled in the art; examples include, but are not limited to, GST, HRP, CAT, GFP, HcRed, DsRed, CFP, YFP and BFP.
In certain embodiments, the conjugates of the invention comprise a domain capable of binding to a DNA molecule or an intracellular molecule, such as maltose-binding protein (MBP), the DNA-binding domain (DBD) of LexA, the DBD of GAL4, and the like.
In certain embodiments, the conjugates of the invention comprise a detectable label, such as a fluorescent dye, e.g., FITC or DAPI.
In certain embodiments, the proteins of the invention are coupled, conjugated or fused to the modifying moiety, optionally via a linker.
In certain embodiments, the modifying moiety is directly linked to the N-terminus or C-terminus of the protein of the invention.
In certain embodiments, the modifying moiety is linked to the N-terminus or C-terminus of the protein of the invention via a linker. Such linkers are well known in the art; examples include, but are not limited to, linkers comprising one or more (e.g., 1, 2, 3, 4 or 5) amino acids (e.g., Glu or Ser) or amino acid derivatives (e.g., Ahx, β-Ala, GABA or Ava), or PEG.
Fusion proteins
In a third aspect, the present invention provides a fusion protein comprising a protein of the invention and an additional protein or polypeptide.
In certain embodiments, the additional protein or polypeptide is selected from the group consisting of an epitope tag, a reporter gene sequence, a nuclear localization signal (NLS) sequence, a targeting moiety, a transcriptional activation domain (e.g., VP64), a transcriptional repression domain (e.g., a KRAB domain or SID domain), a nuclease domain (e.g., FokI), a domain having an activity selected from nucleotide deaminase activity, methylase activity, demethylase activity, transcriptional activation activity, transcriptional repression activity, transcription release factor activity, histone modification activity, nuclease activity, single-stranded RNA cleavage activity, double-stranded RNA cleavage activity, single-stranded DNA cleavage activity, double-stranded DNA cleavage activity and nucleic acid binding activity, and any combination thereof.
In certain embodiments, the fusion proteins of the invention comprise one or more NLS sequences, e.g., the NLS of the SV40 large T antigen. In certain embodiments, the NLS sequence is located at or near a terminus (e.g., the N-terminus or C-terminus) of the protein of the invention. In certain exemplary embodiments, the NLS sequence is located at or near the C-terminus of the protein of the invention.
In certain embodiments, the fusion proteins of the invention comprise an epitope tag.
In certain embodiments, the fusion proteins of the invention comprise a reporter gene sequence.
In certain embodiments, the fusion proteins of the invention comprise a domain capable of binding to a DNA molecule or an intracellular molecule.
In certain embodiments, the protein of the invention is fused to the additional protein or polypeptide, optionally via a linker.
In certain embodiments, the additional protein or polypeptide is directly linked to the N-terminus or C-terminus of the protein of the invention.
In certain embodiments, the additional protein or polypeptide is linked to the N-terminus or C-terminus of the protein of the invention by a linker.
In certain exemplary embodiments, the fusion proteins of the present invention have the amino acid sequence set forth in SEQ ID NO. 8.
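At the sequence level, a genetic fusion with an optional linker amounts to concatenating domain sequences in N- to C-terminal order. The sketch below illustrates this under stated assumptions: the core sequence is a placeholder rather than SEQ ID NO. 1, and the SV40 NLS core (PKKKRKV), the FLAG tag (DYKDDDDK) and the GGGGS linker are common illustrative choices, not necessarily the sequences used in SEQ ID NO. 7 or SEQ ID NO. 8.

```python
from typing import Optional, Sequence

# Illustrative building blocks (assumptions, not taken from the patent's sequence listing)
SV40_NLS = "PKKKRKV"      # core NLS of the SV40 large T antigen
FLAG_TAG = "DYKDDDDK"     # FLAG epitope tag
GS_LINKER = "GGGGS"       # flexible glycine-serine linker


def build_fusion(
    core: str,
    n_terminal: Optional[Sequence[str]] = None,
    c_terminal: Optional[Sequence[str]] = None,
    linker: str = "",
) -> str:
    """Join domains N-terminus -> C-terminus, placing `linker` between adjacent domains.

    An empty linker gives a direct fusion.
    """
    domains = list(n_terminal or []) + [core] + list(c_terminal or [])
    return linker.join(domains)


if __name__ == "__main__":
    core = "MASTQ"  # hypothetical stand-in for the Cas protein sequence
    print(build_fusion(core, c_terminal=[SV40_NLS], linker=GS_LINKER))
    # MASTQGGGGSPKKKRKV
```

Using one linker for every junction keeps the sketch short; a real construct may use a different linker, or none, at each junction.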
The protein, conjugate or fusion protein of the invention is not limited by its method of production; for example, it may be produced by genetic engineering (recombinant) methods or by chemical synthesis.
Direct repeat sequences
In a fourth aspect, the invention provides an isolated nucleic acid molecule comprising or consisting of a sequence selected from the group consisting of:
(i) the sequence shown in SEQ ID NO. 4 or 6;
(ii) a sequence having a substitution, deletion or addition of one or more bases (e.g., a substitution, deletion or addition of 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 bases) compared with the sequence set forth in SEQ ID NO. 4 or 6;
(iii) a sequence having at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to the sequence set forth in SEQ ID NO. 4 or 6;
(iv) a sequence that hybridizes under stringent conditions to a sequence set forth in any one of (i) to (iii); or
(v) the complement of a sequence set forth in any one of (i) to (iii);
wherein the sequence of any one of (ii) to (v) substantially retains the biological function of the sequence from which it is derived, namely activity as a direct repeat in a CRISPR-Cas system.
In certain embodiments, the isolated nucleic acid molecule is a direct repeat in a CRISPR-Cas system.
In certain embodiments, the nucleic acid molecule comprises or consists of a sequence selected from the group consisting of:
(a) the nucleotide sequence shown in SEQ ID NO. 4 or 6;
(b) a sequence that hybridizes under stringent conditions to the sequence described in (a); or
(c) the complement of the sequence set forth in (a).
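Items (v) and (c) refer to the complement of a direct repeat. A small helper can generate the fully complementary strand written 5' to 3' (the reverse complement). This is a sketch: the example sequence is a hypothetical placeholder because SEQ ID NO. 4 and 6 are not reproduced here, and the helper models only Watson-Crick pairing, not hybridization under stringent conditions.

```python
_RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def rna_reverse_complement(seq: str) -> str:
    """Return the fully complementary RNA strand, written 5'->3'.

    DNA input is accepted: T is read as U before pairing.
    """
    rna = seq.upper().replace("T", "U")
    return "".join(_RNA_PAIR[base] for base in reversed(rna))


def is_full_complement(seq_a: str, seq_b: str) -> bool:
    """True if seq_b (5'->3') pairs with seq_a (5'->3') along its full length."""
    return rna_reverse_complement(seq_a) == seq_b.upper().replace("T", "U")


if __name__ == "__main__":
    repeat = "GAAUCCGAAAG"  # hypothetical placeholder, not SEQ ID NO. 4 or 6
    print(rna_reverse_complement(repeat))  # CUUUCGGAUUC
```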
In certain embodiments, the isolated nucleic acid molecule is RNA.
CRISPR/Cas complexes
In a fifth aspect, the present invention provides a complex comprising:
(i) a protein component selected from the group consisting of the proteins, conjugates and fusion proteins of the invention, and any combination thereof; and
(ii) a nucleic acid component comprising, in the 5' to 3' direction, an isolated nucleic acid molecule as described above and a targeting sequence capable of hybridizing to a target sequence,
wherein the protein component and the nucleic acid component are bound to each other to form the complex.
In certain embodiments, the targeting sequence is attached to the 3' end of the nucleic acid molecule.
In certain embodiments, the targeting sequence comprises a complement of the target sequence.
In certain embodiments, the nucleic acid component is a guide RNA in a CRISPR-Cas system.
In certain embodiments, the nucleic acid molecule is RNA.
In certain embodiments, the complex does not comprise a trans-activating crRNA (tracrRNA).
In certain embodiments, the complex targets a third component, which is a double-stranded polynucleotide comprising a target sequence adjacent to a motif sequence (protospacer adjacent motif, PAM).
In certain embodiments, the target sequence is located 3' to the motif sequence.
In certain embodiments, the target sequence is located fewer than 40, 35, 30, 25, 20, 15, 10, 5, 4, 3, 2 or 1 nucleotides from the motif sequence.
In certain embodiments, the targeting sequence is at least 5, at least 10, at least 15, at least 20, at least 25, at least 30 nucleotides in length. In certain embodiments, the targeting sequence is 10-30, or 15-25, or 15-22, or 19-25, or 19-22 nucleotides in length.
In certain embodiments, the isolated nucleic acid molecule is 55-70 nucleotides, such as 55-65 nucleotides, such as 60-65 nucleotides, such as 62-65 nucleotides, such as 63-64 nucleotides in length. In certain embodiments, the isolated nucleic acid molecule is 15-30 nucleotides, such as 15-25 nucleotides, such as 20-25 nucleotides, such as 22-24 nucleotides, such as 23 nucleotides in length.
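The arrangement described above, a target sequence lying 3' of a motif on a double-stranded polynucleotide and a guide RNA built 5' to 3' as direct repeat followed by targeting sequence, can be sketched as a simple site search. This section does not state the motif recognized by Cas12j.23, so the 5'-TTN pattern (the motif cited for Type V systems in the Background), the 20-nt spacer length, zero spacing between motif and target, and the direct-repeat string are all assumptions. Only the given strand is scanned; the opposite strand could be searched by scanning its reverse complement.

```python
import re
from typing import List, Tuple

PAM_PATTERN = "TT[ACGT]"       # assumed 5'-TTN motif; the actual Cas12j.23 motif may differ
SPACER_LEN = 20                # assumed length, inside the 19-22 nt range given above
DIRECT_REPEAT = "GAAUCCGAAAG"  # hypothetical placeholder, not SEQ ID NO. 4 or 6


def find_target_sites(dna: str, pam_pattern: str = PAM_PATTERN,
                      spacer_len: int = SPACER_LEN) -> List[Tuple[int, str, str]]:
    """Return (motif_start, motif, protospacer) for every motif on this strand
    that is immediately followed by spacer_len nucleotides (target 3' of motif)."""
    dna = dna.upper()
    sites = []
    # Zero-width lookahead so overlapping motifs (e.g. in "TTTT") are all reported.
    for match in re.finditer(f"(?=({pam_pattern}))", dna):
        start = match.start()
        motif = match.group(1)
        target_start = start + len(motif)
        protospacer = dna[target_start:target_start + spacer_len]
        if len(protospacer) == spacer_len:
            sites.append((start, motif, protospacer))
    return sites


def build_guide_rna(direct_repeat: str, protospacer: str) -> str:
    """Guide RNA written 5'->3': direct repeat, then the targeting (spacer) sequence.

    In this sketch the spacer is the RNA copy of the motif-strand protospacer,
    so it base-pairs with the complementary strand of the target.
    """
    return direct_repeat.upper().replace("T", "U") + protospacer.upper().replace("T", "U")


if __name__ == "__main__":
    for start, motif, proto in find_target_sites("ATTGACGTAC", spacer_len=5):
        print(start, motif, proto, build_guide_rna(DIRECT_REPEAT, proto))
        # 1 TTG ACGTA GAAUCCGAAAGACGUA
```

Reporting every overlapping motif errs on the side of listing more candidate sites; filtering them (for example by off-target risk) is left to the caller.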
Coding nucleic acids, vectors and host cells
In a sixth aspect, the invention provides an isolated nucleic acid molecule comprising:
(i) a nucleotide sequence encoding a protein or fusion protein of the invention;
(ii) a nucleotide sequence encoding an isolated nucleic acid molecule as described in the fourth aspect; or
(iii) both the nucleotide sequences of (i) and (ii).
In certain embodiments, the nucleotide sequence set forth in any one of (i) - (iii) is codon optimized for expression in a prokaryotic cell. In certain embodiments, the nucleotide sequence set forth in any one of (i) - (iii) is codon optimized for expression in a eukaryotic cell.
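Codon optimization replaces each codon with one favored by the intended host while leaving the encoded protein unchanged. Practical tools weigh the host's codon-usage frequencies together with GC content, repeats and mRNA structure; the sketch below shows only the simplest "one preferred codon per amino acid" back-translation, and its codon table is an illustrative assumption, not a measured usage table for any particular prokaryotic or eukaryotic host.

```python
from typing import Dict

# Illustrative one-codon-per-residue table (assumption, not a real host usage table).
PREFERRED_CODON: Dict[str, str] = {
    "A": "GCC", "R": "CGC", "N": "AAC", "D": "GAC", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAC", "I": "ATC",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTC", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAC", "V": "GTG",
    "*": "TAA",
}
# Reverse lookup for checking that the back-translation encodes the same protein.
CODON_TO_AA = {codon: aa for aa, codon in PREFERRED_CODON.items()}


def back_translate(protein: str, add_stop: bool = True) -> str:
    """Encode a protein using one preferred codon per amino acid."""
    protein = protein.upper()
    if add_stop and not protein.endswith("*"):
        protein += "*"
    try:
        return "".join(PREFERRED_CODON[aa] for aa in protein)
    except KeyError as err:
        raise ValueError(f"unsupported residue: {err.args[0]}") from None


def translate_with_table(cds: str) -> str:
    """Translate codon by codon; only codons present in PREFERRED_CODON are supported."""
    return "".join(CODON_TO_AA[cds[i:i + 3]] for i in range(0, len(cds), 3))


if __name__ == "__main__":
    cds = back_translate("MKV")
    print(cds)  # ATGAAAGTGTAA
```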
In a seventh aspect, the invention also provides a vector comprising an isolated nucleic acid molecule as described in the sixth aspect. The vector of the invention may be a cloning vector or an expression vector. In certain embodiments, the vector of the invention is, for example, a plasmid, cosmid or phage. In certain embodiments, the vector is capable of expressing, in a subject (e.g., a mammal such as a human), a protein or fusion protein of the invention, an isolated nucleic acid molecule as described in the fourth aspect, or a complex as described in the fifth aspect.
In an eighth aspect, the invention also provides a host cell comprising an isolated nucleic acid molecule or vector as described above. Such host cells include, but are not limited to, prokaryotic cells, such as E. coli cells, and eukaryotic cells, such as yeast cells, insect cells, plant cells and animal cells (e.g., mammalian cells such as mouse cells and human cells). The host cells of the invention may also be cell lines, such as 293T cells.
Compositions and vector compositions
In a ninth aspect, the present invention also provides a composition comprising:
(i) A first component selected from the group consisting of a protein, conjugate, fusion protein, nucleotide sequence encoding the protein or fusion protein of the invention, and any combination thereof, and
(ii) a second component that is a nucleotide sequence comprising a guide RNA, or a nucleotide sequence encoding such a sequence;
wherein the guide RNA comprises, in the 5' to 3' direction, a direct repeat sequence and a targeting sequence, the targeting sequence being capable of hybridizing to a target sequence; and
the guide RNA is capable of forming a complex with the protein, conjugate or fusion protein described in (i).
In certain embodiments, the direct repeat sequence is an isolated nucleic acid molecule as defined in the fourth aspect.
In certain embodiments, the targeting sequence is linked to the 3' end of the direct repeat. In certain embodiments, the targeting sequence comprises a complement of the target sequence.
In certain embodiments, the composition does not comprise a trans-activating crRNA (tracrRNA).
In certain embodiments, the composition is non-naturally occurring or modified. In certain embodiments, at least one component of the composition is non-naturally occurring or modified. In certain embodiments, the first component is non-naturally occurring or modified, and/or the second component is non-naturally occurring or modified.
In certain embodiments, when the target sequence is DNA, the target sequence is located adjacent to the 3' end of a protospacer adjacent motif (PAM).
In certain embodiments, when the target sequence is RNA, the target sequence is not subject to PAM restrictions.
In certain embodiments, the target sequence is a DNA or RNA sequence from a prokaryotic or eukaryotic cell. In certain embodiments, the target sequence is a non-naturally occurring DNA or RNA sequence.
In certain embodiments, the target sequence is present in a cell. In certain embodiments, the target sequence is present in the nucleus or in the cytoplasm (e.g., in an organelle). In certain embodiments, the cell is a eukaryotic cell. In certain embodiments, the cell is a prokaryotic cell.
In certain embodiments, the protein has one or more NLS sequences attached. In certain embodiments, the conjugate or fusion protein comprises one or more NLS sequences. In certain embodiments, the NLS sequence is linked to the N-terminus or C-terminus of the protein. In certain embodiments, the NLS sequence is fused to the N-terminus or C-terminus of the protein.
In a tenth aspect, the present invention also provides a composition comprising one or more vectors, the one or more vectors comprising:
(i) a first nucleic acid comprising a nucleotide sequence encoding a protein or fusion protein of the invention, the first nucleic acid optionally being operably linked to a first regulatory element; and
(ii) a second nucleic acid comprising a nucleotide sequence encoding a guide RNA, the second nucleic acid optionally being operably linked to a second regulatory element;
wherein:
the first nucleic acid and the second nucleic acid are present on the same vector or on different vectors;
the guide RNA comprises, in the 5' to 3' direction, a direct repeat sequence and a targeting sequence, the targeting sequence being capable of hybridizing to a target sequence; and
the guide RNA is capable of forming a complex with the effector protein or fusion protein described in (i).
In certain embodiments, the direct repeat sequence is an isolated nucleic acid molecule as defined in the fourth aspect.
In certain embodiments, the targeting sequence is linked to the 3' end of the direct repeat. In certain embodiments, the targeting sequence comprises a complement of the target sequence.
In certain embodiments, the composition does not comprise a trans-activating crRNA (tracrRNA).
In certain embodiments, the composition is non-naturally occurring or modified. In certain embodiments, at least one component of the composition is non-naturally occurring or modified.
In certain embodiments, the first regulatory element is a promoter, such as an inducible promoter.
In certain embodiments, the second regulatory element is a promoter, such as an inducible promoter.
In certain embodiments, when the target sequence is DNA, the target sequence is located adjacent to the 3' end of a protospacer adjacent motif (PAM).
In certain embodiments, when the target sequence is RNA, the target sequence is not subject to PAM restrictions.
In certain embodiments, the target sequence is a DNA or RNA sequence from a prokaryotic or eukaryotic cell. In certain embodiments, the target sequence is a non-naturally occurring DNA or RNA sequence.
In certain embodiments, the target sequence is present in a cell. In certain embodiments, the target sequence is present in the nucleus or in the cytoplasm (e.g., in an organelle). In certain embodiments, the cell is a eukaryotic cell. In certain embodiments, the cell is a prokaryotic cell.
In certain embodiments, the protein has one or more NLS sequences attached. In certain embodiments, the conjugate or fusion protein comprises one or more NLS sequences. In certain embodiments, the NLS sequence is linked to the N-terminus or C-terminus of the protein. In certain embodiments, the NLS sequence is fused to the N-terminus or C-terminus of the protein.
In certain embodiments, one type of vector is a plasmid, which refers to a circular double-stranded DNA loop into which additional DNA fragments can be inserted, for example by standard molecular cloning techniques. Another type of vector is a viral vector, in which virus-derived DNA or RNA sequences are present in a vector for packaging into a virus (e.g., a retrovirus, replication-defective retrovirus, adenovirus, replication-defective adenovirus or adeno-associated virus). Viral vectors also include polynucleotides carried by a virus for transfection into a host cell. Certain vectors (e.g., bacterial vectors having a bacterial origin of replication, and episomal mammalian vectors) are capable of autonomous replication in the host cell into which they are introduced. Other vectors (e.g., non-episomal mammalian vectors) integrate into the genome of the host cell upon introduction and are thereby replicated along with the host genome. Moreover, certain vectors are capable of directing the expression of genes to which they are operably linked; such vectors are referred to herein as "expression vectors". Expression vectors commonly used in recombinant DNA technology are typically plasmids.
Recombinant expression vectors may comprise the nucleic acid molecules of the invention in a form suitable for expression of the nucleic acid in a host cell, which means that these recombinant expression vectors comprise one or more regulatory elements selected on the basis of the host cell to be used for expression, said regulatory elements being operably linked to the nucleic acid sequence to be expressed.
Delivery and delivery compositions
The proteins, conjugates and fusion proteins of the invention, the isolated nucleic acid molecule of the fourth aspect, the complex of the invention, the isolated nucleic acid molecule of the sixth aspect, the vector of the seventh aspect, and the compositions of the ninth and tenth aspects may be delivered by any method known in the art. Such methods include, but are not limited to, electroporation, lipofection, nucleofection, microinjection, sonoporation, gene gun, calcium phosphate-mediated transfection, cationic transfection, dendrimer transfection, heat-shock transfection, magnetofection, impalefection, optical transfection, reagent-enhanced nucleic acid uptake, and delivery via liposomes, immunoliposomes, virosomes, artificial virosomes, and the like.
Accordingly, in another aspect, the present invention provides a delivery composition comprising a delivery vehicle and one or more components selected from the group consisting of a protein, conjugate or fusion protein of the invention, an isolated nucleic acid molecule as described in the fourth aspect, a complex of the invention, an isolated nucleic acid molecule as described in the sixth aspect, a vector as described in the seventh aspect, and a composition as described in the ninth and tenth aspects.
In certain embodiments, the delivery vehicle is a particle.
In certain embodiments, the delivery vehicle is selected from a lipid particle, a sugar particle, a metal particle, a protein particle, a liposome, an exosome, a microbubble, a gene gun, or a viral vector (e.g., replication defective retrovirus, lentivirus, adenovirus, or adeno-associated virus).
Kit for detecting a substance in a sample
In another aspect, the invention provides a kit comprising one or more of the components described above. In certain embodiments, the kit comprises one or more components selected from the group consisting of a protein of the invention, a conjugate, a fusion protein, an isolated nucleic acid molecule as described in the fourth aspect, a complex of the invention, an isolated nucleic acid molecule as described in the sixth aspect, a vector as described in the seventh aspect, and a composition as described in the ninth and tenth aspects.
In certain embodiments, the kits of the invention comprise a composition as described in the ninth aspect. In certain embodiments, the kit further comprises instructions for using the composition.
In certain embodiments, the kits of the invention comprise a composition as described in the tenth aspect. In certain embodiments, the kit further comprises instructions for using the composition.
In certain embodiments, the components contained in the kits of the invention may be provided in any suitable container.
In certain embodiments, the kit further comprises one or more buffers. The buffer may be any buffer, including but not limited to sodium carbonate buffer, sodium bicarbonate buffer, borate buffer, Tris buffer, MOPS buffer, HEPES buffer, and combinations thereof. In certain embodiments, the buffer is alkaline. In certain embodiments, the buffer has a pH of about 7 to about 10.
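As a rough aid for preparing a buffer within the stated pH range of about 7 to about 10, the Henderson-Hasselbalch relation pH = pKa + log10([base]/[acid]) gives the ratio of conjugate base to acid. The sketch below is illustrative only: the pKa values are approximate 25 °C textbook figures assumed here rather than taken from this document, and the function names are hypothetical. In practice, buffers are titrated to the final pH at the working temperature.

```python
# Approximate pKa values at 25 degrees C. These are assumed textbook figures,
# not values from this document; the pKa of Tris shifts noticeably with temperature.
PKA_25C = {
    "Tris": 8.1,
    "HEPES": 7.5,
    "MOPS": 7.2,
    "borate": 9.2,
}


def base_to_acid_ratio(target_ph: float, pka: float) -> float:
    """Return [base]/[acid] from pH = pKa + log10([base]/[acid])."""
    return 10 ** (target_ph - pka)


def component_concentrations(total_mM: float, target_ph: float, pka: float):
    """Split a total buffer concentration (mM) into (acid_mM, base_mM)."""
    ratio = base_to_acid_ratio(target_ph, pka)
    acid = total_mM / (1 + ratio)
    return acid, total_mM - acid


if __name__ == "__main__":
    for ph in (7.5, 8.1, 9.0):
        acid, base = component_concentrations(50.0, ph, PKA_25C["Tris"])
        print(f"50 mM Tris, pH {ph}: {acid:.1f} mM protonated, {base:.1f} mM free base")
```

When the target pH equals the pKa, the acid and base forms are present in equal amounts, which is also where buffering capacity is greatest.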
In certain embodiments, the kit further comprises one or more oligonucleotides corresponding to a targeting sequence for insertion into a vector, such that the targeting sequence and the regulatory element are operably linked. In certain embodiments, the kit comprises a homologous recombination template polynucleotide.
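One common way to supply a targeting sequence for insertion into a vector is as a pair of complementary oligonucleotides that anneal into a duplex with short 5' overhangs matching the vector's cut ends. The sketch below assumes that cloning strategy. The overhang sequences and helper names are hypothetical placeholders and would have to match whatever vector is actually used.

```python
# Complement table for uppercase and lowercase DNA bases.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def annealing_oligos(targeting_seq: str, top_overhang: str, bottom_overhang: str):
    """Return (top, bottom) oligos, both 5'->3'.

    When annealed, the targeting sequence pairs with its reverse complement and
    each strand leaves its overhang single-stranded at the 5' end. The overhangs
    are hypothetical and must match the ends of the digested vector.
    """
    seq = targeting_seq.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("targeting sequence must be non-empty A/C/G/T")
    top = top_overhang.upper() + seq
    bottom = bottom_overhang.upper() + reverse_complement(seq)
    return top, bottom


if __name__ == "__main__":
    # "AGAT" and "AAAC" are placeholder overhangs for illustration only.
    print(annealing_oligos("GATTACA", "AGAT", "AAAC"))
```

Keeping the overhangs as parameters, rather than hard-coding them, reflects that they depend entirely on the restriction sites of the chosen vector.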
Method and use
In another aspect, the invention provides a method of modifying a target gene, comprising contacting a complex according to the fifth aspect, a composition according to the ninth aspect, or a composition according to the tenth aspect with the target gene, or delivering it into a cell comprising the target gene, wherein the target sequence is present in the target gene.
In certain embodiments, the target gene is present in a cell. In certain embodiments, the cell is a prokaryotic cell. In certain embodiments, the cell is a eukaryotic cell. In certain embodiments, the cell is a mammalian cell. In certain embodiments, the cell is a human cell. In certain embodiments, the cell is selected from a non-human primate, bovine, porcine, or rodent cell. In certain embodiments, the cell is a non-mammalian eukaryotic cell, such as a poultry or fish cell. In certain embodiments, the cell is a plant cell, e.g., a cell of a cultivated plant (e.g., cassava, maize, sorghum, wheat, or rice), an alga, a tree, or a vegetable.
In certain embodiments, the target gene is present in an in vitro nucleic acid molecule (e.g., a plasmid). In certain embodiments, the target gene is present in a plasmid.
In certain embodiments, the method results in cleavage of the target sequence (e.g., cleavage of double-stranded DNA or single-stranded RNA).
In certain embodiments, the cleavage results in reduced transcription of the target gene.
In certain embodiments, the method further comprises contacting an editing template (e.g., an exogenous nucleic acid) with the target gene, or delivering it into a cell comprising the target gene. In such embodiments, the cleaved target gene is repaired by homologous recombination with the editing template (e.g., an exogenous nucleic acid), and the repair results in a mutation comprising an insertion, deletion, or substitution of one or more nucleotides of the target gene. In certain embodiments, the mutation results in one or more amino acid changes in a protein expressed from a gene comprising the target sequence.
Thus, in certain embodiments, the modification further comprises inserting an editing template (e.g., an exogenous nucleic acid) into the break.
In certain embodiments, the protein, conjugate, fusion protein, isolated nucleic acid molecule, complex, vector or composition is contained in a delivery vehicle.
In certain embodiments, the delivery vehicle is selected from the group consisting of a lipid particle, a sugar particle, a metal particle, a protein particle, a liposome, an exosome, and a viral vector (e.g., a replication-defective retrovirus, lentivirus, adenovirus, or adeno-associated virus).
In certain embodiments, the methods are used to alter one or more target sequences in a target gene or nucleic acid molecule encoding a target gene product to modify a cell, cell line, or organism.
In another aspect, the invention provides a method of altering expression of a gene product, comprising contacting a complex according to the fifth aspect, a composition according to the ninth aspect, or a composition according to the tenth aspect with a nucleic acid molecule encoding the gene product, or delivering it into a cell comprising the nucleic acid molecule, wherein the target sequence is present in the nucleic acid molecule.
In certain embodiments, the nucleic acid molecule is present in a cell. In certain embodiments, the cell is a prokaryotic cell. In certain embodiments, the cell is a eukaryotic cell. In certain embodiments, the cell is a mammalian cell. In certain embodiments, the cell is a human cell. In certain embodiments, the cell is selected from a non-human primate, bovine, porcine, or rodent cell. In certain embodiments, the cell is a non-mammalian eukaryotic cell, such as a poultry or fish cell. In certain embodiments, the cell is a plant cell, e.g., a cell of a cultivated plant (e.g., cassava, maize, sorghum, wheat, or rice), an alga, a tree, or a vegetable.
In certain embodiments, the nucleic acid molecule is present in an in vitro nucleic acid molecule (e.g., a plasmid). In certain embodiments, the nucleic acid molecule is present in a plasmid.
In certain embodiments, expression of the gene product is altered (e.g., increased or decreased). In certain embodiments, expression of the gene product is enhanced. In certain embodiments, expression of the gene product is reduced.
In certain embodiments, the gene product is a protein.
In certain embodiments, the protein, conjugate, fusion protein, isolated nucleic acid molecule, complex, vector or composition is contained in a delivery vehicle.
In certain embodiments, the delivery vehicle is selected from the group consisting of a lipid particle, a sugar particle, a metal particle, a protein particle, a liposome, an exosome, and a viral vector (e.g., a replication-defective retrovirus, lentivirus, adenovirus, or adeno-associated virus).
In certain embodiments, the methods are used to alter one or more target sequences in a target gene or nucleic acid molecule encoding a target gene product to modify a cell, cell line, or organism.
In another aspect, the invention relates to the use of a protein according to the first aspect, a conjugate according to the second aspect, a fusion protein according to the third aspect, an isolated nucleic acid molecule according to the fourth aspect, a complex according to the fifth aspect, an isolated nucleic acid molecule according to the sixth aspect, a vector according to the seventh aspect, a composition according to the ninth aspect, a composition according to the tenth aspect, a kit of the invention, or a delivery composition for nucleic acid editing.
In certain embodiments, the nucleic acid editing comprises genetic or genomic editing, such as modifying a gene, knocking out a gene, altering expression of a gene product, repairing a mutation, and/or inserting a polynucleotide.
In another aspect, the invention relates to the use of a protein according to the first aspect, a conjugate according to the second aspect, a fusion protein according to the third aspect, an isolated nucleic acid molecule according to the fourth aspect, a complex according to the fifth aspect, an isolated nucleic acid molecule according to the sixth aspect, a vector according to the seventh aspect, a composition according to the ninth aspect, a composition according to the tenth aspect, a kit of the invention or a delivery composition for the preparation of a formulation for:
(i) Ex vivo gene or genome editing;
(ii) Detecting single-stranded DNA in vitro;
(iii) Editing a target sequence in a target locus to modify an organism or a non-human organism;
(iv) Treating a condition caused by a defect in a target sequence in a target locus.
Cell and cell progeny
In some cases, the modification introduced into cells by the methods of the invention may alter the cells and their progeny so as to improve production of their biological products (e.g., antibodies, starch, ethanol, or other desired cellular outputs). In some cases, the modification introduced into cells by the methods of the invention may cause the cells and their progeny to include alterations that change the biological product produced.
Thus, in a further aspect, the invention also relates to a cell or progeny thereof obtained by the method as described above, wherein the cell contains a modification not present in its wild type.
The invention also relates to a cell product of a cell or progeny thereof as described above.
The present invention also relates to an in vitro, ex vivo or in vivo cell or cell line or progeny thereof comprising a protein according to the first aspect, a conjugate according to the second aspect, a fusion protein according to the third aspect, an isolated nucleic acid molecule according to the fourth aspect, a complex according to the fifth aspect, an isolated nucleic acid molecule according to the sixth aspect, a vector according to the seventh aspect, a composition according to the ninth aspect, a composition according to the tenth aspect, a kit of the invention or a delivery composition.
In certain embodiments, the cell is a prokaryotic cell.
In certain embodiments, the cell is a eukaryotic cell. In certain embodiments, the cell is a mammalian cell. In certain embodiments, the cell is a human cell. In certain embodiments, the cell is a non-human mammalian cell, e.g., a non-human primate, bovine, ovine, porcine, canine, simian, rabbit, or rodent (e.g., rat or mouse) cell. In certain embodiments, the cell is a non-mammalian eukaryotic cell, such as a cell of poultry (e.g., chicken), fish, or shellfish (e.g., clams or shrimp). In certain embodiments, the cell is a plant cell, e.g., a cell of a monocot or dicot, or a cell of a cultivated plant or food crop such as cassava, corn, sorghum, soybean, wheat, oat, or rice, or a cell of an alga, tree, production plant, fruit, or vegetable (e.g., citrus trees, nut trees, eggplant, cotton, tobacco, tomato, grape, coffee, or cocoa).
In certain embodiments, the cell is a stem cell or stem cell line.
Definition of terms
In the present invention, unless otherwise indicated, scientific and technical terms used herein have the meanings commonly understood by one of ordinary skill in the art. Further, the procedures of molecular genetics, nucleic acid chemistry, molecular biology, biochemistry, cell culture, microbiology, cell biology, genomics and recombinant DNA, etc., as used herein, are all conventional procedures widely used in the corresponding field. Meanwhile, in order to better understand the present invention, definitions and explanations of related terms are provided below.
In the present invention, the expression "Cas12j.23" refers to a Cas effector protein, first discovered and identified by the inventors, having an amino acid sequence selected from the group consisting of:
(i) The sequence set forth in SEQ ID NO: 1;
(ii) A sequence having one or more amino acid substitutions, deletions, or additions (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 amino acid substitutions, deletions, or additions) compared with the sequence set forth in SEQ ID NO: 1; or
(iii) A sequence having at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to the sequence set forth in SEQ ID NO: 1.
The Cas12j.23 of the present invention is an endonuclease that, guided by a guide RNA, binds to and cleaves specific sites in a target sequence, and it has both DNA and RNA endonuclease activity.
As used herein, the terms "clustered regularly interspaced short palindromic repeats (CRISPR)-CRISPR-associated (Cas) (CRISPR-Cas) system" and "CRISPR system" are used interchangeably and have the meaning commonly understood by those skilled in the art. Such a system generally comprises transcripts or other elements involved in the expression of CRISPR-associated ("Cas") genes, or transcripts or other elements capable of directing the activity of Cas genes. Such transcripts or other elements may comprise sequences encoding Cas effector proteins and guide RNAs comprising CRISPR RNA (crRNA), as well as the trans-activating crRNA (tracrRNA) sequences contained in the CRISPR-Cas9 system, or other sequences or transcripts from the CRISPR locus. The CRISPR system based on Cas12j.23 according to the present invention does not require a tracrRNA sequence.
As used herein, the terms "Cas effector protein" and "Cas effector enzyme" are used interchangeably and refer to any protein greater than 800 amino acids in length that is present in a CRISPR-Cas system. In certain instances, such proteins are proteins identified from Cas loci.
As used herein, the terms "guide RNA (targeting RNA)" and "mature crRNA" are used interchangeably and have the meaning commonly understood by those of skill in the art. In general, a guide RNA can comprise, consist essentially of, or consist of a direct repeat sequence and a guide sequence (also referred to as a spacer in the context of endogenous CRISPR systems). In certain instances, a targeting sequence (guide sequence) is any polynucleotide sequence that has sufficient complementarity with a target sequence to hybridize to the target sequence and direct specific binding of a CRISPR/Cas complex to the target sequence. In certain embodiments, the degree of complementarity between a targeting sequence and its corresponding target sequence, when optimally aligned, is at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, or at least 99%. Determining the optimal alignment is within the ability of one of ordinary skill in the art. For example, published and commercially available alignment algorithms and programs include, but are not limited to, the Smith-Waterman algorithm, Bowtie, Geneious, Biopython, SeqMan, ClustalW, and MATLAB.
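The Smith-Waterman algorithm named above finds the best-scoring local alignment between two sequences by dynamic programming. Below is a minimal sketch of its scoring step, assuming a simple linear gap penalty and arbitrary illustrative match, mismatch, and gap scores (these values and the function name are assumptions, not parameters from this document). It returns only the best score, not the alignment itself.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of a and b (linear gap penalty).

    Scores are illustrative assumptions. Cells are floored at 0, which is
    what makes the alignment local rather than global.
    """
    cols = len(b) + 1
    prev = [0] * cols            # previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * cols
        for j in range(1, cols):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap           # gap inserted in b
            left = curr[j - 1] + gap     # gap inserted in a
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    # A 4-nt perfect local match inside a longer sequence scores 4 * 2 = 8.
    print(smith_waterman_score("TTACGTAA", "ACGT"))
```

To assess how well a targeting sequence matches a DNA target with this kind of scorer, one would align it against the reverse complement of the target strand, since the two pair antiparallel. Only two rows of the matrix are kept because the score alone does not require traceback.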
In certain instances, the targeting sequence is at least 5, at least 10, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, at least 30, at least 35, at least 40, at least 45, or at least 50 nucleotides in length. In some cases, the targeting sequence is no more than 50, 45, 40, 35, 30, 25, 24, 23, 22, 21, 20, 15, 10 or fewer nucleotides in length. In certain embodiments, the targeting sequence is 10-30, or 15-25, or 15-22, or 19-25, or 19-22 nucleotides in length.
In certain instances, the direct repeat is at least 10, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, at least 30, at least 35, at least 40, at least 45, at least 50, at least 55, at least 56, at least 57, at least 58, at least 59, at least 60, at least 61, at least 62, at least 63, at least 64, at least 65, or at least 70 nucleotides in length. In some cases, the direct repeat is no more than 70, 65, 64, 63, 62, 61, 60, 59, 58, 57, 56, 55, 50, 45, 40, 35, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 15, 10 or fewer nucleotides in length. In certain embodiments, the direct repeat is 55-70 nucleotides in length (e.g., 55-65, 60-65, 62-65, or 63-64 nucleotides). In certain embodiments, the direct repeat is 15-30 nucleotides in length (e.g., 15-25, 20-25, 22-24, or 23 nucleotides).
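To make the length embodiments above concrete, the sketch below joins a direct repeat and a targeting sequence into a single crRNA and checks them against one targeting-sequence range (15-25 nt) and the two direct-repeat ranges (15-30 nt and 55-70 nt) recited above. It assumes that the direct repeat lies 5' of the spacer, as in many Cas12-family crRNAs. This order, the function name, and the placeholder sequences are assumptions, and the actual Cas12j.23 direct repeat is not given in this section.

```python
SPACER_RANGE = (15, 25)              # one targeting-sequence length embodiment
DR_RANGES = ((15, 30), (55, 70))     # two direct-repeat length embodiments


def build_crrna(direct_repeat: str, spacer: str) -> str:
    """Return the crRNA 5'->3' as RNA (T converted to U).

    Assumes the layout 5'-[direct repeat][spacer]-3'; this order is an
    assumption, not something stated in this section.
    """
    dr = direct_repeat.upper().replace("T", "U")
    sp = spacer.upper().replace("T", "U")
    if not SPACER_RANGE[0] <= len(sp) <= SPACER_RANGE[1]:
        raise ValueError(f"spacer length {len(sp)} outside {SPACER_RANGE}")
    if not any(lo <= len(dr) <= hi for lo, hi in DR_RANGES):
        raise ValueError(f"direct repeat length {len(dr)} outside {DR_RANGES}")
    if set(dr + sp) - set("ACGU"):
        raise ValueError("sequences must contain only A, C, G, T/U")
    return dr + sp


if __name__ == "__main__":
    # Placeholder 23-nt repeat and 20-nt spacer, for illustration only.
    print(build_crrna("A" * 23, "ACGT" * 5))
```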
As used herein, the term "CRISPR/Cas complex" refers to a ribonucleoprotein complex formed by a Cas protein bound to a guide RNA or mature crRNA, wherein the guide RNA comprises a guide sequence that hybridizes to a target sequence. The ribonucleoprotein complex is capable of recognizing and cleaving a polynucleotide that hybridizes to the guide RNA or mature crRNA.
Thus, in the context of forming a CRISPR/Cas complex, a "target sequence" refers to a polynucleotide targeted by a guide sequence, e.g., a sequence complementary to the guide sequence, wherein hybridization between the target sequence and the guide sequence promotes formation of the CRISPR/Cas complex. Complete complementarity is not required, provided there is sufficient complementarity to cause hybridization and promote formation of a CRISPR/Cas complex. The target sequence may comprise any polynucleotide, such as DNA or RNA. In some cases, the target sequence is located in the nucleus or cytoplasm of the cell. In some cases, the target sequence may be located within an organelle of a eukaryotic cell, such as a mitochondrion or chloroplast. Sequences or templates that can be used for recombination into a target locus comprising the target sequence are referred to as "editing templates", "editing polynucleotides", or "editing sequences". In certain embodiments, the editing template is an exogenous nucleic acid. In certain embodiments, the recombination is homologous recombination.
In the present invention, the expression "target sequence" or "target polynucleotide" may refer to any polynucleotide that is endogenous or exogenous to a cell (e.g., a eukaryotic cell). For example, the target polynucleotide may be a polynucleotide present in the nucleus of a eukaryotic cell. The target polynucleotide may be a sequence encoding a gene product (e.g., a protein) or a non-coding sequence (e.g., a regulatory polynucleotide or junk DNA). In some cases, it is believed that the target sequence should be associated with a protospacer adjacent motif (PAM). The exact sequence and length requirements for the PAM vary with the Cas effector enzyme used, but a PAM is typically a 2-5 base pair sequence adjacent to the protospacer (i.e., the target sequence). Those skilled in the art are able to identify PAM sequences for use with a given Cas effector protein. Herein, "specific motif sequence recognized by the Cas protein" or "motif sequence" refers to a PAM sequence.
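Identifying candidate target sites usually starts with scanning a sequence for the PAM. The sketch below searches one strand for an IUPAC-coded PAM located immediately 5' of a fixed-length protospacer. The 5'-TTN example PAM is borrowed from the FnCpf1 PAM mentioned in this document's discussion of advantages and serves only as an assumption. The PAM and protospacer length actually used with Cas12j.23 are not specified in this section, and the function name is hypothetical.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def find_protospacers(seq: str, pam: str, spacer_len: int = 20):
    """Return (pam_start, pam_seq, protospacer) tuples for a 5' PAM.

    Scans only the strand given; call again on the reverse complement
    to cover the other strand. Overlapping PAM hits are all reported.
    """
    seq = seq.upper()
    pattern = "".join(IUPAC[base] for base in pam.upper())
    hits = []
    # A zero-width lookahead lets re.finditer report overlapping matches.
    for m in re.finditer(f"(?=({pattern}))", seq):
        proto_start = m.start() + len(pam)
        protospacer = seq[proto_start:proto_start + spacer_len]
        if len(protospacer) == spacer_len:   # skip sites truncated by the end
            hits.append((m.start(), m.group(1), protospacer))
    return hits


if __name__ == "__main__":
    print(find_protospacers("GGTTA" + "CGTA" * 5 + "A", "TTN", 20))
```

Translating IUPAC codes to regular-expression classes keeps the scanner independent of any particular PAM, so a different motif can be tried without code changes.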
In some cases, the target sequence or target polynucleotide may include a variety of disease-related genes and polynucleotides as well as genes and polynucleotides related to signaling biochemical pathways. Non-limiting examples of such target sequences or target polynucleotides include those listed in U.S. provisional patent applications 61/736,527 and 61/748,427, filed in December 2012 and on January 2, 2013, respectively, and in international application PCT/US2013/074667, filed in December 2013, each of which is incorporated herein by reference in its entirety.
In some cases, examples of target sequences or target polynucleotides include sequences associated with signaling biochemical pathways, such as signaling biochemical pathway-associated genes or polynucleotides. Examples of target polynucleotides include disease-related genes or polynucleotides. By "disease-related" gene or polynucleotide is meant any gene or polynucleotide that produces a transcriptional or translational product at an abnormal level or in an abnormal form in cells derived from a tissue affected by a disease, as compared to a tissue or cell not affected by the disease. Where altered expression is associated with the appearance and/or progression of a disease, it may be a gene that is expressed at an abnormally high level, or it may be a gene that is expressed at an abnormally low level. Disease-related genes also refer to genes having one or more mutations or genetic variations directly responsible for or in linkage disequilibrium with one or more genes responsible for the etiology of the disease. The transcribed or translated product may be known or unknown and may be at normal or abnormal levels.
As used herein, the term "wild-type" has the meaning commonly understood by those skilled in the art and refers to the typical form of an organism, strain, gene, or characteristic as it occurs in nature, as distinguished from mutant or variant forms; it can be isolated from a natural source and has not been intentionally modified by humans.
As used herein, the terms "non-naturally occurring" and "engineered" are used interchangeably and indicate human intervention. When used to describe a nucleic acid molecule or polypeptide, these terms mean that the nucleic acid molecule or polypeptide is at least substantially free from at least one other component with which it is associated in nature or as found in nature.
As used herein, the term "ortholog" (orthologue) has the meaning commonly understood by those skilled in the art. As further guidance, an "ortholog" of a protein as described herein refers to a protein from a different species that performs the same or a similar function as the protein of which it is an ortholog.
As used herein, the term "identity" refers to the match between two polypeptide sequences or between two nucleic acid sequences. When a position in both of the sequences being compared is occupied by the same base or amino acid monomer subunit (e.g., a position in each of two DNA molecules is occupied by adenine, or a position in each of two polypeptides is occupied by lysine), the molecules are identical at that position. The "percent identity" between two sequences is the number of matching positions shared by the two sequences divided by the number of positions compared, multiplied by 100. For example, if 6 out of 10 positions of two sequences match, the two sequences have 60% identity. Likewise, the DNA sequences CTGACT and CAGGTT share 50% identity (3 of 6 positions match). Typically, the comparison is made after the two sequences are aligned to produce maximum identity. Such an alignment may be achieved using, for example, the method of Needleman et al. (1970) J. Mol. Biol. 48:443-453, which can be conveniently performed by a computer program such as the Align program (DNAstar, Inc.). The percent identity between two amino acid sequences can also be determined using the algorithm of E. Meyers and W. Miller (Comput. Appl. Biosci., 4:11-17 (1988)), which has been incorporated into the ALIGN program (version 2.0), using a PAM120 weight residue table, a gap length penalty of 12, and a gap penalty of 4. Furthermore, the percent identity between two amino acid sequences can be determined using the Needleman and Wunsch (J. Mol. Biol. 48:444-453 (1970)) algorithm, which has been incorporated into the GAP program of the GCG software package (available at www.gcg.com), using either a Blosum 62 matrix or a PAM250 matrix, a gap weight of 16, 14, 12, 10, 8, 6, or 4, and a length weight of 1, 2, 3, 4, 5, or 6.
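The percent-identity arithmetic above can be written directly for two sequences that have already been aligned. In this sketch (the function name is hypothetical), gap characters ('-') count as compared but unmatched positions. That is one convention among several, and the alignment programs cited above may treat gaps differently.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Matched positions / compared positions x 100 for a pre-aligned pair.

    Gap characters ('-') are counted in the denominator but never as matches;
    this gap convention is an assumption.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("empty alignment")
    matches = sum(
        1 for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


if __name__ == "__main__":
    # Reproduces the worked example in the text: 3 of 6 positions match.
    print(percent_identity("CTGACT", "CAGGTT"))  # 50.0
```

Multiplying by 100 before dividing keeps results such as 60.0 and 50.0 exact in floating point.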
As used herein, the term "vector" refers to a nucleic acid vehicle into which a polynucleotide can be inserted. When a vector enables expression of the protein encoded by the inserted polynucleotide, the vector is referred to as an expression vector. A vector can be introduced into a host cell by transformation, transduction, or transfection so that the genetic elements it carries are expressed in the host cell. Vectors are well known to those skilled in the art and include, but are not limited to, plasmids; phagemids; cosmids; artificial chromosomes, such as yeast artificial chromosomes (YACs), bacterial artificial chromosomes (BACs), or P1-derived artificial chromosomes (PACs); phages, such as lambda or M13 phage; and animal viruses. Animal viruses that can be used as vectors include, but are not limited to, retroviruses (including lentiviruses), adenoviruses, adeno-associated viruses, herpesviruses (e.g., herpes simplex virus), poxviruses, baculoviruses, papillomaviruses, and papovaviruses (e.g., SV40). A vector may contain a variety of expression control elements, including, but not limited to, promoter sequences, transcription initiation sequences, enhancer sequences, selection elements, and reporter genes. In addition, the vector may contain an origin of replication.
As used herein, the term "host cell" refers to a cell into which a vector can be introduced, including, but not limited to, prokaryotic cells such as Escherichia coli or Bacillus subtilis; fungal cells such as yeast cells or Aspergillus; insect cells such as Drosophila S2 cells or Sf9 cells; and animal cells such as fibroblasts, CHO cells, COS cells, NS0 cells, HeLa cells, BHK cells, HEK 293 cells, or human cells.
Those skilled in the art will appreciate that the design of an expression vector may depend on factors such as the choice of host cell to be transformed and the desired level of expression. A vector can be introduced into a host cell to produce transcripts, proteins, or peptides, including the proteins, fusion proteins, and transcripts of the isolated nucleic acid molecules described herein (e.g., CRISPR transcripts, proteins, or enzymes).
As used herein, the term "regulatory element" is intended to include promoters, enhancers, internal ribosome entry sites (IRES), and other expression control elements (e.g., transcription termination signals, such as polyadenylation signals and poly-U sequences); detailed descriptions can be found in Goeddel, Gene Expression Technology: Methods in Enzymology 185, Academic Press, San Diego, CA, 1990. In some cases, regulatory elements include sequences that direct constitutive expression of a nucleotide sequence in many types of host cells as well as sequences that direct expression of the nucleotide sequence only in certain host cells (e.g., tissue-specific regulatory sequences). Tissue-specific promoters may direct expression primarily in a desired tissue of interest, such as muscle, neurons, bone, skin, blood, specific organs (e.g., liver or pancreas), or specific cell types (e.g., lymphocytes). In some cases, regulatory elements may also direct expression in a time-dependent manner (e.g., in a cell cycle-dependent or developmental stage-dependent manner), which may or may not also be tissue- or cell type-specific. In some cases, the term "regulatory element" encompasses enhancer elements such as WPRE, the CMV enhancer, the R-U5' fragment in the LTR of HTLV-I (Mol. Cell. Biol., Vol. 8(1), pp. 466-472, 1988), the SV40 enhancer, and the intron sequence between exons 2 and 3 of rabbit beta-globin (Proc. Natl. Acad. Sci. USA, Vol. 78(3), pp. 1527-31, 1981).
As used herein, the term "promoter" has the meaning well known to those skilled in the art and refers to a non-coding nucleotide sequence, located upstream of a gene, that is capable of initiating expression of the downstream gene. A constitutive promoter is a nucleotide sequence that, when operably linked to a polynucleotide encoding or defining a gene product, results in production of the gene product in a cell under most or all physiological conditions of the cell. An inducible promoter is a nucleotide sequence that, when operably linked to a polynucleotide encoding or defining a gene product, results in production of the gene product in a cell essentially only when an inducer corresponding to the promoter is present in the cell. A tissue-specific promoter is a nucleotide sequence that, when operably linked to a polynucleotide encoding or defining a gene product, results in production of the gene product in a cell essentially only when the cell is of the tissue type to which the promoter corresponds.
As used herein, the term "operably linked" is intended to mean that the nucleotide sequence of interest is linked to the one or more regulatory elements in a manner that allows for expression of the nucleotide sequence (e.g., in an in vitro transcription/translation system or in a host cell when the vector is introduced into the host cell).
As used herein, the term "complementarity" refers to the ability of a nucleic acid to form one or more hydrogen bonds with another nucleic acid sequence through conventional Watson-Crick base pairing or other non-conventional types of pairing. Percent complementarity is the percentage of residues in a nucleic acid molecule that can form hydrogen bonds (e.g., Watson-Crick base pairing) with a second nucleic acid sequence (e.g., 5, 6, 7, 8, 9, and 10 out of 10 residues correspond to 50%, 60%, 70%, 80%, 90%, and 100% complementarity, respectively). "Fully complementary" means that all contiguous residues of one nucleic acid sequence form hydrogen bonds with the same number of contiguous residues in a second nucleic acid sequence. "Substantially complementary" as used herein refers to a degree of complementarity of at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, 99%, or 100% over a region of 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50 or more nucleotides, or to two nucleic acids that hybridize under stringent conditions.
As used herein, "stringent conditions" for hybridization refers to conditions under which a nucleic acid having complementarity to a target sequence hybridizes predominantly to the target sequence and does not substantially hybridize to non-target sequences. Stringent conditions are generally sequence-dependent and vary with a number of factors. In general, the longer the sequence, the higher the temperature at which it specifically hybridizes to its target sequence. Non-limiting examples of stringent conditions are described in Tijssen (1993), Laboratory Techniques in Biochemistry and Molecular Biology: Hybridization with Nucleic Acid Probes, Part I, Chapter 2, "Overview of principles of hybridization and the strategy of nucleic acid probe assay", Elsevier, New York.
As used herein, the term "hybridization" refers to a reaction in which one or more polynucleotides react to form a complex that is stabilized via hydrogen bonding between the bases of the nucleotide residues. Hydrogen bonding may occur by Watson-Crick base pairing, Hoogsteen binding, or in any other sequence-specific manner. The complex may comprise two strands forming a duplex, three or more strands forming a multi-stranded complex, a single self-hybridizing strand, or any combination of these. A hybridization reaction may constitute a step in a broader process, such as the initiation of PCR or the cleavage of a polynucleotide by an enzyme. A sequence that hybridizes to a given sequence is referred to as the "complement" of the given sequence.
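The percent-complementarity definition can be illustrated for an ungapped guide/target pair. Because the two strands pair antiparallel, the sketch below compares each guide base with the target read from its 3' end and counts Watson-Crick pairs, treating U in RNA as pairing with A just as T does. It ignores wobble pairs and gapped alignments, and the function name is hypothetical.

```python
# Watson-Crick pairs, allowing U (RNA) in place of T (DNA).
PAIRS = {("A", "T"), ("T", "A"), ("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def percent_complementarity(guide: str, target: str) -> float:
    """Percent of guide positions that Watson-Crick pair with the target.

    Both sequences are written 5'->3'. Because the strands are antiparallel,
    guide[i] is compared with the target read from its 3' end.
    """
    if len(guide) != len(target):
        raise ValueError("sequences must have equal length")
    if not guide:
        raise ValueError("empty sequence")
    paired = sum(
        (g, t) in PAIRS
        for g, t in zip(guide.upper(), reversed(target.upper()))
    )
    return 100.0 * paired / len(guide)


if __name__ == "__main__":
    # 8 of 10 positions pair, i.e. 80% complementarity.
    print(percent_complementarity("AAAAAAAAAA", "TTTTTTTTGG"))
```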
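The length dependence noted above can be illustrated with two widely used rule-of-thumb melting temperature (Tm) estimates: the Wallace rule for short oligonucleotides and a basic GC-content formula for longer ones. These formulas do not come from this document. They ignore salt and probe concentration and are only rough approximations, since actual stringent conditions are set empirically. The function names are hypothetical.

```python
def tm_wallace(seq: str) -> float:
    """Wallace rule, Tm = 2(A+T) + 4(G+C) in deg C; for oligos under ~14 nt."""
    s = seq.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2.0 * at + 4.0 * gc


def tm_basic(seq: str) -> float:
    """Basic GC formula, Tm = 64.9 + 41(G+C - 16.4)/N in deg C; longer oligos."""
    s = seq.upper()
    if not s:
        raise ValueError("empty sequence")
    gc = s.count("G") + s.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(s)


if __name__ == "__main__":
    # Same 50% GC composition, doubled length: the estimated Tm rises.
    for probe in ("ACGT" * 5, "ACGT" * 10):
        print(len(probe), round(tm_basic(probe), 2))
```

For the 20-mer and 40-mer above, the basic formula rises from about 51.8 °C to about 68.6 °C, consistent with the statement that longer sequences hybridize specifically at higher temperatures.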
As used herein, the term "expression" refers to a process whereby a polynucleotide is transcribed from a DNA template (e.g., into mRNA or other RNA transcript) and/or a process whereby the transcribed mRNA is subsequently translated into a peptide, polypeptide, or protein. Transcripts and encoded polypeptides may be collectively referred to as "gene products". If the polynucleotide is derived from genomic DNA, expression may include splicing of mRNA in eukaryotic cells.
As used herein, the term "linker" refers to a linear polypeptide formed from multiple amino acid residues joined by peptide bonds. The linker of the invention may be an artificially synthesized amino acid sequence or a naturally occurring polypeptide sequence, such as a polypeptide that functions as a hinge region. Such linker polypeptides are well known in the art (see, e.g., Holliger, P. et al. (1993) Proc. Natl. Acad. Sci. USA 90:6444-6448; Poljak, R. J. et al. (1994) Structure 2:1121-1123).
As used herein, the term "treating" refers to treating or curing a disorder, delaying the onset of symptoms of a disorder, and/or delaying the progression of a disorder.
As used herein, the term "subject" includes, but is not limited to, various animals, such as mammals, e.g., bovine, equine, ovine, porcine, canine, feline, lagomorph, rodent (e.g., mouse or rat), non-human primate (e.g., cynomolgus monkey), or human. In certain embodiments, the subject (e.g., a human) has a disorder (e.g., a disorder resulting from a defect in a disease-related gene).
Advantageous effects of the invention
Compared with the prior art, the Cas protein and system of the invention have clear advantages. For example, the Cas effector protein of the present invention is smaller than the Cas9, C2C1, CasY and Cpf1 proteins and therefore has an advantage over them in transfection efficiency. The Cas effector protein of the present invention is also capable of DNA cleavage in eukaryotes, and its cleavage activity in human cell lines is significantly stronger than that of FnCpf1, which has been reported to recognize a 5'-TTN PAM. In addition, the Cas protein of the invention has more stringent PAM recognition, which can reduce off-target effects.
Embodiments of the present invention will be described in detail below by way of examples, but it will be understood by those skilled in the art that the following examples are only for illustrating the present invention and are not to be construed as limiting the scope of the present invention. Various objects and advantageous aspects of the present invention will become apparent to those skilled in the art from the following detailed description of the preferred embodiments.
Sequence information
The information of the partial sequences to which the present invention relates is provided in table 1 below.
TABLE 1 description of the sequences
Detailed Description
The invention will now be described with reference to the following examples, which are intended to illustrate the invention, but not to limit it.
Unless otherwise indicated, the experiments and methods described in the examples were performed essentially according to conventional methods well known in the art and described in various references. For example, conventional techniques of immunology, biochemistry, chemistry, molecular biology, microbiology, cell biology, genomics and recombinant DNA used in the present invention can be found in Sambrook, Fritsch and Maniatis, MOLECULAR CLONING: A LABORATORY MANUAL, 2nd edition (1989); CURRENT PROTOCOLS IN MOLECULAR BIOLOGY (F. M. Ausubel et al., eds. (1987)); the METHODS IN ENZYMOLOGY series (Academic Press); PCR 2: A PRACTICAL APPROACH (M. J. MacPherson, B. D. Hames and G. R. Taylor, eds. (1995)); Harlow and Lane, eds., ANTIBODIES: A LABORATORY MANUAL; and ANIMAL CELL CULTURE (R. I. Freshney, ed. (1987)).
In addition, where specific conditions are not given in the examples, the procedures were carried out under conventional conditions or under the conditions recommended by the manufacturer. Reagents or instruments for which no manufacturer is indicated are conventional, commercially available products. Those skilled in the art will appreciate that the examples describe the invention by way of example and are not intended to limit the scope of the invention as claimed. All publications and other references mentioned herein are incorporated by reference in their entirety.
The sources of some of the reagents involved in the following examples are as follows:
LB liquid medium: 10 g tryptone, 5 g yeast extract and 10 g NaCl, made up to 1 L and sterilized. When an antibiotic is required, it is added after the medium has cooled, to a final concentration of 50 μg/mL.
Chloroform/isoamyl alcohol: 240 mL chloroform mixed thoroughly with 10 mL isoamyl alcohol.
RNP buffer: 100 mM sodium chloride, 50 mM Tris-HCl, 10 mM MgCl2, 100 μg/mL BSA, pH 7.9.
The prokaryotic expression vectors pACYC-Duet-1 and pUC19 were purchased from Beijing TransGen Biotech Co., Ltd.
Competent E. coli EC100 cells were purchased from Epicentre.
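The RNP buffer above is specified only by its final concentrations. As a minimal sketch, assuming hypothetical stock solutions (5 M NaCl, 1 M Tris-HCl pH 7.9, 1 M MgCl2 and 10 mg/mL BSA, none of which are given in the source), the stock volumes follow from C1·V1 = C2·V2:

```python
# Sketch: stock volumes for the RNP buffer via C1*V1 = C2*V2.
# The stock concentrations below are HYPOTHETICAL; only the final
# concentrations come from the text. Each stock/final pair shares one unit.
STOCKS = {
    "NaCl": 5000.0,             # mM (5 M stock, assumed)
    "Tris-HCl pH 7.9": 1000.0,  # mM (1 M stock, assumed)
    "MgCl2": 1000.0,            # mM (1 M stock, assumed)
    "BSA": 10000.0,             # ug/mL (10 mg/mL stock, assumed)
}
FINAL = {
    "NaCl": 100.0,              # mM
    "Tris-HCl pH 7.9": 50.0,    # mM
    "MgCl2": 10.0,              # mM
    "BSA": 100.0,               # ug/mL
}


def stock_volumes(total_ml: float) -> dict[str, float]:
    """Return the mL of each stock, plus water, needed for total_ml of 1x buffer."""
    volumes = {name: FINAL[name] * total_ml / STOCKS[name] for name in FINAL}
    water = total_ml - sum(volumes.values())
    if water < 0:
        raise ValueError("stocks too dilute for the requested final concentrations")
    volumes["water"] = water
    return volumes


if __name__ == "__main__":
    for name, ml in stock_volumes(50.0).items():
        print(f"{name}: {ml:.3f} mL")
```

With these assumed stocks, 50 mL of buffer needs 1.0 mL NaCl, 2.5 mL Tris-HCl, 0.5 mL MgCl2, 0.5 mL BSA and 45.5 mL water.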
Example 1 Acquisition of the Cas12j.23 gene and Cas12j.23 guide RNA
1. Annotation of genes and CRISPR loci: all proteins were obtained by gene annotation of microbial genome and metagenome data from the NCBI and JGI databases using Prodigal, and CRISPR loci were annotated with PILER-CR; default parameters were used.
2. Protein filtering: redundancy among the annotated proteins was removed by sequence identity, discarding proteins with identical sequences, and proteins longer than 800 amino acids were classified as large proteins. Because all class 2 CRISPR/Cas systems found to date have effector proteins longer than 900 amino acids, only large proteins were considered when mining CRISPR effector proteins, in order to reduce computational complexity.
3. Identification of CRISPR-associated large proteins: each CRISPR locus was extended by 10 kb upstream and downstream, and the non-redundant large proteins within these CRISPR-adjacent intervals were identified.
4. Clustering of CRISPR-associated large proteins: the non-redundant CRISPR-associated large proteins were compared all-against-all using BLASTP, and alignments with E-value < 1E-10 were output. The BLASTP output was clustered with MCL to obtain CRISPR-associated protein families.
5. Identification of CRISPR-enriched large protein families: the proteins of each CRISPR-associated protein family were aligned with BLASTP against a non-redundant large-protein database from which CRISPR-associated proteins had been removed, and alignments with E-value < 1E-10 were output. If the homologous proteins found in this non-CRISPR-associated database amounted to less than 100%, the proteins of the family were considered enriched in CRISPR regions; in this way, CRISPR-enriched large protein families were identified.
6. Annotation of protein function and domains: the CRISPR-enriched large protein families were annotated against the Pfam database, the NR database and Cas proteins collected from NCBI, yielding new CRISPR/Cas protein families. A multiple sequence alignment was built for each CRISPR/Cas protein family with MAFFT, and conserved domains were then analyzed with JPred and HHpred to identify protein families containing a RuvC domain.
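Steps 2 to 4 are simple data-handling operations. The sketch below illustrates them under stated assumptions: proteins and CRISPR arrays are passed as plain tuples with 1-based inclusive coordinates, a protein counts as CRISPR-adjacent if it overlaps the ±10 kb window, and the all-against-all BLASTP output is tabular (`-outfmt 6` with the 12 default columns, E-value in column 11). The function names are hypothetical; the text itself used Prodigal, PILER-CR, BLASTP and MCL.

```python
import math
from typing import Iterable


def filter_proteins(records: Iterable[tuple[str, str]], min_len: int = 800):
    """Step 2: drop exact duplicate sequences, keep proteins longer than min_len aa."""
    seen, kept = set(), []
    for pid, seq in records:
        seq = seq.upper().rstrip("*")  # strip a trailing stop symbol, if present
        if seq in seen:
            continue
        seen.add(seq)
        if len(seq) > min_len:
            kept.append((pid, seq))
    return kept


def crispr_adjacent(proteins, arrays, flank: int = 10_000):
    """Step 3: ids of proteins overlapping a CRISPR array extended by `flank` bp.

    proteins: (id, contig, start, end); arrays: (contig, start, end);
    coordinates are 1-based and inclusive.
    """
    windows: dict[str, list[tuple[int, int]]] = {}
    for contig, start, end in arrays:
        windows.setdefault(contig, []).append((max(1, start - flank), end + flank))
    return [pid for pid, contig, start, end in proteins
            if any(start <= hi and end >= lo for lo, hi in windows.get(contig, []))]


def blast_to_abc(lines: Iterable[str], max_evalue: float = 1e-10,
                 zero_weight: float = 200.0) -> list[str]:
    """Step 4: turn all-vs-all BLASTP rows into MCL ABC edges.

    Keeps hits with E-value < max_evalue, drops self-hits, keeps the best
    E-value per unordered pair and weights edges by -log10(E-value);
    an E-value of 0 gets `zero_weight`, an arbitrary cap.
    """
    best: dict[tuple[str, str], float] = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        query, subject, evalue = cols[0], cols[1], float(cols[10])
        if query == subject or evalue >= max_evalue:
            continue
        key = (min(query, subject), max(query, subject))
        best[key] = min(evalue, best.get(key, evalue))
    edges = []
    for (a, b), evalue in sorted(best.items()):
        weight = zero_weight if evalue == 0 else -math.log10(evalue)
        edges.append(f"{a}\t{b}\t{weight:.2f}")
    return edges
```

The returned lines are in MCL's label (ABC) input format and could be clustered with, for example, `mcl edges.abc --abc -I 2.0 -o clusters.txt`; the inflation value and the weight cap for an E-value of 0 are assumptions, not parameters from the text.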
Based on the above, the inventors obtained a new Cas effector protein, Cas12j.23, whose protein sequence is shown in SEQ ID NO: 1 and whose coding nucleotide sequence is shown in SEQ ID NO: 2. The precursor direct repeat of Cas12j.23 (the repeat contained in the pre-crRNA) is shown in SEQ ID NO: 3, and the mature direct repeat of Cas12j.23 (the repeat contained in the mature crRNA) is shown in SEQ ID NO: 5.
Example 2 Processing of pre-crRNA by Cas12j.23
1. The double-stranded DNA molecules shown in SEQ ID NO: 2 and SEQ ID NO: 4 were synthesized artificially.
2. The double-stranded DNA molecules synthesized in step 1 were ligated into the prokaryotic expression vector pACYC-Duet-1 to obtain the recombinant plasmid pACYC-Duet-1+CRISPR/Cas12j.23.
The recombinant plasmid pACYC-Duet-1+CRISPR/Cas12j.23 was sequenced. The sequencing results showed that it contains the sequences shown in SEQ ID NO: 2 and SEQ ID NO: 4 and expresses the Cas12j.23 protein shown in SEQ ID NO: 1 and the Cas12j.23 precursor direct repeat shown in SEQ ID NO: 3. The recombinant plasmid was introduced into E. coli EC100 to obtain a recombinant strain, designated EC100/pACYC-Duet-1+CRISPR/Cas12j.23.
3. A single colony of EC100/pACYC-Duet-1+CRISPR/Cas12j.23 was inoculated into 100 mL of LB liquid medium (containing 50 μg/mL ampicillin) and cultured with shaking at 37 °C and 200 rpm for 12 h to obtain a bacterial culture.
4. Bacterial RNA extraction: 1.5 mL of the bacterial culture was transferred to a pre-chilled microcentrifuge tube and centrifuged at 6,000 × g for 5 min at 4 °C. The supernatant was discarded, and the cell pellet was resuspended by pipetting in 200 μL of Max Bacterial Enhancement Reagent pre-heated to 95 °C, then incubated at 95 °C for 4 min. 1 mL of the RNA extraction reagent was added to the lysate, mixed by pipetting, and incubated at room temperature for 5 min. 0.2 mL of cold chloroform was added, the tube was shaken by hand for 15 s and incubated at room temperature for 2 to 3 min, and the sample was centrifuged at 12,000 × g for 15 min at 4 °C. 600 μL of the supernatant was transferred to a fresh tube, and 0.5 mL of cold isopropanol was added to precipitate the RNA; the tube was mixed by inversion and incubated at room temperature for 10 min. After centrifugation at 15,000 × g for 10 min at 4 °C, the supernatant was discarded, 1 mL of 75% ethanol was added, and the mixture was vortexed. After centrifugation at 7,500 × g for 5 min at 4 °C, the supernatant was discarded and the pellet was air-dried. The RNA pellet was dissolved in 50 μL of RNase-free water and incubated at 60 °C for 10 min.
5. DNA digestion: 20 μg of RNA was brought to 39.5 μL with ddH2O, heated at 65 °C for 5 min and placed on ice for 5 min. Then 0.5 μL RNase inhibitor, 5 μL buffer and 5 μL DNase I were added (50 μL reaction), and the reaction was incubated at 37 °C for 45 min. 50 μL ddH2O was added to bring the volume to 100 μL. A 2 mL Phase-Lock tube was pre-centrifuged at 16,000 × g for 30 s; 100 μL phenol/chloroform/isoamyl alcohol (25:24:1) and the 100 μL of digested RNA were added, shaken for 15 s, and centrifuged at 16,000 × g for 12 min. The supernatant was transferred to a new 1.5 mL centrifuge tube, an equal volume of isopropanol and 1/10 volume of NaOAc were added, and the RNA was precipitated at −20 °C for 1 h or overnight. After centrifugation at 16,000 × g for 30 min at 4 °C, the supernatant was discarded. The pellet was washed with 350 μL of 75% ethanol and centrifuged at 16,000 × g for 10 min at 4 °C, and the supernatant was discarded. The pellet was air-dried, 20 μL of RNase-free water was added, and the pellet was dissolved at 65 °C for 5 min. The concentration was measured with a NanoDrop, and the RNA was checked by gel electrophoresis.
6. 3' dephosphorylation and 5' phosphorylation: 20 μg of the digested RNA was brought to 42.5 μL with water, heated at 90 °C for 2 min and cooled on ice for 5 min. 5 μL of 10× T4 PNK buffer, 0.5 μL RNase inhibitor and 2 μL T4 PNK were added (50 μL reaction), and the reaction was incubated at 37 °C for 6 h. Then 1 μL T4 PNK and 1.25 μL of 100 mM ATP were added, and incubation continued at 37 °C for 1 h. 47.75 μL ddH2O was added to bring the volume to 100 μL. A 2 mL Phase-Lock tube was pre-centrifuged at 16,000 × g for 30 s; 100 μL phenol/chloroform/isoamyl alcohol (25:24:1) and the 100 μL of RNA were added, shaken for 15 s, and centrifuged at 16,000 × g for 12 min. The supernatant was transferred to a new 1.5 mL centrifuge tube, an equal volume of isopropanol and 1/10 volume of NaOAc were added, and the RNA was precipitated at −20 °C for 1 h or overnight. After centrifugation at 16,000 × g for 30 min at 4 °C, the supernatant was discarded. The pellet was washed with 350 μL of 75% ethanol and centrifuged at 16,000 × g for 10 min at 4 °C, and the supernatant was discarded. The pellet was air-dried, 21 μL of RNase-free water was added, the pellet was dissolved at 65 °C for 5 min, and the concentration was measured with a NanoDrop.
7. RNA monophosphorylation: 20 μL of RNA was heated at 90 °C for 1 min and cooled on ice for 5 min. RNA 5' Polyphosphatase 10× reaction buffer, 0.5 μL RNase inhibitor, 1 μL RNA 5' Polyphosphatase (20 units) and RNase-free water to 20 μL were added, and the reaction was incubated at 37 °C for 60 min. 80 μL ddH2O was added to bring the volume to 100 μL. A 2 mL Phase-Lock tube was pre-centrifuged at 16,000 × g for 30 s; 100 μL phenol/chloroform/isoamyl alcohol (25:24:1) and the 100 μL of RNA were added, shaken for 15 s, and centrifuged at 16,000 × g for 12 min. The supernatant was transferred to a new 1.5 mL centrifuge tube, an equal volume of isopropanol and 1/10 volume of NaOAc were added, and the RNA was precipitated at −20 °C for 1 h or overnight. After centrifugation at 16,000 × g for 30 min at 4 °C, the supernatant was discarded; the pellet was washed with 350 μL of 75% ethanol and centrifuged at 16,000 × g for 10 min at 4 °C, and the supernatant was discarded. The pellet was air-dried, 21 μL of RNase-free water was added, the pellet was dissolved at 65 °C for 5 min, and the concentration was measured with a NanoDrop.
8. Preparation of the cDNA library: the poly(A) tailing reaction contained 16.5 μL RNase-free water, 5 μL Poly(A) Polymerase 10× reaction buffer, 5 μL 10 mM ATP, 1.5 μL RiboGuard RNase Inhibitor, 20 μL RNA substrate and 2 μL Poly(A) Polymerase (4 units), in a total volume of 50 μL, and was incubated at 37 °C for 20 min. 50 μL dH2O was added to bring the volume to 100 μL. A 2 mL Phase-Lock tube was pre-centrifuged at 16,000 × g for 30 s; 100 μL phenol/chloroform/isoamyl alcohol (25:24:1) and the 100 μL of RNA were added, shaken for 15 s, and centrifuged at 16,000 × g for 12 min. The supernatant was transferred to a new 1.5 mL centrifuge tube, an equal volume of isopropanol and 1/10 volume of NaOAc were added, and the RNA was precipitated at −20 °C for 1 h or overnight. After centrifugation at 16,000 × g for 30 min at 4 °C, the supernatant was discarded, the pellet was air-dried and dissolved in 11 μL RNase-free water at 65 °C for 5 min, and the concentration was measured with a NanoDrop.
9. Sequencing adapters were added to the cDNA library, which was then sequenced by Beijing Berry Genomics.
10. The raw data were quality-filtered to remove reads with an average base quality score below 30. After adapter trimming, RNA sequences of 25 to 50 nt were retained and aligned to the reference sequence of the CRISPR array using Bowtie.
11. The alignment showed that the pre-crRNA of Cas12j.23 is successfully processed in E. coli into mature crRNA consisting of a repeat and a targeting sequence.
12. The mature crRNA was subjected to structure prediction and visualization using ViennaRNA and VARNA, respectively. The 3' end of the repeat in the Cas12j.23 crRNA was found to form a stem-loop.
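Steps 10 and 12 can be sketched in a few lines. The sketch assumes Phred+33 FASTQ quality encoding and adapter-trimmed reads (neither is stated in the text), applies the mean-quality and 25 to 50 nt length filters from step 10, and, for step 12, locates hairpin loops in a dot-bracket structure of the kind RNAfold (ViennaRNA) prints. It is not the pipeline used in the source; the alignment itself would still be done with Bowtie.

```python
from typing import Iterable, Iterator


def fastq_records(lines: Iterable[str]) -> Iterator[tuple[str, str, str]]:
    """Yield (name, sequence, quality) from 4-line FASTQ records."""
    it = (line.rstrip("\n") for line in lines)
    for header, seq, _plus, qual in zip(it, it, it, it):
        yield header[1:], seq, qual


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean base quality, assuming Phred+33 encoding."""
    return sum(ord(c) - offset for c in qual) / len(qual)


def keep_read(seq: str, qual: str, min_q: float = 30.0,
              min_len: int = 25, max_len: int = 50) -> bool:
    """Keep reads with mean quality >= min_q and length within [min_len, max_len]."""
    if not seq or len(seq) != len(qual):
        return False
    return mean_phred(qual) >= min_q and min_len <= len(seq) <= max_len


def hairpin_loops(structure: str) -> list[tuple[int, int]]:
    """Return 0-based base pairs (i, j) that close a loop of only unpaired bases."""
    stack: list[int] = []
    pairs: list[tuple[int, int]] = []
    for k, ch in enumerate(structure):
        if ch == "(":
            stack.append(k)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at position {k}")
            pairs.append((stack.pop(), k))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError("unbalanced '('")
    return sorted((i, j) for i, j in pairs
                  if all(structure[m] == "." for m in range(i + 1, j)))
```

A loop whose closing pair sits near the end of the repeat's dot-bracket string would correspond to the 3' stem-loop described in step 12; the structures used in testing are made-up strings, not the predicted fold of SEQ ID NO: 5.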
Example 3 PAM identification of Cas12j.23
1. The recombinant plasmid pACYC-Duet-1+CRISPR/Cas12j.23 was constructed and sequenced. According to the sequencing results, pACYC-Duet-1+CRISPR/Cas12j.23 is obtained by replacing the small fragment between the PmlI and KpnI recognition sites of the vector pACYC-Duet-1 with positions 1 to 2436 (from the 5' end) of the double-stranded DNA molecule shown in SEQ ID NO: 2. The recombinant plasmid expresses the Cas12j.23 protein shown in SEQ ID NO: 1, the Cas12j.23 precursor direct repeat shown in SEQ ID NO: 3, and the guide sequence for PAM identification shown in SEQ ID NO: 11.
2. The recombinant plasmid pACYC-Duet-1+CRISPR/Cas12j.23 contains an expression cassette whose nucleotide sequence is shown in SEQ ID NO: 9. Counting from the 5' end of SEQ ID NO: 9, positions 1 to 44 are the pLacZ promoter, positions 45 to 3248 the Cas12j.23 gene, positions 3249 to 3335 a terminator (for transcription termination), positions 3336 to 3370 the J23119 promoter, positions 3371 to 3436 the CRISPR array, and positions 3437 to 3463 the rrnB-T1 terminator (for transcription termination).
3. The recombinant plasmid pACYC-Duet-1+CRISPR/Cas12j.23 was introduced into E. coli EC100 to obtain the recombinant strain EC100/pACYC-Duet-1+CRISPR/Cas12j.23. The empty vector pACYC-Duet-1 was introduced into E. coli EC100 to obtain the control strain EC100/pACYC-Duet-1.
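The element coordinates given for SEQ ID NO: 9 can be checked for internal consistency. The sketch below transcribes the names and coordinates from the text (the helper names are hypothetical), verifies that the six elements tile the cassette without gaps or overlaps, and reports each element's length; `feature_sequence` would slice an element out once the SEQ ID NO: 9 sequence is at hand.

```python
# Element coordinates (1-based, inclusive) transcribed from the text for SEQ ID NO: 9.
CASSETTE = [
    ("pLacZ promoter", 1, 44),
    ("Cas12j.23 gene", 45, 3248),
    ("terminator", 3249, 3335),
    ("J23119 promoter", 3336, 3370),
    ("CRISPR array", 3371, 3436),
    ("rrnB-T1 terminator", 3437, 3463),
]


def check_tiling(features: list[tuple[str, int, int]]) -> dict[str, int]:
    """Confirm the features are contiguous (no gaps or overlaps); return lengths in bp."""
    lengths: dict[str, int] = {}
    expected_start = features[0][1]
    for name, start, end in features:
        if start != expected_start or end < start:
            raise ValueError(f"{name}: {start}-{end} breaks the tiling")
        lengths[name] = end - start + 1
        expected_start = end + 1
    return lengths


def feature_sequence(cassette_seq: str, start: int, end: int) -> str:
    """Slice one element out of the cassette using 1-based inclusive coordinates."""
    return cassette_seq[start - 1:end]


if __name__ == "__main__":
    for name, length in check_tiling(CASSETTE).items():
        print(f"{name}: {length} bp")
```

The coordinates tile positions 1 to 3463 exactly, and the J23119 element spans 35 bp, consistent with the length of that Anderson promoter.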
Construction of the PAM library: the sequence shown in SEQ ID NO: 10, which comprises eight random bases at its 5' end followed by the target sequence, was synthesized artificially and ligated into the pUC19 vector, so that the resulting plasmid library carries 8 random bases immediately 5' of the target sequence. The library was transformed into E. coli harboring the CRISPR/Cas12j.23 locus and into E. coli lacking it. After 1 h at 37 °C, the plasmids were extracted, and the PAM region was PCR-amplified and sequenced.
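With eight random positions, the library can contain up to 4^8 = 65,536 distinct PAMs. Counting them from amplicon reads can be sketched as follows, assuming (hypothetically) that each read contains the protospacer exactly and may come from either strand. The function names are illustrative, and the protospacer used in testing is a made-up placeholder, not the target sequence in SEQ ID NO: 10.

```python
from collections import Counter
from typing import Iterable, Optional

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_pam(read: str, protospacer: str, pam_len: int = 8) -> Optional[str]:
    """Return the pam_len bases directly 5' of the protospacer, trying both strands."""
    for candidate in (read, revcomp(read)):
        idx = candidate.find(protospacer)
        if idx >= pam_len:
            pam = candidate[idx - pam_len:idx]
            if set(pam) <= set("ACGT"):  # skip PAMs containing N or other symbols
                return pam
    return None


def count_pams(reads: Iterable[str], protospacer: str, pam_len: int = 8) -> Counter:
    """Tally PAM occurrences across reads; reads without a usable PAM are skipped."""
    counts: Counter = Counter()
    for read in reads:
        pam = extract_pam(read.upper(), protospacer, pam_len)
        if pam is not None:
            counts[pam] += 1
    return counts
```

Running `count_pams` separately on the experimental and control reads gives the two count tables that the depletion analysis compares.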
PAM depletion was determined by counting the occurrences of each PAM sequence in the experimental and control groups and normalizing each count to the total number of PAM sequences in its group. A PAM sequence was considered significantly depleted when log2(normalized control count / normalized experimental count) was greater than 3.5, and all significantly depleted PAM sequences were collected. The PAM of Cas12j.23 was finally derived by visualizing the significantly depleted PAM sequences with WebLogo.
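The depletion criterion can be written as a short function: counts are normalized to per-group frequencies, and a PAM is called depleted when log2(control frequency / experimental frequency) > 3.5. The pseudocount of 1, added so that PAMs absent from the experimental group do not cause division by zero, is an assumption not stated in the text, as is normalizing over the observed PAMs only. `position_frequencies` produces the per-position base frequencies that a sequence-logo tool such as WebLogo visualizes.

```python
import math
from collections import Counter


def depleted_pams(control: Counter, experiment: Counter,
                  threshold: float = 3.5, pseudocount: float = 1.0) -> dict[str, float]:
    """Return {pam: log2 ratio} for PAMs whose normalized depletion exceeds threshold."""
    pams = set(control) | set(experiment)
    c_total = sum(control.values()) + pseudocount * len(pams)
    e_total = sum(experiment.values()) + pseudocount * len(pams)
    hits = {}
    for pam in sorted(pams):
        c_freq = (control[pam] + pseudocount) / c_total
        e_freq = (experiment[pam] + pseudocount) / e_total
        score = math.log2(c_freq / e_freq)
        if score > threshold:
            hits[pam] = score
    return hits


def position_frequencies(pams) -> list[dict[str, float]]:
    """Per-position A/C/G/T frequencies for equal-length PAM sequences."""
    pams = list(pams)
    if not pams:
        return []
    matrix = []
    for pos in range(len(pams[0])):
        column = Counter(p[pos] for p in pams)
        matrix.append({base: column[base] / len(pams) for base in "ACGT"})
    return matrix
```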
Verification of the PAM: the PAM library depletion experiment yielded the PAM of Cas12j.23. To verify its stringency, 10 groups of PAMs were tested in vivo for Cas12j.23 editing activity. First, the target and PAM sequences (30 nt) were integrated at a non-conserved position of the kanamycin resistance gene in a plasmid, which was then incubated with the CRISPR/Cas12j.23 and guide RNA complex for 8 h. Plating and counting colonies allowed the depletion activity of Cas12j.23 on the different PAM sequences to be determined. The results showed that the CRISPR/Cas12j.23 system effectively edits only target sequences carrying the specific PAM and has no editing activity on the other target sequences, confirming the accuracy of the PAM identified for Cas12j.23.
Example 4 Identification of the DNA cleavage mode of the CRISPR/Cas12j.23 system
1. In vitro expression and purification of the Cas12j.23 protein
The in vitro expression and purification steps of the Cas12j.23 protein are specifically as follows:
1. The nucleotide sequence shown in SEQ ID NO.2 is synthesized artificially.
2. The double-stranded DNA molecule synthesized in step 1 was ligated into the prokaryotic expression vector pET-30a(+) to obtain the recombinant plasmid pET-30a-CRISPR/Cas12j.23. The recombinant plasmid was sequenced, and the sequencing results showed that pET-30a-CRISPR/Cas12j.23 expresses the Cas12j.23 protein with a nuclear localization signal shown in SEQ ID NO: 8.
3. The recombinant plasmid pET-30a-CRISPR/Cas12j.23 was introduced into E. coli EC100 to obtain the recombinant strain EC100-CRISPR/Cas12j.23. A single colony of EC100-CRISPR/Cas12j.23 was picked, inoculated into 100 mL of LB liquid medium (containing 50 μg/mL ampicillin) and cultured with shaking at 37 °C and 200 rpm for 12 h to obtain a bacterial culture.
4. The culture was inoculated at a 1:100 (v/v) ratio into 50 mL of LB liquid medium (containing 50 μg/mL ampicillin) and cultured with shaking at 37 °C and 200 rpm until the OD600 reached 0.6. IPTG was then added, culture continued with shaking at 28 °C and 220 rpm for 4 h, and the cells were collected by centrifugation at 10,000 rpm for 10 min.
5. The bacterial pellet was resuspended in 100 mL of 100 mM Tris-HCl buffer (pH 8.0) and sonicated (600 W; cycles of 4 s on and 6 s off for a total of 20 min), then centrifuged at 10,000 rpm for 10 min at 4 °C, and supernatant A was collected.
6. Supernatant A was centrifuged at 12,000 rpm for 10 min at 4 °C, and supernatant B was collected.
7. Supernatant B was purified on a GE nickel column (following the column manufacturer's instructions), and the Cas12j.23 protein was then quantified with a Thermo Fisher protein quantification kit.
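Steps 4 to 6 give centrifugation speeds in rpm, whereas other steps use × g, and converting between the two depends on the rotor radius, which the text does not state. A minimal sketch using the standard relation RCF = 1.118 × 10^-5 × r (cm) × rpm^2, with the radius left as a parameter (the 10 cm value in the demo is hypothetical):

```python
import math

# Standard relation between rotor speed and relative centrifugal force:
# RCF (x g) = 1.118e-5 * r_cm * rpm**2, with r the rotor radius in cm.
RCF_CONSTANT = 1.118e-5


def rpm_to_rcf(rpm: float, radius_cm: float) -> float:
    """Convert rotor speed (rpm) to relative centrifugal force (x g)."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rcf_to_rpm(rcf: float, radius_cm: float) -> float:
    """Convert relative centrifugal force (x g) back to rotor speed (rpm)."""
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


if __name__ == "__main__":
    radius = 10.0  # cm; HYPOTHETICAL rotor radius, not given in the text
    for rpm in (10_000, 12_000):
        print(f"{rpm} rpm ~ {rpm_to_rcf(rpm, radius):,.0f} x g at r = {radius} cm")
```

For example, with a 10 cm radius, 10,000 rpm corresponds to about 11,180 × g; the same rpm on a smaller rotor gives proportionally less force.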
2. Transcription and purification of the Cas12j.23 guide RNA:
1. Templates for guide RNA transcription were designed with the structure: T7 promoter + Cas12j.23 mature direct repeat (SEQ ID NO: 5) + guide sequence (SEQ ID NO: 6). Primers were designed with Primer 5.0 so that the forward and reverse primers overlap by at least 18 bp.
2. The following reaction system was prepared, mixed gently, briefly centrifuged and placed in a PCR instrument for slow annealing. The PCR system is as follows:
3. The template was purified with the MinElute PCR Purification Kit as follows:
1) Add 5 volumes of Buffer PB to the PCR product, place a MinElute column in a 2 mL collection tube, apply the sample, let it stand at room temperature for 2 min, and centrifuge at 12,000 × g for 2 min;
2) Discard the flow-through, add 750 μL Buffer PE (ethanol must be added to Buffer PE beforehand), and centrifuge at 12,000 × g for 2 min;
3) Discard the flow-through, add 350 μL Buffer PE, and centrifuge at 12,000 × g for 2 min; discard the flow-through and centrifuge the empty column at 12,000 × g for 2 min;
4) Transfer the MinElute column to a new 1.5 mL centrifuge tube and let it stand with the lid open at 65 °C for 2 min;
5) Add 20 μL of pre-warmed Buffer EB, let stand for 2 min, and centrifuge at 12,000 × g for 2 min; pass the eluate through the MinElute column 2 to 3 more times to improve recovery;
6) Measure the concentration with a NanoDrop and store at −20 °C until use.
4. Purification of the guide RNA: DNase I was removed from the reaction by extraction with phenol/chloroform/isoamyl alcohol (25:24:1):
1) Add 80 μL RNase-free H2O to the transcription reaction to bring the volume to 100 μL;
2) Pre-centrifuge a 2 mL Phase Lock Gel (PLG) Heavy tube for 2 min, add 100 μL phenol/chloroform/isoamyl alcohol (25:24:1) and 100 μL of the DNase I-digested RNA, flick the tube gently by hand 5 to 10 times to mix, and centrifuge at 16,000 × g for 12 min at 15 °C;
3) Transfer the supernatant to a new RNase-free 1.5 mL centrifuge tube, taking care not to aspirate the gel; add an equal volume of isopropanol and 1/10 volume of sodium acetate solution, mix by pipetting, and place at −20 °C for 1 h or overnight;
4) Centrifuge at 16,000 × g for 30 min at 4 °C and discard the supernatant; add pre-cooled 75% ethanol, resuspend the pellet, centrifuge at 16,000 × g for 12 min at 4 °C and discard the supernatant; leave the tube in a fume hood for 2 to 3 min to let the ethanol evaporate from the RNA surface, then add 100 μL RNase-free H2O and mix.
5. The concentration of the purified crRNA was measured with a NanoDrop; the crRNA was diluted to 250 ng/μL, dispensed into 200 μL PCR tubes and stored at −80 °C until use.
6. Establishment of the double-stranded DNA cleavage system:
(1) The following reaction was prepared, mixed gently, briefly centrifuged and incubated at 37 °C for 15 min. The DNA cleavage reaction system is as follows:
(2) 3 μL of substrate DNA (100 ng/μL, i.e. 300 ng) was added, mixed by gentle swirling and briefly centrifuged, and the reaction was incubated at 37 °C for 8 h;
(3) RNase was added and the reaction incubated at 37 °C for 15 min to fully digest RNA in the system;
(4) Proteinase K was added and the reaction incubated at 58 °C for 15 min to digest the Cas12j.23 protein;
(5) The products were analyzed by agarose gel electrophoresis.
Gel electrophoresis showed that Cas12j.23 efficiently cleaves double-stranded DNA.
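Step 1 of the guide RNA transcription procedure builds the template (T7 promoter + mature direct repeat + guide sequence) from a forward and a reverse primer that overlap by at least 18 bp. A minimal sketch of such a split follows. The T7 promoter string is the commonly used class III sequence; the repeat and guide strings are hypothetical placeholders, because SEQ ID NO: 5 and SEQ ID NO: 6 are not reproduced here, and splitting near the midpoint is an assumption rather than the Primer 5.0 design the text describes.

```python
# Sketch: split a T7 transcription template into two overlapping primers.
T7_PROMOTER = "TAATACGACTCACTATAG"  # widely used T7 class III promoter; last G is +1

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def overlapping_oligos(template: str, overlap: int = 18) -> tuple[str, str]:
    """Return (forward, reverse) oligos whose 3' ends anneal over `overlap` bases.

    The forward oligo is the 5' part of the template's top strand; the reverse
    oligo is the reverse complement of the 3' part. Splitting near the middle
    is an assumption, not the Primer 5.0 design used in the text.
    """
    if overlap < 18:
        raise ValueError("the text calls for an overlap of at least 18 bp")
    if len(template) < overlap:
        raise ValueError("template is shorter than the requested overlap")
    fwd_len = (len(template) + overlap) // 2
    forward = template[:fwd_len]
    reverse = revcomp(template[fwd_len - overlap:])
    return forward, reverse


if __name__ == "__main__":
    repeat = "GTTGCAAGGATCGTCCTAGC"  # HYPOTHETICAL stand-in for SEQ ID NO: 5
    guide = "CTTGGAGCATCGATGCCAAT"   # HYPOTHETICAL stand-in for SEQ ID NO: 6
    fwd, rev = overlapping_oligos(T7_PROMOTER + repeat + guide)
    print("forward:", fwd)
    print("reverse:", rev)
```

After annealing, the two 3' ends pair over the overlap, and extension through the single-stranded regions regenerates the full double-stranded template.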
Example 5 Cleavage activity of Cas12j.23 in human cell lines
A eukaryotic expression vector containing the Cas12j.23 gene and a PCR product containing the U6 promoter and the guide RNA (comprising the precursor direct repeat shown in SEQ ID NO: 3 and the guide sequence for eukaryotic editing shown in SEQ ID NO: 5) were transfected into human HEK293T cells by lipofection, and the cells were cultured at 37 °C with 5% CO2 for 72 h. Genomic DNA was then extracted from the cells, and a 700 bp sequence containing the target site was amplified. To identify the cleavage pattern of Cas12j.23 at the target site, the PCR product was ligated into the B-Simple vector and Sanger-sequenced by Thermo Fisher, and the sequencing results were aligned to the VEGFA gene of the human genome. To determine the editing efficiency of Cas12j.23 at VEGFA, a next-generation sequencing library of the PCR product was constructed with Tn5 and sequenced by Beijing Annoroad Gene Technology Co., Ltd. Cleavage of the DNMT1 gene by Cas12j.23 was also identified.
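Editing efficiency from amplicon sequencing is often reported as the fraction of reads carrying an insertion or deletion at the target site. The sketch below is a deliberately simplified stand-in for a dedicated amplicon-analysis tool, not the method used in the source: it assumes reads on the amplicon's forward strand and two hypothetical anchor sequences flanking the cut site, counts a read as edited when the spacing between the anchors differs from the reference, and ignores substitutions.

```python
from typing import Iterable, Optional


def anchor_gap(seq: str, left: str, right: str) -> Optional[int]:
    """Bases between the end of `left` and the start of `right`, or None if absent."""
    i = seq.find(left)
    if i < 0:
        return None
    j = seq.find(right, i + len(left))
    if j < 0:
        return None
    return j - (i + len(left))


def indel_fraction(reads: Iterable[str], reference: str, left: str, right: str) -> float:
    """Fraction of informative reads whose anchor spacing differs from the reference."""
    ref_gap = anchor_gap(reference, left, right)
    if ref_gap is None:
        raise ValueError("anchors not found in the reference amplicon")
    informative = edited = 0
    for read in reads:
        gap = anchor_gap(read, left, right)
        if gap is None:
            continue  # read does not span both anchors; not informative
        informative += 1
        if gap != ref_gap:
            edited += 1  # length change between anchors = insertion or deletion
    return edited / informative if informative else 0.0
```

Anchors should be chosen far enough from the cut site that indels do not remove them, and unique within the amplicon so that `str.find` locates the intended positions.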
While specific embodiments of the invention have been described in detail, it will be appreciated by those skilled in the art that various modifications and alternatives to those details could be developed in light of the overall teachings of the disclosure and that such modifications would be within the scope of the invention. The full scope of the invention is given by the appended claims together with any equivalents thereof.

Claims (71)

1. An effector protein in a CRISPR/Cas system has an amino acid sequence shown in SEQ ID NO. 1.
2. A conjugate comprising the protein of claim 1 and a modifying moiety, wherein the modifying moiety is selected from the group consisting of an additional protein or polypeptide, a detectable label, and any combination thereof, wherein the additional protein or polypeptide is selected from the group consisting of an epitope tag, a reporter gene sequence, a nuclear localization signal sequence, a transcriptional activation domain, a transcriptional repression domain, a nuclease domain, and any combination thereof.
3. The conjugate of claim 2, wherein the modifying moiety is attached to the N-terminus or the C-terminus of the protein by a linker.
4. The conjugate of claim 2, which is characterized by one or more of (i) the transcriptional activation domain is VP64, (ii) the transcriptional repression domain is KRAB domain or SID domain, and (iii) the nuclease domain is Fok1.
5. The conjugate of claim 2, wherein the conjugate comprises an epitope tag.
6. The conjugate of claim 2, wherein the conjugate comprises an NLS sequence.
7. The conjugate of claim 6, wherein the NLS sequence is set forth in SEQ ID No. 7.
8. The conjugate of claim 7, wherein the NLS sequence is located at the N-terminus or C-terminus of the protein.
9. A fusion protein comprising the protein of claim 1 and an additional protein or polypeptide, wherein the additional protein or polypeptide is selected from the group consisting of an epitope tag, a reporter gene sequence, a nuclear localization signal sequence, a transcriptional activation domain, a transcriptional repression domain, a nuclease domain, and any combination thereof.
10. The fusion protein of claim 9, wherein the additional protein or polypeptide is linked to the N-terminus or C-terminus of the protein by a linker.
11. The fusion protein of claim 9, which is characterized by one or more of (i) the transcriptional activation domain is VP64, (ii) the transcriptional repression domain is KRAB domain or SID domain, and (iii) the nuclease domain is Fok1.
12. The fusion protein of claim 9, wherein the fusion protein comprises an epitope tag.
13. The fusion protein of claim 9, wherein the fusion protein comprises an NLS sequence.
14. The fusion protein of claim 13, wherein the NLS sequence is set forth in SEQ ID NO. 7.
15. The fusion protein of claim 14, wherein the NLS sequence is located at the N-terminus or C-terminus of the protein.
16. The fusion protein of claim 9, wherein the fusion protein has an amino acid sequence as set forth in SEQ ID NO. 8.
17. An isolated nucleic acid molecule having the sequence shown in SEQ ID NO. 4.
18. A complex, comprising:
(i) a protein component selected from the group consisting of a protein according to claim 1, a conjugate according to any one of claims 2 to 8 or a fusion protein according to any one of claims 9 to 16, and
(ii) a nucleic acid component comprising, in the 5' to 3' direction, the isolated nucleic acid molecule of claim 17 and a targeting sequence capable of hybridizing to a target sequence,
Wherein the protein component and the nucleic acid component are bound to each other to form a complex.
19. The complex of claim 18, wherein the targeting sequence is attached to the 3' end of the nucleic acid molecule.
20. The complex of claim 18, wherein the targeting sequence comprises a complement of the target sequence.
21. The complex of claim 18, wherein the nucleic acid component is a guide RNA in a CRISPR/Cas system.
22. The complex of claim 18, wherein the nucleic acid molecule is RNA.
23. The complex of claim 18, wherein the complex does not comprise trans-acting crRNA.
24. An isolated nucleic acid molecule having the sequence:
(i) A nucleotide sequence encoding the protein of claim 1, or the fusion protein of any one of claims 9-16;
(ii) the nucleotide sequence of the isolated nucleic acid molecule according to claim 17; and/or
(iii) a nucleotide sequence consisting of the nucleotide sequences of (i) and (ii).
25. The isolated nucleic acid molecule of claim 24, wherein the nucleotide sequence of any one of (i) - (iii) is codon optimized for expression in a prokaryotic cell or a eukaryotic cell.
26. A vector comprising the isolated nucleic acid molecule of claim 24 or 25.
27. A host cell comprising the isolated nucleic acid molecule of claim 24 or 25 or the vector of claim 26.
28. A composition comprising:
(i) A first component selected from the group consisting of a protein according to claim 1, a conjugate according to any one of claims 2 to 8, a fusion protein according to any one of claims 9 to 16, a nucleotide sequence encoding said protein or fusion protein, and
(ii) a second component that is a nucleotide sequence comprising a guide RNA;
Wherein the guide RNA comprises a direct repeat sequence and a guide sequence in the 5 'to 3' direction, the guide sequence being capable of hybridizing to a target sequence;
The guide RNA is capable of forming a complex with the protein, conjugate or fusion protein described in (i).
29. The composition of claim 28, wherein the direct repeat sequence is the isolated nucleic acid molecule as defined in claim 17.
30. The composition of claim 28, wherein the targeting sequence is linked to the 3' end of the direct repeat sequence.
31. The composition of claim 28, wherein the targeting sequence comprises a complement of the target sequence.
32. The composition of claim 28, wherein the composition does not comprise trans-acting crRNA.
33. A composition comprising one or more vectors, the one or more vectors comprising:
(i) A first nucleic acid which is a nucleotide sequence encoding the protein of claim 1 or the fusion protein of any one of claims 9-16, wherein the first nucleic acid is operably linked to a first regulatory element, and
(ii) a second nucleic acid that is a nucleotide sequence encoding a guide RNA, wherein the second nucleic acid is operably linked to a second regulatory element;
wherein:
the first nucleic acid and the second nucleic acid are present on the same vector or on different vectors;
the guide RNA comprises a direct repeat sequence and a targeting sequence in the 5' to 3' direction, the targeting sequence being capable of hybridizing to a target sequence; and
the guide RNA is capable of forming a complex with the protein or fusion protein of (i).
34. The composition of claim 33, wherein the direct repeat sequence is the isolated nucleic acid molecule as defined in claim 17.
35. The composition of claim 33, wherein the targeting sequence is linked to the 3' end of the direct repeat sequence.
36. The composition of claim 33, wherein the targeting sequence comprises a complement of the target sequence.
37. The composition of claim 33, wherein the composition does not comprise a trans-activating crRNA (tracrRNA).
38. The composition of claim 33, wherein the first regulatory element and/or the second regulatory element is a promoter.
39. The composition of claim 38, wherein the promoter is an inducible promoter.
40. The composition of any one of claims 28-38, wherein, when the target sequence is DNA, the target sequence is located 3' of the protospacer adjacent motif (PAM).
41. The composition of claim 40, wherein the target sequence is a DNA sequence from a prokaryotic cell or a eukaryotic cell, or the target sequence is a non-naturally occurring DNA sequence.
42. The composition of any one of claims 28-39 and 41, wherein the protein has one or more nuclear localization signal (NLS) sequences attached, or the conjugate or fusion protein comprises one or more NLS sequences.
43. A kit comprising one or more components selected from the group consisting of the protein of claim 1, the conjugate of any one of claims 2-8, the fusion protein of any one of claims 9-16, the isolated nucleic acid molecule of claim 17, the complex of any one of claims 18-23, the isolated nucleic acid molecule of claim 24 or 25, the vector of claim 26, the host cell of claim 27, and the composition of any one of claims 28-42.
44. A delivery composition comprising a delivery vehicle and one or more selected from the group consisting of the protein of claim 1, the conjugate of any one of claims 2-8, the fusion protein of any one of claims 9-16, the isolated nucleic acid molecule of claim 17, the complex of any one of claims 18-23, the isolated nucleic acid molecule of claim 24 or 25, the vector of claim 26, the host cell of claim 27, and the composition of any one of claims 28-42.
45. The delivery composition of claim 44, wherein the delivery vehicle is a particle.
46. The delivery composition of claim 44, wherein the delivery vehicle is selected from the group consisting of a lipid particle, a metal particle, a protein particle, an exosome, a gene gun, and a viral vector.
47. A method of modifying a target gene for purposes other than the diagnosis and treatment of disease, comprising contacting the complex of any one of claims 18-23 or the composition of any one of claims 28-42 with the target gene, or delivering the complex or composition into a cell comprising the target gene, wherein the target sequence is present in the target gene.
48. The method of claim 47, wherein the target gene is present in a cell.
49. The method of claim 47, wherein the cell is a prokaryotic cell.
50. The method of claim 47, wherein the cell is a eukaryotic cell.
51. The method of claim 47, wherein the cell is selected from the group consisting of animal cells and plant cells.
52. The method of claim 47, wherein the target gene is present in an in vitro nucleic acid molecule.
53. The method of claim 47, which results in a double-strand break in the DNA.
54. A method of altering the expression of a gene product for purposes other than the diagnosis and treatment of disease, comprising contacting the complex of any one of claims 18-23 or the composition of any one of claims 28-42 with a nucleic acid molecule encoding the gene product, or delivering the complex or composition into a cell comprising the nucleic acid molecule, wherein the target sequence is present in the nucleic acid molecule.
55. The method of claim 54, wherein the nucleic acid molecule is present in a cell.
56. The method of claim 54, wherein the cell is a prokaryotic cell.
57. The method of claim 54, wherein the cell is a eukaryotic cell.
58. The method of claim 54, wherein the cell is selected from the group consisting of animal cells and plant cells.
59. The method of claim 54, wherein the nucleic acid molecule is an in vitro nucleic acid molecule.
60. The method of claim 54, wherein expression of the gene product is altered.
61. The method of claim 54, wherein the gene product is a protein.
62. The method of any one of claims 47-61, wherein the protein, conjugate, fusion protein, complex, vector or composition is contained in a delivery vehicle.
63. The method of claim 62, wherein the delivery vehicle is selected from the group consisting of lipid particles, metal particles, protein particles, exosomes and viral vectors.
64. The method of any one of claims 47-61, for modifying a cell, cell line or organism by altering one or more target sequences in a target gene or nucleic acid molecule encoding a target gene product.
65. An isolated cell or cell line or progeny thereof comprising the protein of claim 1, the conjugate of any one of claims 2-8, the fusion protein of any one of claims 9-16, the isolated nucleic acid molecule of claim 17, the complex of any one of claims 18-23, the isolated nucleic acid molecule of claim 24 or 25, the vector of claim 26, or the composition of any one of claims 28-42.
66. The cell or cell line of claim 65, or progeny thereof, which is a eukaryotic cell.
67. The cell or cell line of claim 65, or progeny thereof, which is an animal cell or plant cell.
68. The cell or cell line of claim 65, or progeny thereof, which is a stem cell or stem cell line.
69. The use of a protein according to claim 1, a conjugate according to any one of claims 2 to 8, a fusion protein according to any one of claims 9 to 16, an isolated nucleic acid molecule according to claim 17, a complex according to any one of claims 18 to 23, an isolated nucleic acid molecule according to claim 24 or 25, a vector according to claim 26, a composition according to any one of claims 28 to 42 or a kit according to claim 43 for the preparation of a reagent for nucleic acid editing, wherein the nucleic acid editing is gene editing.
70. The use of claim 69, wherein the gene editing comprises modifying a gene, knocking out a gene, altering expression of a gene product, repairing a mutation, and/or inserting a polynucleotide.
71. Use of the protein of claim 1, the conjugate of any one of claims 2-8, the fusion protein of any one of claims 9-16, the isolated nucleic acid molecule of claim 17, the complex of any one of claims 18-23, the isolated nucleic acid molecule of claim 24 or 25, the vector of claim 26, the composition of any one of claims 28-42, or the kit of claim 43 in the preparation of a formulation for:
(i) Ex vivo gene or genome editing;
(ii) In vitro detection of single-stranded DNA;
(iii) Editing target sequences in a target locus to modify an organism; and/or
(iv) Treating a condition caused by a defect in a target sequence in a target locus.
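Claims 28, 33 and 40 describe the guide RNA layout (a direct repeat followed by a targeting sequence, read 5' to 3') and place a DNA target 3' of the PAM. The Python sketch below illustrates that layout under stated assumptions: the PAM pattern, spacer length and direct repeat passed in are hypothetical placeholders, because this section gives neither the actual Cas12j.23 PAM nor the claim 17 repeat sequence, and `find_targets` and `build_guide_rna` are illustrative helper names, not part of the patent. It also assumes the common Cas12-family convention that the targeting sequence is the RNA copy of the PAM-strand protospacer, so it base-pairs with the complementary strand.

```python
import re

# IUPAC nucleotide codes, so a PAM such as "TTN" can be written as a pattern.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def pam_regex(pam: str) -> str:
    """Translate an IUPAC PAM string into a regular expression."""
    return "".join(IUPAC[base] for base in pam.upper())


def find_targets(dna: str, pam: str, spacer_len: int = 20) -> list[tuple[int, str]]:
    """Return (start, protospacer) pairs lying immediately 3' of each PAM.

    Only the strand given is scanned. The PAM and spacer_len are
    hypothetical inputs, not values taken from the patent.
    """
    dna = dna.upper()
    # Zero-width lookahead so overlapping PAM occurrences are all reported.
    pattern = re.compile("(?=" + pam_regex(pam) + ")")
    hits = []
    for match in pattern.finditer(dna):
        start = match.start() + len(pam)
        protospacer = dna[start:start + spacer_len]
        if len(protospacer) == spacer_len:  # skip sites truncated at the end
            hits.append((start, protospacer))
    return hits


def build_guide_rna(direct_repeat: str, protospacer_dna: str) -> str:
    """Assemble a guide RNA 5'->3': direct repeat, then targeting sequence."""
    targeting = protospacer_dna.upper().replace("T", "U")
    return direct_repeat + targeting


if __name__ == "__main__":
    # Placeholder repeat; the real claim 17 sequence is not reproduced here.
    hypothetical_repeat = "ACGUACGUAC"
    for start, site in find_targets("CCTTAGCATGC", "TTN", spacer_len=5):
        print(start, build_guide_rna(hypothetical_repeat, site))
```

To scan both strands, one would also run `find_targets` on the reverse complement of the input sequence.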

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010605399.1A CN113930413B (en) 2020-06-29 2020-06-29 Novel CRISPR-Cas12j.23 enzyme and system

Publications (2)

Publication Number Publication Date
CN113930413A 2022-01-14
CN113930413B 2024-12-27

Family

ID=79272945

Country Status (1)

Country Link
CN (1) CN113930413B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW202449148A (en) * 2023-06-09 2024-12-16 大陸商益杰立科(上海)生物科技有限公司 Cas enzyme, and system and applications thereof
CN119162150A (en) * 2023-08-29 2024-12-20 山东舜丰生物科技有限公司 Cas12j protein with improved editing activity and its application
CN119193540A (en) * 2023-09-04 2024-12-27 China Agricultural University Novel CRISPR-Casδ enzymes and systems

Citations (1)

Publication number Priority date Publication date Assignee Title
WO2020098772A1 (en) * 2018-11-15 2020-05-22 China Agricultural University CRISPR-Cas12j enzyme and system

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
AU2017342543B2 (en) * 2016-10-14 2024-06-27 President And Fellows Of Harvard College AAV delivery of nucleobase editors
AU2018315731B2 (en) * 2017-08-09 2024-10-03 Ricetec, Inc. Compositions and methods for modifying genomes

Also Published As

Publication number Publication date
CN113930413A (en) 2022-01-14

Similar Documents

Publication Publication Date Title
CN113136375B (en) Novel CRISPR/Cas12f enzymes and systems
JP7460178B2 (en) CRISPR-Cas12j enzyme and system
WO2019201331A1 (en) Crispr/cas effector protein and system
CN113015798B (en) CRISPR-Cas12a enzymes and systems
WO2019214604A1 (en) Crispr/cas effector protein and system
CN113881652B (en) Novel Cas enzymes and systems and applications
CN112020560B (en) RNA-edited CRISPR/Cas effect protein and system
CN114517190B (en) CRISPR enzymes and systems and uses
CN113930413B (en) Novel CRISPR-Cas12j.23 enzyme and system
CN113930411A (en) Novel CRISPR-Cas12M enzymes and systems
CN113930410A (en) Novel CRISPR-Cas12L enzymes and systems
WO2020087631A1 (en) System and method for genome editing based on c2c1 nucleases
CN117050971B (en) Cas mutant proteins and their applications
CN113930412A (en) Novel CRISPR-Cas12N enzymes and systems
US20250179534A1 (en) Novel CRISPR-Cas sigma enzyme and system
CN116355877A (en) Cas13 protein, CRISPR-Cas system and application thereof
CN119193540A (en) Novel CRISPR-Casδ enzymes and systems
CN119193539A (en) Novel Cas enzymes and their applications
WO2024175015A1 (en) Crispr/cas effector protein and system
CN118773168B (en) CRISPR/Cas effect protein HT001, system and application
HK40033058B (en) Novel crispr/cas12f enzyme and system
HK40033058A (en) Novel crispr/cas12f enzyme and system
WO2021098709A1 (en) Gene editing system derived from flavobacteria

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant