WO2023169454A1

WO2023169454A1 - Adenine deaminase and use thereof in base editing

Info

Publication number: WO2023169454A1
Application number: PCT/CN2023/080251
Authority: WO
Inventors: Gao Caixia; Lin Qiupeng; Huang Jiaying; K. T. Zhao
Original assignee: Institute of Genetics and Developmental Biology of CAS
Current assignee: Institute of Genetics and Developmental Biology of CAS
Priority date: 2022-03-08
Filing date: 2023-03-08
Publication date: 2023-09-14
Anticipated expiration: 2024-09-08
Also published as: CN117187220A

Abstract

An adenine deaminase and the use thereof in base editing. More specifically: a base editing system based on newly identified adenine deaminase; a method of using said system to carry out base editing of a target sequence in the genome of an organism (such as a plant); a genetically modified organism (such as a plant) produced by said method; and progeny of said organism.

Description

Adenine deaminase and its use in base editing

Technical field

The present invention relates to the field of genetic engineering. In particular, the present invention relates to adenine deaminases and their use in base editing. More specifically, the present invention relates to a base editing system based on a newly identified adenine deaminase, a method of base editing a target sequence in the genome of an organism (such as a plant) using the base editing system, and genetically modified organisms (such as plants) produced by the method and their progeny.

Background of the invention

Sequence-specific modification of an organism's genome can confer new, stably heritable traits. In particular, single-nucleotide changes at specific sites can alter the amino acid sequence of a gene, cause premature termination, or alter regulatory sequences, thereby giving rise to desirable traits. Genome editing technologies such as the CRISPR/Cas9 system can target specific sequences. By exploiting the ability of such systems to bind a target sequence and fusing them to a deaminase, base editing systems have been developed that precisely deaminate target sites in the genome. The two most widely used classes are cytosine base editing systems and adenine base editing systems. In adenine base editing, fusion of a variant of E. coli TadA (tRNA-specific adenosine deaminase) converts adenine (A) at the target site to hypoxanthine (I). Cells read I in DNA as guanine (G), and I is replaced by G during replication, so the A at the target site is ultimately converted to G. In addition, base editing efficiency can be significantly improved by nicking the opposite, non-deaminated strand.
Since no adenine deaminase in nature can directly deaminate adenine (A) in DNA, the only adenine deaminases currently usable at the DNA level are a series of ecTadA variants derived from E. coli TadA through directed evolution by David R. Liu's team. The search for new adenine deaminases is therefore of great significance for expanding the existing adenine base editing toolbox and improving the ability to precisely manipulate target DNA sequences.

Brief description of the drawings

Figure 1: Sequence similarity between potential adenine deaminase No.135 and E. coli ecTadA.

Figure 2: Adenine base editing is achieved in the reporter system after modification of key sites in potential deaminase No.135.

Figure 3: Structural similarity to TadA of randomly selected proteins that carry the VX_nNX₁₀HAEX_nPCXMC signature motif and are annotated as guanine deaminase, lysyl-tRNA synthetase, HAD hydrolase, or protein of unannotated function, respectively. E. coli TadA is shown in light color and the candidate protein in dark color.

Figure 4: Sequence similarity between guanine deaminases No.1299 and No.1417 and E. coli ecTadA.

Figure 5: Adenine base editing is achieved in the reporter system after modification of key sites in potential deaminase No.1299.

Figure 6: Adenine base editing is achieved in the reporter system after modification of key sites in potential deaminase No.1417.

Detailed description of the invention

1. Definitions

In the present invention, unless otherwise stated, scientific and technical terms used herein have the meanings commonly understood by those skilled in the art. Furthermore, the terms and laboratory procedures relating to protein and nucleic acid chemistry, molecular biology, cell and tissue culture, microbiology and immunology used herein are terms and routine procedures widely used in the corresponding fields. To aid understanding of the present invention, definitions and explanations of relevant terms are provided below.

As used herein, the term "and/or" encompasses all combinations of the items connected by the term, and each combination shall be deemed to have been individually set forth herein. For example, "A and/or B" encompasses "A", "A and B" and "B". Likewise, "A, B and/or C" encompasses "A", "B", "C", "A and B", "A and C", "B and C" and "A and B and C".
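The combinatorial reading of "and/or" can be made concrete with a short sketch. The function name below is hypothetical and only illustrates the definition: for k connected items it enumerates all 2^k - 1 non-empty combinations.

```python
from itertools import combinations


def and_or_combinations(items):
    """Return every non-empty combination of items, joined with 'and'."""
    result = []
    # Walk combination sizes from 1 up to all items, preserving input order.
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            result.append(" and ".join(combo))
    return result


print(and_or_combinations(["A", "B", "C"]))
```

For "A, B and/or C" this yields the seven alternatives listed above.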

"Genome" as used herein encompasses not only chromosomal DNA present in the nucleus, but also organellar DNA present in subcellular components of the cell (e.g., mitochondria and plastids).

As used herein, an "organism" includes any organism suitable for genome editing, preferably a eukaryote. Examples of organisms include, but are not limited to, mammals such as humans, mice, rats, monkeys, dogs, pigs, sheep, cattle and cats; poultry such as chickens, ducks and geese; and plants, including monocots and dicots, such as rice, maize, wheat, sorghum, barley, soybean, peanut and Arabidopsis thaliana.

A "genetically modified organism" or "genetically modified cell" means an organism or cell that contains an exogenous polynucleotide or a modified gene or expression regulatory sequence within its genome. For example, an exogenous polynucleotide can be stably integrated into the genome of an organism or cell and inherited over successive generations. The exogenous polynucleotide can be integrated into the genome alone or as part of a recombinant DNA construct. A modified gene or expression regulatory sequence is one whose sequence in the genome of the organism or cell contains single or multiple deoxynucleotide substitutions, deletions and/or additions.

"Exogenous" with respect to a sequence means a sequence from a foreign species or, if from the same species, a sequence that has been significantly altered in composition and/or locus from its native form by deliberate human intervention.

"Polynucleotide", "nucleic acid sequence", "nucleotide sequence" and "nucleic acid fragment" are used interchangeably and refer to single- or double-stranded RNA or DNA polymers, optionally containing synthetic, non-natural or altered nucleotide bases. Nucleotides are referred to by their single-letter designations as follows: "A" for adenosine or deoxyadenosine (for RNA or DNA, respectively), "C" for cytidine or deoxycytidine, "G" for guanosine or deoxyguanosine, "U" for uridine, "T" for deoxythymidine, "R" for purines (A or G), "Y" for pyrimidines (C or T), "K" for G or T, "H" for A or C or T, "I" for inosine, and "N" for any nucleotide.
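As a sketch of how the degenerate single-letter codes above can be used when scanning DNA, the hypothetical helpers below translate a degenerate pattern into a regular expression. They cover only the DNA codes defined in this paragraph (U and I are omitted because they do not occur in a plain DNA string), and a lookahead is used so that overlapping matches are reported.

```python
import re

# Degenerate DNA codes defined in the paragraph above; other IUPAC codes are omitted.
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "K": "[GT]", "H": "[ACT]", "N": "[ACGT]",
}


def degenerate_to_regex(pattern):
    """Translate a degenerate DNA pattern such as 'NGG' into a regex string."""
    return "".join(IUPAC_DNA[base] for base in pattern.upper())


def find_degenerate(pattern, sequence):
    """Return 0-based start positions of all (possibly overlapping) matches."""
    regex = re.compile("(?=(" + degenerate_to_regex(pattern) + "))")
    return [m.start() for m in regex.finditer(sequence.upper())]


print(find_degenerate("NGG", "AGGTGGG"))
```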

"Polypeptide", "peptide" and "protein" are used interchangeably herein and refer to a polymer of amino acid residues. The terms apply to amino acid polymers in which one or more amino acid residues are artificial chemical analogs of the corresponding naturally occurring amino acids, as well as to naturally occurring amino acid polymers. The terms "polypeptide", "peptide", "amino acid sequence" and "protein" may also include modified forms, including but not limited to glycosylation, lipid attachment, sulfation, gamma-carboxylation of glutamic acid residues, hydroxylation and ADP-ribosylation.

Sequence "identity" has an art-recognized meaning, and the percentage of sequence identity between two nucleic acid or polypeptide molecules or regions can be calculated using published techniques. Sequence identity can be measured along the entire length of a polynucleotide or polypeptide or along a region of the molecule. (See, e.g., Computational Molecular Biology, Lesk, A.M., ed., Oxford University Press, New York, 1988; Biocomputing: Informatics and Genome Projects, Smith, D.W., ed., Academic Press, New York, 1993; Computer Analysis of Sequence Data, Part I, Griffin, A.M., and Griffin, H.G., eds., Humana Press, New Jersey, 1994; Sequence Analysis in Molecular Biology, von Heijne, G., Academic Press, 1987; and Sequence Analysis Primer, Gribskov, M. and Devereux, J., eds., M Stockton Press, New York, 1991.) Although there are many methods for measuring identity between two polynucleotides or polypeptides, the term "identity" is well known to those skilled in the art (Carrillo, H. & Lipman, D., SIAM J Applied Math 48:1073 (1988)).
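To illustrate one way a percent identity can be computed once two sequences have been aligned, here is a minimal sketch. The normalization used (identical non-gap columns divided by total alignment length, gap columns included) is an assumption; published tools differ in whether they divide by alignment length, by the shorter sequence, or by the aligned region only.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity of two pre-aligned sequences of equal length ('-' marks a gap).

    Assumed convention: identical non-gap columns / total alignment length * 100.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-")
    return 100.0 * matches / len(aligned_a)


print(round(percent_identity("MKV-LA", "MRVQLA"), 1))
```

Here four of six columns are identical, giving about 66.7%.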

When the term "comprising" is used herein to describe a protein or nucleic acid sequence, the protein or nucleic acid may consist of that sequence, or may have additional amino acids or nucleotides at one or both ends, while still having the activity described in the present invention. In addition, those skilled in the art understand that the N-terminal methionine encoded by the start codon is retained in some practical circumstances (for example, when expressed in certain expression systems) without substantially affecting the function of the polypeptide. Therefore, when a specific polypeptide amino acid sequence is described in the description and claims of this application, sequences that include the start-codon-encoded N-terminal methionine are also encompassed even if the described sequence lacks it, and the corresponding encoding nucleotide sequence may likewise include a start codon; and vice versa.
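The "with or without the initiator methionine" convention can be expressed as a simple comparison rule. The sketch below is a hypothetical helper, not a tool from the application: two sequences are treated as equivalent if they are identical or differ only by one extra N-terminal M.

```python
def same_modulo_initiator_met(a, b):
    """True if a and b are identical or differ only by one extra leading 'M'.

    Comparing against 'M' + other (rather than stripping a leading M) keeps
    sequences whose mature form itself starts with M handled correctly.
    """
    return a == b or a == "M" + b or b == "M" + a


print(same_modulo_initiator_met("MMSEV", "MSEV"))
```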

In peptides or proteins, suitable conservative amino acid substitutions are known to those skilled in the art and can generally be made without altering the biological activity of the resulting molecule. In general, those skilled in the art recognize that single amino acid substitutions in non-essential regions of a polypeptide do not substantially alter biological activity (see, e.g., Watson et al., Molecular Biology of the Gene, 4th Edition, 1987, The Benjamin/Cummings Pub. Co., p. 224).

As used herein, an "expression construct" refers to a vector, such as a recombinant vector, suitable for expressing a nucleotide sequence of interest in an organism. "Expression" refers to the production of a functional product. For example, expression of a nucleotide sequence may refer to transcription of the nucleotide sequence (e.g., transcription to produce mRNA or functional RNA) and/or translation of the RNA into a precursor or mature protein.

The "expression construct" of the present invention can be a linear nucleic acid fragment, a circular plasmid or a viral vector, or, in some embodiments, an RNA capable of being translated (such as an mRNA).

An "expression construct" of the present invention may comprise regulatory sequences and a nucleotide sequence of interest from different sources, or regulatory sequences and a nucleotide sequence of interest from the same source but arranged differently from how they normally occur in nature.

"Regulatory sequence" and "regulatory element" are used interchangeably and refer to nucleotide sequences located upstream (5' non-coding sequences), within, or downstream (3' non-coding sequences) of a coding sequence that influence the transcription, RNA processing or stability, or translation of the associated coding sequence. Regulatory sequences may include, but are not limited to, promoters, translation leader sequences, introns and polyadenylation recognition sequences.

"Promoter" refers to a nucleic acid fragment capable of controlling the transcription of another nucleic acid fragment. In some embodiments of the invention, the promoter is a promoter capable of controlling the transcription of a gene in a cell, whether or not it is derived from that cell. The promoter can be a constitutive promoter, a tissue-specific promoter, a developmentally regulated promoter or an inducible promoter.

A "constitutive promoter" refers to a promoter that generally causes a gene to be expressed in most cell types under most circumstances. "Tissue-specific promoter" and "tissue-preferred promoter" are used interchangeably and refer to a promoter that is expressed primarily, but not necessarily exclusively, in one tissue or organ, and that may also be expressed in a specific cell or cell type. A "developmentally regulated promoter" refers to a promoter whose activity is determined by developmental events. An "inducible promoter" selectively drives expression of an operably linked DNA sequence in response to endogenous or exogenous stimuli (environmental, hormonal, chemical signals, etc.).

Examples of promoters include, but are not limited to, polymerase (pol) I, pol II and pol III promoters. Examples of pol I promoters include the chicken RNA pol I promoter. Examples of pol II promoters include, but are not limited to, the cytomegalovirus immediate early (CMV) promoter, the Rous sarcoma virus long terminal repeat (RSV-LTR) promoter and the simian virus 40 (SV40) immediate early promoter. Examples of pol III promoters include the U6 and H1 promoters. Inducible promoters such as the metallothionein promoter can be used. Other examples of promoters include the T7 phage promoter, the T3 phage promoter, the β-galactosidase promoter and the Sp6 phage promoter. When used in plants, the promoter may be the cauliflower mosaic virus 35S promoter, the maize Ubi-1 promoter, the wheat U6 promoter, the rice U3 promoter, the maize U3 promoter or the rice actin promoter.

As used herein, the term "operably linked" means that a regulatory element (for example, but not limited to, a promoter sequence or a transcription termination sequence) is linked to a nucleic acid sequence (for example, a coding sequence or open reading frame) such that transcription of the nucleotide sequence is controlled and regulated by the transcriptional regulatory element. Techniques for operably linking regulatory elements to nucleic acid molecules are known in the art.

"Introducing" a nucleic acid molecule (e.g., a plasmid, linear nucleic acid fragment or RNA) or a protein into an organism means transforming cells of the organism with the nucleic acid or protein so that it can function in the cell. "Transformation" as used in the present invention includes both stable and transient transformation.

"Stable transformation" refers to the introduction of an exogenous nucleotide sequence into the genome, resulting in stable inheritance of the exogenous gene. Once stably transformed, the exogenous nucleic acid sequence is stably integrated into the genome of the organism and of any of its successive generations.

"Transient transformation" refers to the introduction of a nucleic acid molecule or protein into a cell, where it performs its function without stable inheritance of the exogenous gene. In transient transformation, the exogenous nucleic acid sequence is not integrated into the genome.

2. Adenine deaminases and base editing fusion proteins comprising them

In one aspect, the present application provides an adenine deaminase that:

1) comprises the signature sequence motif VX_nNX₁₀HAEX_nPCXMC; and/or

2) comprises an amino acid sequence having at least 50%, at least 60%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% sequence identity to SEQ ID NO: 1, 10 or 12, wherein the amino acid at the position corresponding to position 108 of SEQ ID NO: 14 is N.

In some embodiments, the amino acid of the adenine deaminase at the position corresponding to position 106 of SEQ ID NO: 14 is A or V. In some embodiments, the amino acid of the adenine deaminase at the position corresponding to position 107 of SEQ ID NO: 14 is L or R. In some embodiments, the amino acid of the adenine deaminase at the position corresponding to position 109 of SEQ ID NO: 14 is K or S.

In some embodiments, the amino acids of the adenine deaminase at the positions corresponding to positions 106-109 of SEQ ID NO: 14 are VRNS, ALNK, ALNS, ARNK, ARNS, VLNK, VLNS or VRNK.
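The eight tetrapeptides listed above coincide with the Cartesian product of the residues permitted individually at each position (A/V at 106, L/R at 107, N at 108, K/S at 109). The short sketch below, with hypothetical names, makes that correspondence explicit.

```python
from itertools import product

# Residues permitted at positions corresponding to 106-109 of SEQ ID NO: 14,
# taken from the embodiments described above.
ALLOWED = {106: "AV", 107: "LR", 108: "N", 109: "KS"}


def allowed_tetrapeptides():
    """Enumerate every combination of the permitted residues, position by position."""
    columns = [ALLOWED[pos] for pos in sorted(ALLOWED)]
    return sorted("".join(combo) for combo in product(*columns))


print(allowed_tetrapeptides())
```

The result is ALNK, ALNS, ARNK, ARNS, VLNK, VLNS, VRNK and VRNS: 2 x 2 x 1 x 2 = 8 combinations.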

In some embodiments, the adenine deaminase comprises an amino acid sequence selected from SEQ ID NOs: 2-9, 11 and 13.

In some embodiments, the "adenine deaminase" can accept a nucleic acid, such as single-stranded DNA, as a substrate and catalyze the conversion of adenosine or deoxyadenosine (A) to inosine (I).

As used herein, "the amino acid at the position corresponding to position 108 of SEQ ID NO: 14" means the amino acid in the adenine deaminase described herein that aligns with amino acid 108 of SEQ ID NO: 14 when the two sequences are aligned. Other similar terms and phrases herein have analogous meanings. The correspondence of amino acids in different sequences can be determined by sequence alignment methods well known in the art. For example, amino acid correspondence can be determined with the EMBL-EBI online alignment tool (https://www.ebi.ac.uk/Tools/psa/), in which two sequences are aligned with the Needleman-Wunsch algorithm using default parameters.
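The position-correspondence rule can be sketched as a global (Needleman-Wunsch) alignment followed by a walk along the reference row. This is an illustrative simplification: it uses match/mismatch/linear-gap scores chosen for readability, whereas the EMBL-EBI tool scores with a substitution matrix and gap-open/gap-extension penalties, so on real proteins its alignments, and therefore the mapped residue, can differ. Function names are hypothetical.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment with simple scores; returns the two gapped strings."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Traceback from the bottom-right cell, preferring the diagonal on ties.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def residue_at_reference_position(reference, query, ref_pos):
    """Query residue aligned to 1-based ref_pos of the reference ('-' if a gap)."""
    aln_ref, aln_query = needleman_wunsch(reference, query)
    seen = 0
    for r, q in zip(aln_ref, aln_query):
        if r != "-":
            seen += 1
            if seen == ref_pos:
                return q
    raise IndexError("ref_pos is beyond the reference length")


print(needleman_wunsch("ACDEFGHIK", "ACDEGHIK"))
```

In practice the reference would be SEQ ID NO: 14 and ref_pos would be 108; the toy strings here only exercise the logic.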

In the signature sequence motif VX_nNX₁₀HAEX_nPCXMC in the various aspects herein, X represents any amino acid and n represents any integer, for example any integer in the range of 1-100, 1-50, 1-20 or 1-10.
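The motif definition translates naturally into a regular expression. In the sketch below, which uses hypothetical names, each X_n is assumed to be matched independently (the two runs may have different lengths) within a configurable range, and X is any uppercase one-letter amino acid code.

```python
import re


def signature_motif_regex(n_min=1, n_max=100):
    """Compile VX_nNX10HAEX_nPCXMC, matching each X_n with n_min <= n <= n_max."""
    x_n = "[A-Z]{%d,%d}" % (n_min, n_max)
    return re.compile("V" + x_n + "N" + "[A-Z]{10}" + "HAE" + x_n + "PC[A-Z]MC")


# Toy sequence built to contain the motif (not a real protein).
toy = "MVGGN" + "A" * 10 + "HAEKKKPCAMCR"
print(signature_motif_regex().search(toy) is not None)
```

Narrowing n_min and n_max corresponds to the narrower ranges (1-50, 1-20, 1-10) mentioned above.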

In one aspect, the present application relates to the use of an adenine deaminase for gene editing, such as base editing, in an organism or a cell of an organism, wherein the adenine deaminase

In some embodiments, the amino acid of the adenine deaminase at the position corresponding to position 106 of SEQ ID NO: 14 is A or V. In some embodiments, the amino acid of the adenine deaminase at the position corresponding to position 107 of SEQ ID NO: 14 is L or R. In some embodiments, the amino acid of the adenine deaminase at the position corresponding to position 109 of SEQ ID NO: 14 is K or S.

In some embodiments, the adenine deaminase is used to prepare a base editing fusion protein or base editing system for base editing in an organism or a cell of an organism.

In another aspect, the present invention provides a base editing fusion protein comprising a nucleic acid targeting domain and an adenine deamination domain, wherein the adenine deamination domain comprises at least one (e.g., one or two) adenine deaminase polypeptide, the adenine deaminase

In the embodiments herein, "base editing fusion protein" and "base editor" are used interchangeably and refer to a protein that can mediate one or more nucleotide substitutions in a target sequence in the genome in a sequence-specific manner. The one or more nucleotide substitutions are, for example, A-to-G substitutions.

As used herein, a "nucleic acid targeting domain" refers to a domain capable of mediating attachment of the base editing fusion protein to a specific target sequence in the genome in a sequence-specific manner (e.g., via a guide RNA). In some embodiments, the nucleic acid targeting domain may include one or more zinc finger protein domains (ZFPs) or transcription activator-like effector domains (TALEs) directed to a specific target sequence. In some embodiments, the nucleic acid targeting domain comprises at least one (e.g., one) CRISPR effector protein polypeptide.

A "zinc finger protein domain (ZFP)" typically contains 3-6 individual zinc finger repeats, each of which can recognize a unique sequence of, for example, 3 bp. Different genomic sequences can be targeted by combining different zinc finger repeats.

A "transcription activator-like effector domain" is the DNA-binding domain of a transcription activator-like effector (TALE). TALEs can be engineered to bind almost any desired DNA sequence.

As used herein, the term "CRISPR effector protein" generally refers to a nuclease present in a naturally occurring CRISPR system (a CRISPR nuclease) or a functional variant thereof. The term encompasses any CRISPR-based effector protein capable of sequence-specific targeting within a cell.

As used herein, a "functional variant" of a CRISPR nuclease means a variant that retains at least the guide RNA-mediated sequence-specific targeting ability. Preferably, the functional variant is a nuclease-inactivated variant, i.e., one that lacks double-stranded nucleic acid cleavage activity. CRISPR nucleases lacking double-stranded cleavage activity nevertheless also include nickases, which nick a double-stranded nucleic acid molecule without cleaving both strands. In some preferred embodiments of the invention, the CRISPR effector protein has nickase activity. In some embodiments, the functional variant recognizes a different PAM (protospacer adjacent motif) sequence than the wild-type nuclease.

A "CRISPR effector protein" can be derived from a Cas9 nuclease, including a Cas9 nuclease or a functional variant thereof. The Cas9 nuclease may come from different species, for example SpCas9 from Streptococcus pyogenes (S. pyogenes) or SaCas9 from Staphylococcus aureus (S. aureus). "Cas9 nuclease" and "Cas9" are used interchangeably herein to refer to an RNA-guided nuclease comprising a Cas9 protein or a fragment thereof (e.g., a protein comprising the active DNA cleavage domain of Cas9 and/or the gRNA-binding domain of Cas9). Cas9 is a component of the CRISPR/Cas (clustered regularly interspaced short palindromic repeats and associated proteins) genome editing system; guided by a guide RNA, it targets and cleaves a DNA target sequence to form a DNA double-strand break (DSB). An exemplary amino acid sequence of wild-type SpCas9 is shown in SEQ ID NO: 15.

A "CRISPR effector protein" can also be derived from a Cpf1 (i.e., Cas12a) nuclease, including a Cpf1 nuclease or a functional variant thereof. The Cpf1 nuclease may come from different species, for example Francisella novicida U112, Acidaminococcus sp. BV3L6 or Lachnospiraceae bacterium ND2006.

Suitable "CRISPR effector proteins" can also be derived from nucleases such as Cas3, Cas8a, Cas5, Cas8b, Cas8c, Cas10d, Cse1, Cse2, Csy1, Csy2, Csy3, GSU0054, Cas10, Csm2, Cmr5, Csx11, Csx10, Csf1, Csn2, Cas4, C2c1 (Cas12b), C2c3, C2c2, Cas12c, Cas12d (i.e., CasY), Cas12e (i.e., CasX), Cas12f (i.e., Cas14), Cas12g, Cas12h, Cas12i, Cas12j (i.e., CasΦ), Cas12k, Cas12l and Cas12m, including these nucleases or functional variants thereof.

In some embodiments, the CRISPR effector protein is a nuclease-inactivated Cas9. The DNA cleavage domain of the Cas9 nuclease is known to contain two subdomains: the HNH nuclease subdomain and the RuvC subdomain. The HNH subdomain cleaves the strand complementary to the gRNA, whereas the RuvC subdomain cleaves the non-complementary strand. Mutations in these subdomains can inactivate the nuclease activity of Cas9, yielding a "nuclease-inactivated Cas9" that still retains gRNA-guided DNA-binding ability.

The nuclease-inactivated Cas9 of the present invention can be derived from Cas9 of different species, for example Streptococcus pyogenes Cas9 (SpCas9) or Staphylococcus aureus Cas9 (SaCas9). Mutating both the HNH nuclease subdomain and the RuvC subdomain of Cas9 (e.g., with the mutations D10A and H840A) abolishes its nuclease activity, yielding nuclease-dead Cas9 (dCas9). Inactivating only one of the subdomains gives Cas9 nickase activity, yielding a Cas9 nickase (nCas9), for example nCas9 carrying only the D10A mutation.

Accordingly, in some embodiments of the various aspects of the invention, the nuclease-inactivated Cas9 variant of the invention comprises, relative to wild-type Cas9, the amino acid substitutions D10A and/or H840A, with amino acid numbering according to SEQ ID NO: 15. In some preferred embodiments, the nuclease-inactivated Cas9 comprises the amino acid substitution D10A relative to wild-type Cas9, with amino acid numbering according to SEQ ID NO: 15. In some embodiments, the nuclease-inactivated Cas9 comprises the amino acid sequence set forth in SEQ ID NO: 16 (nCas9(D10A)).
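As a minimal illustrative sketch (not part of the disclosure), the relationship between catalytic substitutions and the resulting Cas9 form can be written as a lookup. The helper names are hypothetical; the sketch assumes the commonly reported assignment of D10 to the RuvC subdomain and H840 to the HNH subdomain of SpCas9, and it inspects only these two positions.

```python
# Substitutions use SpCas9 numbering (SEQ ID NO: 15 in the text).
# Assumption: D10 lies in the RuvC subdomain, H840 in the HNH subdomain.
CATALYTIC_SITES = {"D10A": "RuvC", "H840A": "HNH"}

# Strand cut by each subdomain, as stated in the text.
STRAND_CUT_BY = {"HNH": "gRNA-complementary", "RuvC": "non-complementary"}


def inactivated_subdomains(substitutions: set[str]) -> set[str]:
    """Nuclease subdomains knocked out by the given substitutions."""
    return {CATALYTIC_SITES[s] for s in substitutions if s in CATALYTIC_SITES}


def classify_cas9(substitutions: set[str]) -> str:
    """Return 'Cas9', 'nCas9' or 'dCas9' for a set of substitutions."""
    dead = inactivated_subdomains(substitutions)
    if dead == {"RuvC", "HNH"}:
        return "dCas9"  # no cleavage; gRNA-guided binding only
    if dead:
        return "nCas9"  # one subdomain left: nicks a single strand
    return "Cas9"       # both subdomains active: double-strand break


def cleaved_strands(substitutions: set[str]) -> set[str]:
    """DNA strands the variant can still cut."""
    active = {"RuvC", "HNH"} - inactivated_subdomains(substitutions)
    return {STRAND_CUT_BY[d] for d in active}


print(classify_cas9({"D10A"}), cleaved_strands({"D10A"}))
# nCas9 {'gRNA-complementary'}
```

Under these assumptions, the D10A nickase of SEQ ID NO: 16 keeps only HNH activity and therefore nicks only the gRNA-complementary strand.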

When used for gene editing, the Cas9 nuclease typically requires a 5'-NGG-3' PAM (protospacer adjacent motif) immediately 3' of the target sequence. However, the inventors surprisingly found that this PAM occurs at low frequency in certain species, such as rice, which greatly limits gene editing in those species. To address this, CRISPR effector proteins that recognize other PAM sequences, such as functional variants of the Cas9 nuclease with different PAM requirements, can be used in the present invention.
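How often a given PAM occurs in a sequence can be estimated by scanning both strands. The following is a minimal sketch with hypothetical function names; the default pattern encodes 5'-NGG-3', and a relaxed pattern such as `[ACGT]G` (an NG-type PAM, used here only as an illustrative assumption) can be passed to compare PAM densities for alternative Cas9 variants.

```python
import re

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_pam_sites(seq: str, pam_regex: str = "[ACGT]GG") -> list[tuple[str, int]]:
    """Return (strand, 0-based start on that strand) for each PAM match.

    The zero-width lookahead also reports overlapping sites, e.g. both
    NGG matches in a GGGG run."""
    sites = []
    for strand, s in (("+", seq.upper()), ("-", revcomp(seq))):
        for m in re.finditer(f"(?=({pam_regex}))", s):
            sites.append((strand, m.start()))
    return sites


def pam_density_per_kb(seq: str, pam_regex: str = "[ACGT]GG") -> float:
    """PAM sites per kilobase, counting both strands."""
    return 1000 * len(find_pam_sites(seq, pam_regex)) / len(seq)
```

For example, `find_pam_sites("AAGGTT")` returns `[("+", 1)]`, while `pam_density_per_kb(region, "[ACGT]G")` would give the density for the relaxed pattern over a region of interest.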

In some embodiments of the invention, the adenine deamination domain in the fusion protein deaminates adenosine in the single-stranded DNA exposed during formation of the CRISPR effector protein-guide RNA-DNA complex, converting it to inosine (I). Because DNA polymerase treats inosine as guanine (G), an A-to-G substitution can be achieved through base mismatch repair.
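The two-step outcome described above (A deaminated to I, then I read as G) can be modelled with simple string operations. This is a toy sketch: the function names are hypothetical and the editing window (protospacer positions 4-8) is an illustrative assumption, not a value given in the text.

```python
def deaminate(protospacer: str, window: range) -> str:
    """Deaminate every A inside `window` (0-based indices) to I (inosine)."""
    return "".join(
        "I" if i in window and base == "A" else base
        for i, base in enumerate(protospacer.upper())
    )


def resolve(seq: str) -> str:
    """Polymerase reads I as G, so after replication/repair each I becomes G."""
    return seq.replace("I", "G")


# Hypothetical window: protospacer positions 4-8 (1-based) -> indices 3..7.
WINDOW = range(3, 8)
intermediate = deaminate("AAAAAAAAAA", WINDOW)
print(intermediate, resolve(intermediate))  # AAAIIIIIAA AAAGGGGGAA
```

Only adenines inside the window change, which mirrors why the position of a target A relative to the protospacer matters in base editing.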

In some embodiments of the invention, the nucleic acid targeting domain and the adenine deamination domain are fused via a linker.

As used herein, a "linker" may be a non-functional amino acid sequence of 1-50 (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 20-25 or 25-50) or more amino acids that has no secondary or higher-order structure. For example, the linker may be a flexible linker.

In some embodiments, the base editing fusion protein comprises, in order from N-terminus to C-terminus, an adenine deamination domain and a nucleic acid targeting domain.

In some embodiments of the invention, the fusion protein of the invention may further comprise a nuclear localization sequence (NLS). In general, the one or more NLSs in the fusion protein should be strong enough to drive accumulation of the fusion protein in the cell nucleus in an amount sufficient for its base editing function. The strength of nuclear localization activity is generally determined by the number and position of the NLSs in the fusion protein, the particular NLS(s) used, or a combination of these factors.

In some embodiments of the invention, the NLS(s) of the fusion protein of the invention may be located at the N-terminus and/or the C-terminus. In some embodiments of the invention, an NLS of the fusion protein may also be located between the adenine deamination domain and the nucleic acid targeting domain. In some embodiments, the fusion protein comprises about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more NLSs. In some embodiments, the fusion protein comprises about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more NLSs at or near the N-terminus. In some embodiments, the fusion protein comprises about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more NLSs at or near the C-terminus. In some embodiments, the polypeptide comprises a combination of these, such as one or more NLSs at the N-terminus and one or more NLSs at the C-terminus. When more than one NLS is present, each can be selected independently of the others.

An NLS generally consists of one or more short sequences of positively charged lysine or arginine residues exposed on the protein surface, although other types of NLS are also known. Non-limiting examples of NLSs include KKRKV, PKKKRKV and KRPAATKKAGQAKKKK.
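The domain order and NLS placement described above can be illustrated by assembling a fusion sequence from one-letter amino acid strings. This is a hypothetical sketch: `build_fusion` is not from the disclosure, the `GGGGS` repeat linker is an illustrative assumption (the text only requires a 1-50 residue linker without secondary structure), and PKKKRKV is taken from the NLS examples listed above.

```python
FLEXIBLE_LINKER = "GGGGS" * 3  # assumed flexible linker, 15 residues
NLS = "PKKKRKV"                # one of the NLS examples in the text


def build_fusion(deaminase: str, targeting_domain: str,
                 linker: str = FLEXIBLE_LINKER,
                 n_nls: str = NLS, c_nls: str = NLS) -> str:
    """Assemble N-terminal NLS, deaminase, linker, targeting domain and
    C-terminal NLS, in N-to-C order. Pass "" to omit either NLS."""
    if not 1 <= len(linker) <= 50:
        raise ValueError("linker must be 1-50 residues in this sketch")
    parts = [n_nls, deaminase, linker, targeting_domain, c_nls]
    if not all(p.isalpha() and p.isupper() for p in parts if p):
        raise ValueError("use one-letter upper-case amino acid codes")
    return "".join(parts)


# Placeholder domain strings stand in for real deaminase / nCas9 sequences.
print(build_fusion("DEAM", "CAS", linker="GGS"))  # PKKKRKVDEAMGGSCASPKKKRKV
```

Keeping each component as a separate argument makes it easy to try different NLS numbers or positions, which the text notes can affect nuclear localization strength.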

In addition, depending on the location of the DNA to be edited, the fusion protein of the invention may also comprise other localization sequences, such as cytoplasmic, chloroplast or mitochondrial localization sequences.

3. Base editing system

In another aspect, the invention provides a base editing system for modifying a target nucleic acid region in a genome, comprising:

i) the base editing fusion protein of the invention and/or an expression construct comprising a nucleotide sequence encoding the base editing fusion protein; and/or

ii) at least one guide RNA and/or at least one expression construct comprising a nucleotide sequence encoding the at least one guide RNA,

wherein the at least one guide RNA is directed to at least one target sequence within the target nucleic acid region.

As used herein, a "base editing system" refers to the combination of components required for base editing of the genome in a cell or organism. The individual components of the system, such as the base editing fusion protein and one or more guide RNAs, may exist independently of one another or in any combination as a composition.

As used herein, "guide RNA" and "gRNA" are used interchangeably and refer to an RNA molecule that can form a complex with a CRISPR effector protein and, owing to a certain degree of identity with the target sequence, can target that complex to the target sequence. A guide RNA targets the target sequence by base pairing with its complementary strand. For example, the gRNA used by the Cas9 nuclease or its functional variants typically consists of crRNA and tracrRNA molecules that are partially complementary and form a complex; the crRNA contains a guide sequence (also called the seed sequence) with sufficient identity to the target sequence to hybridize to its complementary strand and direct sequence-specific binding of the CRISPR complex (Cas9 + crRNA + tracrRNA) to the target sequence. It is also known in the art that a single guide RNA (sgRNA) combining the features of both the crRNA and the tracrRNA can be designed. The gRNA used by the Cpf1 nuclease or its functional variants, by contrast, typically consists only of a mature crRNA molecule, which may also be called an sgRNA. Designing a suitable gRNA based on the CRISPR nuclease used and the target sequence to be edited is within the capabilities of those skilled in the art.
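For a Cas9-type effector with a 5'-NGG-3' PAM, candidate guide sequences can be listed by taking the nucleotides immediately 5' of each PAM. The sketch below uses hypothetical function names, assumes a 20-nt spacer (a length commonly used with SpCas9, not stated in this section), scans one strand only, and does not attach any sgRNA scaffold.

```python
import re


def spcas9_protospacers(seq: str, spacer_len: int = 20) -> list[tuple[int, str, str]]:
    """(start, protospacer, PAM) for every site on this strand that is
    immediately followed by a 5'-NGG-3' PAM."""
    seq = seq.upper()
    hits = []
    for m in re.finditer(r"(?=([ACGT]GG))", seq):
        pam_start = m.start()
        if pam_start >= spacer_len:  # need a full-length spacer upstream
            start = pam_start - spacer_len
            hits.append((start, seq[start:pam_start], m.group(1)))
    return hits


def guide_sequence(protospacer: str) -> str:
    """RNA guide (spacer) sequence: the protospacer with T replaced by U."""
    return protospacer.replace("T", "U")


for start, spacer, pam in spcas9_protospacers("ACGTACGTACGTACGTACGTTGGA"):
    print(start, guide_sequence(spacer), pam)
# 0 ACGUACGUACGUACGUACGU TGG
```

To cover the opposite strand, the same function can be run on the reverse complement of the region.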

Those skilled in the art will appreciate that if the base editing fusion protein is not based on a CRISPR effector protein, the system may not require a guide RNA or an expression construct encoding it.

In some embodiments, after the base editing system of the invention is introduced into the cell, the base editing fusion protein and the guide RNA form a complex that is specifically targeted to the target sequence under the guidance of the guide RNA, resulting in the replacement of one or more A's in the target sequence by G's.

In some embodiments, the at least one guide RNA may be directed to a target sequence located on the sense strand (e.g., the protein-coding strand) and/or the antisense strand within the genomic target nucleic acid region. When the guide RNA targets the sense strand (e.g., the protein-coding strand), the base editing composition of the invention can cause one or more A's within the target sequence on that strand to be replaced by G's. When the guide RNA targets the antisense strand, the base editing composition of the invention can cause one or more T's within the target sequence on the sense strand to be replaced by C's.
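Why an antisense-targeted A-to-G edit appears as T-to-C on the sense strand follows from base pairing, and can be checked with a short sketch. The function names are hypothetical, the window is given in sense-strand coordinates as a half-open interval, and every A in the window on the targeted strand is assumed to be edited.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def a_to_g(seq: str, start: int, end: int) -> str:
    """Convert every A in seq[start:end] to G."""
    return seq[:start] + seq[start:end].replace("A", "G") + seq[end:]


def edit_sense(sense: str, start: int, end: int, on_antisense: bool = False) -> str:
    """Return the sense strand after A-to-G editing of the targeted strand.

    [start, end) is given in sense-strand coordinates. If the antisense
    strand is targeted, the window is mapped onto it, edited there, and
    the result is converted back to the sense strand."""
    if not on_antisense:
        return a_to_g(sense, start, end)
    n = len(sense)
    edited_antisense = a_to_g(revcomp(sense), n - end, n - start)
    return revcomp(edited_antisense)


print(edit_sense("GATTACA", 0, 7))                     # GGTTGCG (A->G)
print(edit_sense("GATTACA", 0, 7, on_antisense=True))  # GACCACA (T->C)
```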

To achieve efficient expression in cells, in some embodiments of the invention, the nucleotide sequence encoding the base editing fusion protein is codon-optimized for the organism whose genome is to be modified.

Codon optimization refers to a method of modifying a nucleic acid sequence to enhance its expression in a host cell of interest by replacing at least one codon of the native sequence (e.g., about or more than about 1, 2, 3, 4, 5, 10, 15, 20, 25, 50 or more codons) with codons that are used more frequently, or most frequently, in the genes of that host cell, while maintaining the native amino acid sequence. Different species display particular preferences for certain codons of a given amino acid. Codon bias (differences in codon usage between organisms) often correlates with the translation efficiency of messenger RNA (mRNA), which is thought to depend on the properties of the codons being translated and on the availability of particular transfer RNA (tRNA) molecules. The predominance of selected tRNAs in a cell generally reflects the codons used most frequently in peptide synthesis. Genes can therefore be tailored for optimal expression in a given organism through codon optimization. Codon usage tables are readily available, for example in the Codon Usage Database at www.kazusa.or.jp/codon/, and these tables can be adapted in a number of ways. See Nakamura, Y., et al., "Codon usage tabulated from the international DNA sequence databases: status for the year 2000," Nucl. Acids Res. 28:292 (2000).
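One simple strategy consistent with this description is to replace every codon with the host's most frequent synonymous codon. The sketch below is hypothetical: `TOY_USAGE` holds made-up frequencies for only four amino acids (a real table would come from a codon usage database for the target organism), and codons missing from the table are left unchanged. Commercial optimization, such as the rice/wheat optimization mentioned in the Examples, may use other criteria.

```python
# Made-up usage frequencies (per thousand codons) for illustration only.
TOY_USAGE = {
    "K": {"AAA": 10.0, "AAG": 30.0},
    "R": {"CGT": 5.0, "AGG": 12.0, "CGC": 20.0},
    "V": {"GTT": 15.0, "GTG": 25.0},
    "P": {"CCT": 14.0, "CCG": 18.0},
}


def optimize(cds: str, usage: dict = TOY_USAGE) -> str:
    """Swap each codon for the most frequent synonymous codon in `usage`,
    keeping the encoded amino acid sequence unchanged."""
    if len(cds) % 3:
        raise ValueError("CDS length must be a multiple of 3")
    codon_to_aa = {c: aa for aa, table in usage.items() for c in table}
    best = {aa: max(table, key=table.get) for aa, table in usage.items()}
    out = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3].upper()
        aa = codon_to_aa.get(codon)
        out.append(best[aa] if aa else codon)  # unknown codons are kept
    return "".join(out)


print(optimize("CCTAAAAGGGTT"))  # CCGAAGCGCGTG (P K R V is preserved)
```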

Organisms whose genomes can be modified with the base editing system of the invention include any organism suitable for base editing, preferably eukaryotes. Examples include, but are not limited to, mammals such as humans, mice, rats, monkeys, dogs, pigs, sheep, cattle and cats; poultry such as chickens, ducks and geese; and plants, including monocots and dicots. For example, the plant is a crop plant, including but not limited to wheat, rice, corn, soybean, sunflower, sorghum, rapeseed, alfalfa, cotton, barley, millet, sugarcane, tomato, tobacco, cassava and potato.

4. Methods of producing genetically modified cells

In another aspect, the invention further provides a method of producing at least one genetically modified cell, comprising introducing the base editing system of the invention into at least one cell, thereby causing one or more nucleotide substitutions within the target nucleic acid region in the at least one cell. In some embodiments, the one or more nucleotide substitutions are A-to-G substitutions.

In some embodiments, the method further comprises a step of screening the at least one cell for cells carrying the desired one or more nucleotide substitutions.
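Screening is often done by sequencing the target region and checking each read for the intended substitution. The following minimal sketch uses hypothetical function names and assumes reads that are already aligned, gap-free and the same length as the reference; real amplicon analysis would need alignment and quality filtering.

```python
def has_desired_edits(read: str, reference: str, desired: dict[int, str],
                      allow_other: bool = False) -> bool:
    """True if `read` carries every desired base at its 0-based position and,
    unless allow_other is set, no other difference from `reference`."""
    if len(read) != len(reference):
        return False
    for i, (base, ref_base) in enumerate(zip(read, reference)):
        if i in desired:
            if base != desired[i]:
                return False
        elif base != ref_base and not allow_other:
            return False
    return True


def a_to_g_frequency(reference: str, reads: list[str], position: int) -> float:
    """Fraction of same-length reads with G where the reference has A."""
    if reference[position] != "A":
        raise ValueError("reference base at this position is not A")
    covering = [r for r in reads if len(r) == len(reference)]
    if not covering:
        return 0.0
    return sum(r[position] == "G" for r in covering) / len(covering)


reads = ["TTGGC", "TTAGC", "TTGGC", "CTGGC"]
print(a_to_g_frequency("TTAGC", reads, 2))  # 0.75
```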

In some embodiments, the method of the invention is performed in vitro. For example, the cells are isolated cells or cells in an isolated tissue or organ.

In another aspect, the invention also provides a genetically modified organism comprising a genetically modified cell produced by the method of the invention, or progeny cells thereof. Preferably, the genetically modified cell or its progeny cells carry the desired one or more nucleotide substitutions.

In the present invention, the target nucleic acid region to be modified may be located anywhere in the genome, for example within a functional gene such as a protein-coding gene, or within a gene expression regulatory region such as a promoter or enhancer region, thereby enabling modification of the function or the expression of the gene. In some embodiments, the desired nucleotide substitution results in a desired modification of gene function or gene expression.

In some embodiments, the target nucleic acid region is associated with a trait of the cell or organism. In some embodiments, mutations in the target nucleic acid region result in a change in a trait of the cell or organism. In some embodiments, the target nucleic acid region is located in a protein-coding region. In some embodiments, the target nucleic acid region encodes a functionally relevant motif or domain of a protein. In some preferred embodiments, the one or more nucleotide substitutions in the target nucleic acid region result in amino acid substitutions in the amino acid sequence of the protein. In some embodiments, the one or more nucleotide substitutions result in a change in the function of the protein.

In the method of the invention, the base editing system can be introduced into cells by various methods well known to those skilled in the art.

Methods that can be used to introduce the base editing system of the invention into cells include, but are not limited to, calcium phosphate transfection, protoplast fusion, electroporation, lipofection, microinjection, viral infection (e.g., with baculovirus, vaccinia virus, adenovirus, adeno-associated virus, lentivirus or other viruses), biolistics (gene gun), PEG-mediated protoplast transformation and Agrobacterium tumefaciens-mediated transformation.

Cells that can be base-edited by the method of the invention may be derived from, for example, mammals such as humans, mice, rats, monkeys, dogs, pigs, sheep, cattle and cats; poultry such as chickens, ducks and geese; and plants, including monocots and dicots, preferably crop plants, including but not limited to wheat, rice, corn, soybean, sunflower, sorghum, rapeseed, alfalfa, cotton, barley, millet, sugarcane, tomato, tobacco, cassava and potato.

5. Application in plants

The base editing fusion proteins, base editing systems and methods of producing genetically modified cells of the invention are particularly suitable for the genetic modification of plants. Preferably, the plant is a crop plant, including but not limited to wheat, rice, corn, soybean, sunflower, sorghum, rapeseed, alfalfa, cotton, barley, millet, sugarcane, tomato, tobacco, cassava and potato. More preferably, the plant is rice.

In another aspect, the invention provides a method of producing a genetically modified plant, comprising introducing the base editing system of the invention into at least one plant, thereby causing one or more nucleotide substitutions within the target nucleic acid region in the genome of the at least one plant.

In some embodiments, the method further comprises screening the at least one plant for plants carrying the desired one or more nucleotide substitutions.

In the method of the invention, the base editing composition can be introduced into plants by various methods well known to those skilled in the art. Methods that can be used to introduce the base editing system of the invention into plants include, but are not limited to, biolistics (gene gun), PEG-mediated protoplast transformation, Agrobacterium tumefaciens-mediated transformation, plant virus-mediated transformation, the pollen-tube pathway method and ovary injection. Preferably, the base editing composition is introduced into the plant by transient transformation.

In the method of the invention, the target sequence can be modified simply by introducing or producing the base editing fusion protein and the guide RNA in plant cells, and the modification is stably heritable without the need to stably transform the plant with exogenous polynucleotides encoding the components of the base editing system. This avoids potential off-target effects of a persistently present (continuously produced) base editing composition, and also avoids integration of exogenous nucleotide sequences into the plant genome, giving higher biosafety.

In some preferred embodiments, the introduction is performed in the absence of selection pressure, thereby avoiding integration of exogenous nucleotide sequences into the plant genome.

In some embodiments, the introduction comprises transforming the base editing system of the invention into isolated plant cells or tissues and then regenerating the transformed plant cells or tissues into whole plants. Preferably, regeneration is performed in the absence of selection pressure, i.e., no selective agent against the selectable marker gene carried on the expression vector is used during tissue culture. Omitting the selective agent can increase plant regeneration efficiency and yield modified plants free of exogenous nucleotide sequences.

In other embodiments, the base editing system of the invention can be transformed into specific parts of an intact plant, such as leaves, shoot apices, pollen tubes, young spikes or hypocotyls. This is particularly suitable for plants that are difficult to regenerate through tissue culture.

In some embodiments of the invention, in vitro expressed proteins and/or in vitro transcribed RNA molecules (e.g., where the expression construct is an in vitro transcribed RNA molecule) are transformed directly into the plant. These proteins and/or RNA molecules carry out base editing in the plant cells and are subsequently degraded by the cells, avoiding integration of exogenous nucleotide sequences into the plant genome.

Therefore, in some embodiments, genetic modification and breeding of plants using the method of the invention can yield plants whose genomes are free of integrated exogenous polynucleotides, i.e., transgene-free modified plants.

In some embodiments of the invention, the modified target nucleic acid region is associated with a plant trait, such as an agronomic trait, whereby the one or more nucleotide substitutions give the plant an altered (preferably improved) trait, such as an agronomic trait, relative to a wild-type plant.

In some embodiments, the method further comprises a step of screening for plants carrying the desired one or more nucleotide substitutions and/or a desired trait, such as an agronomic trait.

In some embodiments of the invention, the method further comprises obtaining progeny of the genetically modified plant. Preferably, the genetically modified plant or its progeny carry the desired one or more nucleotide substitutions and/or a desired trait, such as an agronomic trait.

In another aspect, the invention further provides a genetically modified plant, or progeny or a part thereof, wherein the plant is obtained by the above method of the invention. In some embodiments, the genetically modified plant, or its progeny or part, is non-transgenic. Preferably, the genetically modified plant or its progeny has the desired genetic modification and/or a desired trait, such as an agronomic trait.

In another aspect, the invention also provides a plant breeding method comprising crossing a genetically modified first plant, obtained by the above method of the invention and containing one or more nucleotide substitutions in the target nucleic acid region, with a second plant that does not contain the one or more nucleotide substitutions, thereby introducing the one or more nucleotide substitutions into the second plant. Preferably, the genetically modified first plant has a desired trait, such as an agronomic trait.

6. Therapeutic applications

The invention also encompasses the use of the base editing system of the invention in the treatment of disease.

Modifying disease-related genes with the base editing system of the invention can upregulate, downregulate, inactivate or activate those genes, or correct their mutations, thereby enabling the prevention and/or treatment of disease. For example, the target nucleic acid region of the invention may be located within the protein-coding region of a disease-related gene, or within a gene expression regulatory region such as a promoter or enhancer region, enabling modification of the function or expression of the disease-related gene. Modification of a disease-related gene as described herein therefore includes modification of the gene itself (e.g., its protein-coding region) as well as modification of its expression regulatory regions (e.g., promoters, enhancers and introns).

A "disease-related" gene refers to any gene that, in cells derived from disease-affected tissue, produces transcription or translation products at abnormal levels or in abnormal forms compared with tissues or cells from non-disease controls. Where altered expression is associated with the onset and/or progression of a disease, it may be a gene expressed at an abnormally high level or a gene expressed at an abnormally low level. A disease-related gene also refers to a gene carrying one or more mutations or genetic variants that are directly responsible for the etiology of the disease, or that are in linkage disequilibrium with one or more genes responsible for it. The mutation or genetic variant is, for example, a single nucleotide variant (SNV). The transcription or translation products may be known or unknown and may be present at normal or abnormal levels.

Accordingly, the invention also provides a method of treating a disease in a subject in need thereof, comprising delivering to the subject an effective amount of the base editing system of the invention to modify a gene associated with the disease.

The invention also provides the use of the base editing system of the invention in the preparation of a pharmaceutical composition for treating a disease in a subject in need thereof, wherein the base editing system is used to modify a gene associated with the disease.

The invention also provides a pharmaceutical composition for treating a disease in a subject in need thereof, comprising the base editing system of the invention and optionally a pharmaceutically acceptable carrier, wherein the base editing system is used to modify a gene associated with the disease.

In some embodiments, the subject is a mammal, such as a human.

Examples of such diseases include, but are not limited to, tumors, inflammation, Parkinson's disease, cardiovascular disease, Alzheimer's disease, autism, drug addiction, age-related macular degeneration, schizophrenia and hereditary diseases.

7. Kits

The invention also includes a kit for use in the method of the invention, comprising the base editing fusion protein of the invention and/or an expression construct containing a nucleotide sequence encoding the base editing fusion protein, or comprising the base editing system of the invention. A kit generally includes a label indicating the intended use and/or method of use of its contents. The term "label" includes any written or recorded material provided on or with the kit, or otherwise accompanying it. The kit of the invention may also contain materials suitable for constructing the expression vectors of the base editing system of the invention, as well as reagents suitable for transforming the base editing fusion protein or base editing system of the invention into cells.

8. Method for preparing an adenine deaminase for base editing

In another aspect, the invention also provides a method for obtaining/preparing an adenine deaminase for base editing, comprising:

1) identifying an adenine deaminase comprising the characteristic sequence motif VX_nNX₁₀HAEX_nPCXMC (where X denotes any amino acid, X₁₀ a stretch of ten such residues, and X_n a stretch of unspecified length); and

2) mutating to N the amino acid of the adenine deaminase comprising the characteristic sequence motif VX_nNX₁₀HAEX_nPCXMC at the position corresponding to position 108 of SEQ ID NO: 14.
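Steps 1) and 2) can be approximated computationally by a regular-expression search for the motif followed by a point substitution. This sketch is illustrative only: the function names are hypothetical, X_n is assumed to match a run of any length (including zero), and the substitution assumes the candidate is already numbered like SEQ ID NO: 14. Mapping a homolog's residues onto SEQ ID NO: 14 positions would require a sequence alignment, which is not performed here.

```python
import re

# X = any residue; X_n assumed to be a run of any length (lazy match).
MOTIF = re.compile(r"V[A-Z]*?N[A-Z]{10}HAE[A-Z]*?PC[A-Z]MC")


def has_deaminase_motif(protein: str) -> bool:
    """Step 1 sketch: does the protein contain VX_nNX10HAEX_nPCXMC?"""
    return MOTIF.search(protein.upper()) is not None


def substitute(protein: str, position: int, new_residue: str) -> str:
    """Step 2 sketch: replace the residue at 1-based `position`.

    Assumes the sequence already uses SEQ ID NO: 14 numbering."""
    if not 1 <= position <= len(protein):
        raise ValueError("position outside protein")
    i = position - 1
    return protein[:i] + new_residue + protein[i + 1:]


# Synthetic toy sequence containing the motif (not a real deaminase).
toy = "MV" + "GGG" + "N" + "A" * 10 + "HAE" + "LL" + "PCVMC" + "K"
print(has_deaminase_motif(toy))  # True
```

Usage for step 2 would then be `substitute(candidate, 108, "N")` on a sequence whose numbering has been reconciled with SEQ ID NO: 14.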

In some embodiments, the method comprises mutating the amino acids of the adenine deaminase comprising the characteristic sequence motif VX_nNX₁₀HAEX_nPCXMC at the positions corresponding to positions 106-109 of SEQ ID NO: 14 to VRNS, ALNK, ALNS, ARNK, ARNS, VLNK, VLNS or VRNK.
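The eight listed replacements for positions 106-109 happen to be exactly every combination of A or V at 106, L or R at 107, N at 108, and K or S at 109; each therefore also carries the N at position 108 required in step 2). The sketch below checks this and applies a four-residue block; `apply_block` is a hypothetical helper that, as before, assumes SEQ ID NO: 14 numbering without alignment.

```python
from itertools import product

LISTED = {"VRNS", "ALNK", "ALNS", "ARNK", "ARNS", "VLNK", "VLNS", "VRNK"}

# Every combination of {A,V} x {L,R} x {N} x {K,S} for positions 106-109.
ENUMERATED = {"".join(p) for p in product("AV", "LR", "N", "KS")}


def apply_block(protein: str, block: str, start: int = 106) -> str:
    """Overwrite four residues starting at 1-based `start` (106-109 here)."""
    i = start - 1
    if len(block) != 4 or i < 0 or i + 4 > len(protein):
        raise ValueError("block must be 4 residues and fit inside the protein")
    return protein[:i] + block + protein[i + 4:]


print(ENUMERATED == LISTED)  # True
```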

Examples

To facilitate understanding of the invention, the invention is described more fully below with reference to specific examples and the accompanying drawings, in which preferred embodiments of the invention are shown. The invention may, however, be embodied in many different forms and is not limited to the examples described herein; rather, these examples are provided so that the disclosure will be understood thoroughly and completely.

Materials and Methods

1. Vector construction

Constructs were prepared for the mined deaminase sequences; all sequences were codon-optimized for both rice and wheat by GenScript and cloned into the PABE-7 vector backbone (Addgene #115628). The reporter plasmids used in the examples were constructed previously by the inventors (Li, C., Zong, Y., Wang, Y., Jin, S., Zhang, D., Song, Q., Zhang, R., & Gao, C. (2018). Expanded base editing in rice and wheat using a Cas9-adenosine deaminase fusion. Genome Biology, 19(1), 59).

2. Protoplast isolation and transformation

The protoplasts used in the present invention were derived from the rice variety Zhonghua 11.

2.1 Rice seedling culture

Rice seeds were first rinsed in 75% ethanol for 1 min, treated with 4% sodium hypochlorite for 30 min, and washed at least five times with sterile water. The seeds were then grown on M6 medium at 26 °C in the dark for 3-4 weeks.

2.2 Protoplast isolation

(1) Cut the rice stems and slice their middle portion into 0.5-1 mm strips with a razor blade. Incubate the strips in 0.6 M mannitol in the dark for 10 min, collect them on a sieve, and transfer them into 50 mL of enzyme solution (filtered through a 0.45 μm membrane). Vacuum-infiltrate at about 15 kPa for 30 min, then digest at room temperature on a shaker at 10 rpm for 5 h;

(2) Dilute the digest with 30-50 mL of W5 solution and filter it through a 75 μm nylon mesh into a 50 mL round-bottom centrifuge tube;

(3) Centrifuge at 250 × g (rcf) and 23 °C for 3 min, with acceleration and deceleration both set to 3, and discard the supernatant;

(4) Gently resuspend the cells in 20 mL of W5 and repeat step (3);

(5) Resuspend the cells in an appropriate volume of MMG solution and keep them until transformation.
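The protocol specifies centrifugal force (250 × g) rather than rotor speed. The standard relation RCF = 1.118 × 10⁻⁵ × r × rpm², with r the rotor radius in centimetres, converts between the two. The sketch below assumes a hypothetical 10 cm radius, which should be replaced with the radius of the rotor actually used.

```python
import math


def rpm_for_rcf(rcf: float, radius_cm: float) -> float:
    """Rotor speed (rpm) giving the requested relative centrifugal force.

    Solves RCF = 1.118e-5 * r_cm * rpm**2 for rpm.
    """
    if rcf <= 0 or radius_cm <= 0:
        raise ValueError("rcf and radius must be positive")
    return math.sqrt(rcf / (1.118e-5 * radius_cm))


# 250 x g on a rotor with an assumed 10 cm radius: about 1,495 rpm.
print(round(rpm_for_rcf(250, 10.0)))
```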

2.3 Rice protoplast transformation

(1) Add 10 μg of each required vector to a 2 mL centrifuge tube and mix. Using a pipette tip with the end cut off, add 200 μL of protoplasts and mix by gentle flicking; then add 220 μL of PEG4000 solution, mix by flicking, and incubate at room temperature in the dark for 20-30 min to induce transformation;

(2) Add 880 μL of W5, mix gently by inversion, centrifuge at 250 × g (rcf) for 3 min with acceleration and deceleration both set to 3, and discard the supernatant;

(3) Add 1 mL of WI solution, mix gently by inversion, transfer the suspension gently into a flow cytometry tube, and incubate in the dark at room temperature for 48 h.
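For a batch of transformations, the per-reaction amounts in steps (1)-(3) can be scaled up ahead of time. The sketch below is a planning aid under stated assumptions: the 10% pipetting overage, the plasmid names, and their concentrations are hypothetical. The plasmid volume is the one needed to deliver 10 μg per reaction.

```python
from dataclasses import dataclass


@dataclass
class TransformationPlan:
    plasmid_ul: dict[str, float]  # volume of each plasmid per reaction (uL)
    protoplasts_ul: float         # batch totals below include the overage
    peg_ul: float
    w5_ul: float
    wi_ul: float


def plan(n_reactions: int, plasmid_ng_per_ul: dict[str, float],
         overage: float = 0.1) -> TransformationPlan:
    """Reagent amounts for the PEG transformation in section 2.3.

    Per reaction: 10 ug of each vector, 200 uL protoplasts, 220 uL PEG4000
    solution, 880 uL W5 and 1 mL WI. The overage fraction is an assumption.
    """
    scale = n_reactions * (1 + overage)
    return TransformationPlan(
        plasmid_ul={name: 10_000 / c for name, c in plasmid_ng_per_ul.items()},
        protoplasts_ul=200 * scale,
        peg_ul=220 * scale,
        w5_ul=880 * scale,
        wi_ul=1000 * scale,
    )


# Hypothetical batch: 10 reactions, editor plasmid at 1 ug/uL, reporter at 0.5 ug/uL.
print(plan(10, {"editor": 1000.0, "reporter": 500.0}))
```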

3. Analysis of cell fluorescence by flow cytometry

GFP-negative and GFP-positive protoplast populations were analyzed by flow cytometry on a FACSAria III instrument (BD Biosciences).
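The section does not state how the GFP-positive gate was set. One common approach, assumed here, places the threshold at a high percentile of a negative control and reports the percentage of sample events above it. The 99.5th percentile and the toy intensity values below are hypothetical.

```python
import statistics


def gfp_threshold(negative_control: list[float], pct: float = 99.5) -> float:
    """Fluorescence threshold at the given percentile of a negative control.

    The percentile-based gate is an assumption, not taken from the protocol.
    """
    if not 0 < pct < 100:
        raise ValueError("pct must lie strictly between 0 and 100")
    # With n=1000, quantiles() returns 999 cut points; cuts[k - 1] is the k/1000 quantile.
    cuts = statistics.quantiles(negative_control, n=1000, method="inclusive")
    return cuts[round(pct * 10) - 1]


def percent_positive(sample: list[float], threshold: float) -> float:
    """Percentage of events whose fluorescence is strictly above the threshold."""
    if not sample:
        raise ValueError("sample contains no events")
    return 100 * sum(x > threshold for x in sample) / len(sample)


if __name__ == "__main__":
    control = [float(x) for x in range(1, 1001)]  # toy negative-control intensities
    gate = gfp_threshold(control)                  # about 995 for this toy control
    sample = [10.0] * 90 + [2000.0] * 10           # toy sample with 10 bright events
    print(f"{percent_positive(sample, gate):.1f}% GFP-positive")
```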

Example 1. Sequence-based search for candidate adenine deaminases usable for base editing

TadA is an adenine deaminase that acts on tRNA, and the deaminases in the adenine base editing systems currently in use are all variants of Escherichia coli TadA, which belongs to the Tad1/ADAR branch. Rubio et al. reported that TadA deaminases contain the amino acid sequence features H(C)xE and PCxxC, where x represents any single amino acid (Rubio, M.A., Pastar, I., Gaston, K.W., Ragone, F.L., Janzen, C.J., Cross, G.A., Papavasiliou, F.N., & Alfonzo, J.D. (2007). An adenosine-to-inosine tRNA-editing enzyme that can perform C-to-U deamination of DNA. Proceedings of the National Academy of Sciences of the United States of America, 104(19), 7821-7826.). To find novel TadA deaminases, proteins in the UniProt Swiss-Prot (sprot) database (https://www.uniprot.org/uniprot/) that matched these features were further annotated, and the characteristic sequence was further analyzed and refined. The inventors found that with the characteristic sequence VX_nNX₁₀HAEX_nPCXMC (where X_n denotes any number of arbitrary amino acids and X₁₀ denotes exactly 10 arbitrary amino acids), most of the hits in the UniProt Swiss-Prot database (Table 1) and the UniProt TrEMBL database (Table 2) were protein sequences annotated as TadA, showing that this characteristic sequence is a highly reliable query for identifying novel adenine deaminases.
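The motif search above can be sketched as a regular expression. This is a minimal illustration, not the inventors' search pipeline: it assumes X_n may also be empty (zero residues), and the record layout passed to `annotation_shares` (accession mapped to an annotation and a sequence) is hypothetical. It tallies annotations among motif-positive hits in the same spirit as Tables 1 and 2.

```python
import re
from collections import Counter

# VX_nNX10HAEX_nPCXMC: X is any residue; X_n is assumed to mean zero or more residues.
MOTIF = re.compile(r"V[A-Z]*N[A-Z]{10}HAE[A-Z]*PC[A-Z]MC")


def has_motif(seq: str) -> bool:
    """True if the protein sequence contains the characteristic motif."""
    return MOTIF.search(seq.upper()) is not None


def annotation_shares(records: dict[str, tuple[str, str]]) -> dict[str, float]:
    """Percentage of each annotation among motif-positive records.

    `records` maps an accession to (annotation, sequence); this layout is hypothetical.
    """
    hits = Counter(ann for ann, seq in records.values() if has_motif(seq))
    total = sum(hits.values())
    return {ann: 100 * k / total for ann, k in hits.most_common()} if total else {}


# Toy sequences: exactly 10 residues between N and HAE match, 9 do not.
print(has_motif("MSVKLNGRSTAKLQWEHAELIRKPCAMCGK"))
print(has_motif("MSVKLNGRSTAKLQWHAELIRKPCAMCGK"))
```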

Table 1. Functions and proportions of the proteins found in the UniProt Swiss-Prot (sprot) database with the modified characteristic sequence

Table 2. Functions and proportions of the proteins found in the UniProt TrEMBL database with the modified characteristic sequence

Example 2. Engineering of No. 135, a potential novel TadA deaminase

The inventors found that the potential deaminase numbered 135 listed by Iyer et al. (Iyer, L.M., Zhang, D., Rogozin, I.B., & Aravind, L. (2011). Evolution of the deaminase fold and multiple origins of eukaryotic editing and mutagenic nucleic acid deaminases from bacterial toxin systems. Nucleic Acids Research, 39(22), 9473-9497.) contains this characteristic sequence, and that its sequence shares only 41.89% similarity with the E. coli TadA deaminase (Figure 1). To make it act on DNA, the inventors referred to the sequence of the ABE8e variant (Richter, M.F., Zhao, K.T., Eton, E., Lapinaite, A., Newby, G.A., Thuronyi, B.W., Wilson, C., Koblan, L.W., Zeng, J., Bauer, D.E., Doudna, J.A., & Liu, D.R. (2020). Phage-assisted evolution of an adenine base editor with improved Cas domain compatibility and activity. Nature Biotechnology, 38(7), 883-891.) and mutated amino acids 101-104 of No. 135 (corresponding to amino acids 106-109 of ABE8e). The inventors found that changing D103 to N enabled the No. 135 deaminase to deaminate adenine at the target site in DNA (Figure 2 and Table 3). The protein numbered 135 can therefore deaminate adenine in single-stranded DNA, and a novel adenine base editing system can be built on this protein.
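Residue positions in a candidate deaminase are defined by their correspondence to a reference: here, residues 101-104 of No. 135 correspond to residues 106-109 of ABE8e. Given a pairwise alignment from any aligner, that correspondence can be read off column by column, and the mutation applied. This is a minimal sketch; the function names and the toy alignment are hypothetical, and the toy offset of 2 stands in for the offset of 5 described above.

```python
from typing import Optional


def map_position(ref_aln: str, qry_aln: str, ref_pos: int) -> Optional[int]:
    """Return the 1-based query residue aligned to reference residue `ref_pos`.

    Both strings are rows of the same pairwise alignment, with '-' for gaps.
    Returns None when the reference residue is aligned to a gap in the query.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("alignment rows must have equal length")
    r = q = 0  # residues seen so far in the reference and the query
    for a, b in zip(ref_aln, qry_aln):
        if a != "-":
            r += 1
        if b != "-":
            q += 1
        if a != "-" and r == ref_pos:
            return q if b != "-" else None
    raise ValueError("ref_pos exceeds the reference length")


def substitute(seq: str, pos: int, expected: str, new: str) -> str:
    """Replace the residue at 1-based `pos`, checking the original residue first."""
    if seq[pos - 1] != expected:
        raise ValueError(f"expected {expected} at {pos}, found {seq[pos - 1]}")
    return seq[: pos - 1] + new + seq[pos:]


if __name__ == "__main__":
    # Toy alignment (hypothetical): reference D9 lines up with query D7.
    ref_aln = "MKTAYIALDKG"
    qry_aln = "M--AYIALDKG"
    pos = map_position(ref_aln, qry_aln, 9)
    print(pos)  # 7
    print(substitute(qry_aln.replace("-", ""), pos, "D", "N"))  # MAYIALNKG
```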

Table 3. Mutations at amino acids 101-104 of potential deaminase No. 135 and the resulting reporter-system fluorescence

Example 3. Engineering proteins from the guanine deaminase branch into novel adenine deaminases

The search results for the characteristic sequence VX_nNX₁₀HAEX_nPCXMC showed that, in addition to the sequences annotated as TadA in the databases, some hits were annotated with other functions, such as guanine deaminase, tRNA isoleucine synthase and HAD hydrolase, and some hits had functions that have not yet been elucidated. The inventors selected several of these proteins and found that they are highly similar to TadA in structure but share very little sequence similarity with it (Figure 4). Accordingly, the proteins numbered 1299 and 1417 were selected from the guanine deaminase branch listed by Iyer et al. and aligned with ecTadA, showing similarities to ecTadA of only 47.24% and 42.66%, respectively (Figure 3). Based on the alignment, the four key amino acids of 1299 and 1417 were replaced with VRNS. Protoplast experiments showed that the engineered proteins activated the reporter system (Figures 5 and 6); that is, they can deaminate adenine at the target site.
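The similarity values reported above (47.24% and 42.66%) come without a stated scoring scheme. As a simpler, related measure, percent identity over a pairwise alignment can be computed as below. This sketch assumes one common convention, identical non-gap columns divided by the full alignment length with gap columns counted, so its values are not expected to reproduce the reported similarity figures. The toy alignment is hypothetical.

```python
def percent_identity(aln_a: str, aln_b: str) -> float:
    """Identical aligned residues divided by alignment length (gaps counted), in %."""
    if len(aln_a) != len(aln_b) or not aln_a:
        raise ValueError("alignment rows must be non-empty and of equal length")
    # A column counts as identical only if both rows hold the same residue (not a gap).
    same = sum(a == b and a != "-" for a, b in zip(aln_a, aln_b))
    return 100 * same / len(aln_a)


# Toy alignment: 9 identical columns out of 11.
print(round(percent_identity("MKTAYIALDKG", "M--AYIALDKG"), 2))
```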

Sequence Listing

Claims

Use of an adenine deaminase for gene editing, such as base editing, in an organism or a cell of an organism, wherein the adenine deaminase

1) comprises the characteristic sequence motif VX_nNX₁₀HAEX_nPCXMC; and/or

2) comprises an amino acid sequence having at least 50%, at least 60%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity, wherein the amino acid at the position corresponding to position 108 of SEQ ID NO: 14 is N.

A base editing fusion protein comprising a nucleic acid targeting domain and an adenine deamination domain, wherein the adenine deamination domain comprises at least one (e.g., one or two) adenine deaminase polypeptide, wherein the adenine deaminase

The use of claim 1 or the base editing fusion protein of claim 2, wherein the adenine deaminase

i) the amino acid at the position corresponding to position 106 of SEQ ID NO: 14 is A or V;

ii) the amino acid at the position corresponding to position 107 of SEQ ID NO: 14 is L or R; and/or

iii) the amino acid at the position corresponding to position 109 of SEQ ID NO: 14 is K or S.

The use or base editing fusion protein of claim 3, wherein the amino acids of the adenine deaminase at the positions corresponding to positions 106-109 of SEQ ID NO: 14 are VRNS, ALNK, ALNS, ARNK, ARNS, VLNK, VLNS, or VRNK.

The use or base editing fusion protein of claim 4, wherein the adenine deaminase comprises an amino acid sequence selected from the group consisting of SEQ ID NOs: 2-9, 11 and 13.

The base editing fusion protein of any one of claims 2-5, wherein the nucleic acid targeting domain comprises at least one CRISPR effector polypeptide.

The base editing fusion protein of claim 6, wherein the CRISPR effector protein is nuclease-inactivated Cas9, for example, the nuclease-inactivated Cas9 includes the amino acid sequence shown in SEQ ID NO: 16.

The base editing fusion protein of any one of claims 2-7, wherein the nucleic acid targeting domain and the adenine deamination domain are fused through a linker.

The base editing fusion protein of any one of claims 2 to 8, wherein the base editing fusion protein includes in the following order from the N-terminus to the C-terminus: an adenine deamination domain and a nucleic acid targeting domain.

The base editing fusion protein of any one of claims 2-9, wherein the base editing fusion protein further comprises one or more nuclear localization sequences (NLS).

A base editing system for modifying a target nucleic acid region in the genome of an organism or of a cell of an organism, comprising:

i) the base editing fusion protein of any one of claims 2-10 and/or an expression construct comprising a nucleotide sequence encoding the base editing fusion protein; and/or

ii) at least one guide RNA and/or at least one expression construct comprising a nucleotide sequence encoding said at least one guide RNA,

wherein said at least one guide RNA is directed to at least one target sequence within said target nucleic acid region.

The base editing system of claim 11, wherein the nucleotide sequence encoding the base editing fusion protein is codon-optimized for the organism whose genome is to be modified.

The base editing system of claim 11 or 12, wherein the organism is a eukaryotic organism, including mammals such as humans, mice, rats, monkeys, dogs, pigs, sheep, cattle, cats; poultry such as chickens, ducks and geese; plants, including monocots and dicots, for example, the plants are crop plants, including but not limited to wheat, rice, corn, soybean, sunflower, sorghum, rape, alfalfa, cotton, barley, millet, sugar cane, tomato, tobacco, cassava and potato.

A method of producing at least one genetically modified cell, comprising introducing the base editing system of any one of claims 11-13 into at least one cell, thereby resulting in the substitution of one or more nucleotides within a target nucleic acid region in said at least one cell, for example, wherein the one or more nucleotide substitutions are A-to-G substitutions.

The method of claim 14, further comprising the step of screening said at least one cell for cells having the desired one or more nucleotide substitutions.

The method of claim 14 or 15, wherein the base editing system is introduced into the cell by a method selected from the group consisting of: calcium phosphate transfection, protoplast fusion, electroporation, lipofection, microinjection, viral infection (such as baculovirus, vaccinia virus, adenovirus, adeno-associated virus, lentivirus and other viruses), the biolistic method, PEG-mediated protoplast transformation, and Agrobacterium-mediated transformation.

The method of any one of claims 14-16, wherein the cells are from mammals such as humans, mice, rats, monkeys, dogs, pigs, sheep, cattle, cats; poultry such as chickens, ducks, geese; plants, including monocots and dicots, preferably crop plants such as wheat, rice, corn, soybean, sunflower, sorghum, rape, alfalfa, cotton, barley, millet, sugarcane, tomato, tobacco, cassava and potato.

A pharmaceutical composition for treating a disease in a subject in need thereof, comprising the base editing system of any one of claims 11-13 and optionally a pharmaceutically acceptable carrier, wherein the base editing system is used to modify a gene associated with the disease.

The pharmaceutical composition of claim 18, wherein said subject is a mammal, such as a human.

The pharmaceutical composition of claim 18 or 19, wherein the disease is selected from the group consisting of tumors, inflammation, Parkinson's disease, cardiovascular disease, Alzheimer's disease, autism, drug addiction, age-related macular degeneration, schizophrenia, hereditary diseases, etc.

A method of obtaining/preparing an adenine deaminase for base editing, comprising:

1) identifying an adenine deaminase comprising the characteristic sequence motif VX_nNX₁₀HAEX_nPCXMC; and

2) in the adenine deaminase comprising the characteristic sequence motif VX_nNX₁₀HAEX_nPCXMC, mutating the amino acid at the position corresponding to position 108 of SEQ ID NO: 14 to N.

The method of claim 21, wherein the method comprises mutating the amino acids of the adenine deaminase comprising the characteristic sequence motif VX_nNX₁₀HAEX_nPCXMC at the positions corresponding to positions 106-109 of SEQ ID NO: 14 to VRNS, ALNK, ALNS, ARNK, ARNS, VLNK, VLNS, or VRNK.