CN118556123A - HBB modulating compositions and methods

Info

Publication number: CN118556123A
Application number: CN202280074263.XA
Authority: CN
Inventors: R·C·阿尔特舒勒; A·H·博思默; D·R·崔; C·G·S·科特拉-拉穆西诺; K·金; R·M·科特拉尔; G·D·迈克阿利斯特; A·雷; N·罗奎特; C·桑切斯; B·E·斯特因伯格; W·E·萨洛蒙; R·J·希特里克; W·奎尔毕斯; L·H·阿波尼; Z·王; Y·付; D·G·阿伯纳蒂; M·C·霍尔姆斯
Original assignee: Flagship Pioneering Innovations VI Inc
Current assignee: Flagship Pioneering Innovations VI Inc
Priority date: 2021-09-08
Filing date: 2022-09-07
Publication date: 2024-08-27

Abstract

The present disclosure provides compositions, systems, and methods, e.g., for targeting, editing, modifying, or manipulating a host cell genome at one or more locations of a DNA sequence in a cell, tissue, or subject. A genetic modification system for treating Sickle Cell Disease (SCD) is described.

Description

HBB modulating compositions and methods

Sequence Listing

This application contains a sequence listing, which has been submitted electronically in XML format in accordance with WIPO Standard ST.26 and is hereby incorporated by reference in its entirety. The XML copy was created on September 1, 2022, is named V2065-7027WO_SL.XML, and is 15,727,019 kb in size.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of U.S. Provisional Application No. 63/241,994, filed on September 8, 2021, U.S. Provisional Application No. 63/250,143, filed on September 29, 2021, and U.S. Provisional Application No. 63/303,900, filed on January 27, 2022. The contents of the foregoing applications are hereby incorporated by reference in their entirety.

Background

In the absence of specialized proteins that promote insertion events, a nucleic acid of interest integrates into the genome at low frequency and with very little site specificity. Some existing approaches, such as CRISPR/Cas9, are better suited to small edits that rely on host repair pathways and are less efficient at integrating longer sequences. Other existing approaches, such as Cre/loxP, require a first step of inserting a loxP site into the genome and a second step of inserting the sequence of interest into the loxP site. There is a need in the art for improved compositions (e.g., proteins and nucleic acids) and methods for inserting, altering, or deleting sequences of interest in a genome.

Sickle cell disease is an inherited blood disorder that affects red blood cells. There are several types of sickle cell disease (e.g., hemoglobin SS disease; hemoglobin SC disease; sickle beta-plus thalassemia; sickle beta-zero thalassemia). People with sickle cell disease have red blood cells that contain mostly hemoglobin S, an abnormal type of hemoglobin. Sickle-shaped cells die prematurely, which can lead to a shortage of red blood cells (anemia). Sickle-shaped cells are also rigid and can block small blood vessels, causing severe pain and organ damage. Tissues that do not receive normal blood flow eventually become damaged, and this damage is what causes the complications of sickle cell disease.

The HBB gene provides instructions for making a protein called beta-globin. Beta-globin is a component (subunit) of a larger protein called hemoglobin, which is located inside red blood cells. In adults, hemoglobin normally consists of four protein subunits: two subunits of beta-globin and two subunits of another protein called alpha-globin, which is produced from another gene called HBA. Each of these protein subunits is bound to an iron-containing molecule called heme; each heme contains an iron molecule in its center that can bind to one oxygen molecule. The hemoglobin in red blood cells binds oxygen molecules in the lungs. The cells then travel through the bloodstream and deliver oxygen to tissues throughout the body.

Sickle cell anemia, a common form of sickle cell disease, is caused by a specific mutation in the HBB gene. The mutation results in the production of an abnormal form of beta-globin called hemoglobin S (HbS). In this condition, hemoglobin S replaces the beta-globin subunits in hemoglobin. The mutation changes a single amino acid in beta-globin: the glutamic acid at position 6 is replaced by valine (written as Glu6Val or E6V). Replacing glutamic acid with valine causes the abnormal hemoglobin S subunits to stick together and form long, rigid molecules that bend red blood cells into a sickle or crescent shape. Mutations in the HBB gene can also cause other abnormalities in beta-globin, leading to other types of sickle cell disease. In these other types, only one beta-globin subunit is replaced with hemoglobin S, and the other beta-globin subunit is replaced by a different abnormal variant (such as hemoglobin C or hemoglobin E).
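The E6V substitution described above can be traced back to a single base change in the HBB coding sequence. The following is a minimal sketch, not part of the disclosure: it assumes the first eleven codons of the HBB coding sequence and a partial codon table that covers only those codons, and it counts amino acid positions on the mature protein (initiator methionine removed), which is the numbering the E6V notation uses. The function and constant names are illustrative.

```python
# Partial standard genetic code: only the codons that occur in this example.
CODON_TABLE = {
    "ATG": "M", "GTG": "V", "CAT": "H", "CTG": "L", "ACT": "T",
    "CCT": "P", "GAG": "E", "AAG": "K", "TCT": "S", "GCC": "A",
}

# First 11 codons of the HBB coding sequence, start codon included (assumed here).
HBB_WT = "ATGGTGCATCTGACTCCTGAGGAGAAGTCTGCC"
# Sickle allele: a single A>T change turns the Glu6 codon GAG into GTG.
HBB_SICKLE = "ATGGTGCATCTGACTCCTGTGGAGAAGTCTGCC"


def translate(cds: str) -> str:
    """Translate a coding sequence codon by codon."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def mature_differences(wt: str, mutant: str) -> list[str]:
    """List substitutions using mature-protein numbering (initiator Met removed)."""
    wt_protein, mut_protein = translate(wt)[1:], translate(mutant)[1:]
    return [
        f"{a}{pos}{b}"
        for pos, (a, b) in enumerate(zip(wt_protein, mut_protein), start=1)
        if a != b
    ]


print(translate(HBB_WT))                        # MVHLTPEEKSA
print(translate(HBB_SICKLE))                    # MVHLTPVEKSA
print(mature_differences(HBB_WT, HBB_SICKLE))   # ['E6V']
```

Numbering from the mature protein is why the changed residue is reported as position 6 even though it is encoded by the seventh codon of the open reading frame.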

There is currently no universal cure for sickle cell disease. Available treatment options are limited to bone marrow or stem cell transplantation. There is therefore a need for new and more effective treatments for sickle cell disease that address the HBB E6V mutation.

Summary of the Invention

The present disclosure relates to novel compositions, systems, and methods for altering a genome at one or more locations in a host cell, tissue, or subject, in vivo or in vitro. In particular, the invention features compositions, systems, and methods for inserting, altering, or deleting a sequence of interest in a host genome. For example, the present disclosure provides systems capable of modulating the activity of the HBB gene (e.g., by inserting, altering, or deleting a sequence of interest) and methods of treating sickle cell disease (SCD) by administering one or more such systems to alter the genomic sequence at an HBB nucleotide and thereby correct a pathogenic mutation that causes SCD.

In one aspect, the present disclosure relates to a system for modifying DNA to correct a human HBB gene mutation that causes SCD, the system comprising (a) a nucleic acid encoding a gene modifying polypeptide capable of target-primed reverse transcription, the polypeptide comprising (i) a reverse transcriptase domain and (ii) a Cas9 nickase that binds DNA and has endonuclease activity; and (b) a template RNA comprising (i) a gRNA spacer that is complementary to a first portion of the human HBB gene, (ii) a gRNA scaffold that binds the polypeptide, (iii) a heterologous object sequence comprising a mutation region to correct the mutation, and (iv) a primer binding site (PBS) sequence at the 3' end of the template RNA, comprising at least 3, 4, 5, 6, 7, or 8 bases having 100% homology to the target DNA strand. The HBB gene may comprise an E6V mutation. The template RNA sequence may comprise a sequence described herein (e.g., in Table 1, 3, 4, A, AA, B, B1, 5A-5D, X4, or X4A).

The gRNA spacer, at the 5' end of the template RNA, may comprise at least 15 bases having 100% identity to the target DNA. The template RNA may further comprise a PBS sequence comprising at least 5 bases having at least 80% homology to the target DNA strand. The template RNA may comprise one or more chemical modifications.
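The 5'-to-3' layout of the template RNA and the length thresholds stated above can be written down as a small data structure. The sketch below is illustrative only: the class name, field names, and the `problems` helper are hypothetical, the scaffold, object sequence, and PBS strings are arbitrary placeholders rather than sequences from the disclosure's tables, and the spacer is SEQ ID NO: 21668 from the enumerated embodiments rewritten with U in place of T. It checks lengths and alphabet only; it does not check identity or homology to a target.

```python
from dataclasses import dataclass

RNA_BASES = set("ACGU")


@dataclass(frozen=True)
class TemplateRNA:
    """Hypothetical container for the four template RNA segments, 5' to 3'."""
    spacer: str           # (i) gRNA spacer
    scaffold: str         # (ii) gRNA scaffold
    object_sequence: str  # (iii) heterologous object sequence
    pbs: str              # (iv) primer binding site

    def sequence(self) -> str:
        """Concatenate the segments in 5'-to-3' order."""
        return self.spacer + self.scaffold + self.object_sequence + self.pbs

    def problems(self, min_spacer: int = 15, min_pbs: int = 3) -> list[str]:
        """Return human-readable issues; an empty list means all checks passed."""
        issues = []
        for name in ("spacer", "scaffold", "object_sequence", "pbs"):
            segment = getattr(self, name)
            if not segment or set(segment) - RNA_BASES:
                issues.append(f"{name}: empty or contains non-RNA characters")
        if len(self.spacer) < min_spacer:
            issues.append(f"spacer: shorter than {min_spacer} bases")
        if len(self.pbs) < min_pbs:
            issues.append(f"pbs: shorter than {min_pbs} bases")
        return issues


example = TemplateRNA(
    spacer="CAUGGUGCAUCUGACUCCUG",   # SEQ ID NO: 21668, written as RNA
    scaffold="GUUUUAGAGCUA",         # placeholder, not a scaffold from Table 12
    object_sequence="ACGUACGUAC",    # placeholder
    pbs="ACGUACGU",                  # placeholder
)
print(example.sequence())
print(example.problems())  # []
```

Keeping the segments separate until `sequence()` is called makes it easy to swap in a different PBS or object sequence while holding the spacer fixed.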

The domains of the gene modifying polypeptide may be joined by peptide linkers. The polypeptide may comprise one or more peptide linkers. The gene modifying polypeptide may further comprise a nuclear localization signal. The polypeptide may comprise more than one nuclear localization signal, e.g., multiple adjacent nuclear localization signals, or one or more nuclear localization signals in different regions of the polypeptide, e.g., one or more nuclear localization signals at the N-terminus of the polypeptide and one or more nuclear localization signals at the C-terminus of the polypeptide. The nucleic acid encoding the gene modifying polypeptide may encode one or more intein domains.

Introducing the system into a target cell may result in insertion of at least 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400, 500, or 1000 base pairs of exogenous DNA. Introducing the system into a target cell may result in a deletion, wherein the deletion is of fewer than 2, 3, 4, 5, 10, 50, or 100 base pairs of genomic DNA upstream or downstream of the insertion. Introducing the system into a target cell may result in a substitution, e.g., a substitution of 1, 2, or 3 nucleotides (e.g., consecutive nucleotides).

The heterologous object sequence may be at least 5, 10, 25, 50, 100, 150, 200, 250, 300, 400, 500, 600, or 700 base pairs.

In one aspect, the present disclosure relates to a pharmaceutical composition comprising the system described above and a pharmaceutically acceptable excipient or carrier, wherein the pharmaceutically acceptable excipient or carrier is selected from the group consisting of a plasmid vector, a viral vector, a vesicle, and a lipid nanoparticle. In one aspect, the present disclosure relates to a pharmaceutical composition comprising the system described above and a plurality of pharmaceutically acceptable excipients or carriers, wherein the pharmaceutically acceptable excipients or carriers are selected from the group consisting of a plasmid vector, a viral vector, a vesicle, and a lipid nanoparticle, e.g., wherein the system is delivered by two different excipients or carriers, e.g., two lipid nanoparticles, two viral vectors, or one lipid nanoparticle and one viral vector. The viral vector may be an adeno-associated virus (AAV).

In one aspect, the present disclosure relates to a host cell (e.g., a mammalian cell, e.g., a human cell) comprising the system described above.

In one aspect, the present disclosure relates to a method of correcting a mutation in the human HBB gene in a cell, tissue, or subject, the method comprising administering the system described above to the cell, tissue, or subject, wherein optionally, correction of the mutant HBB gene comprises the amino acid substitution V6E (reversing the pathogenic substitution E6V). The system may be introduced in vivo, in vitro, ex vivo, or in situ. The nucleic acid of (a) may be integrated into the genome of the host cell. In some embodiments, the nucleic acid of (a) is not integrated into the genome of the host cell. In some embodiments, the heterologous object sequence is inserted at only one target site in the host cell genome. The heterologous object sequence may be inserted at two or more target sites in the host cell genome, e.g., at the same corresponding site in two homologous chromosomes, or at two different sites on the same or different chromosomes. The heterologous object sequence may encode a mammalian polypeptide or a fragment or variant thereof. The components of the system may be delivered on 1, 2, 3, 4, or more distinct nucleic acid molecules. The system may be introduced into the host cell by electroporation or by using at least one vehicle selected from a plasmid vector, a viral vector, a vesicle, and a lipid nanoparticle.
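At the DNA level, the V6E correction amounts to reverting one base next to the site recognized by the gRNA spacer. The sketch below is an illustration under stated assumptions, not the disclosure's method: it uses the spacer of SEQ ID NO: 21668 from the enumerated embodiments, a short stretch of the HBB coding strand (one 5' UTR base followed by the first eleven codons) that is assumed here, and hypothetical helper names. It locates the spacer on the sickle allele, reads the three bases that follow it, and reverts the pathogenic T to A. Those three bases fit the NGG pattern of an SpCas9 PAM; whether a given embodiment uses SpCas9 is not stated in this passage.

```python
SPACER = "CATGGTGCATCTGACTCCTG"  # SEQ ID NO: 21668 (enumerated embodiments)

# Coding strand: one 5' UTR base (C) followed by the first 11 HBB codons (assumed).
SICKLE = "CATGGTGCATCTGACTCCTGTGGAGAAGTCTGCC"
WILD_TYPE = "CATGGTGCATCTGACTCCTGAGGAGAAGTCTGCC"


def find_protospacer(target: str, spacer: str) -> tuple[int, str]:
    """Return the spacer start index and the three bases immediately 3' of it."""
    start = target.find(spacer)
    if start < 0:
        raise ValueError("spacer not found on this strand")
    end = start + len(spacer)
    return start, target[end:end + 3]


def substitute(seq: str, index: int, new_base: str) -> str:
    """Return seq with the base at index replaced by new_base."""
    return seq[:index] + new_base + seq[index + 1:]


start, pam = find_protospacer(SICKLE, SPACER)
mutation_index = start + len(SPACER)  # the pathogenic T sits directly 3' of the spacer
corrected = substitute(SICKLE, mutation_index, "A")

print(pam)                     # TGG
print(SICKLE[mutation_index])  # T
print(corrected == WILD_TYPE)  # True
```

In this window the pathogenic base is also the first base of the three that follow the spacer, so the same spacer matches both the sickle and the wild-type allele.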

Features of these compositions or methods may include one or more of the enumerated embodiments below.

Enumerated Embodiments

1. A template RNA comprising, e.g., from 5' to 3':

(i) a gRNA spacer that is complementary to a first portion of the human HBB gene, wherein the gRNA spacer has a sequence comprising the core nucleotides of a gRNA spacer sequence of Table 1, or a sequence having 1, 2, or 3 substitutions relative thereto, and optionally comprising one or more consecutive nucleotides starting with the 3' end of the flanking nucleotides of the gRNA spacer (e.g., comprising one or more flanking nucleotides adjacent to the core nucleotides), or wherein the gRNA spacer has the sequence of a spacer selected from Table A, Table AA, Table B, Table B1, Tables 5A-5D, Table X4, or Table X4A;

(ii) a gRNA scaffold that binds a gene modifying polypeptide (e.g., binds the Cas domain of the gene modifying polypeptide),

(iii) a heterologous object sequence comprising a mutation region to introduce a mutation into a second portion of the human HBB gene (e.g., to correct a mutation therein) (wherein optionally the heterologous object sequence comprises, from 5' to 3', a post-edit homology region, the mutation region, and a pre-edit homology region), and

(iv) a primer binding site (PBS) sequence comprising at least 3, 4, 5, 6, 7, or 8 bases having 100% identity to a third portion of the human HBB gene.

2. The template RNA of embodiment 1, wherein the heterologous object sequence comprises the core nucleotides of an RT template sequence of Table 3, or a sequence having 1, 2, or 3 substitutions relative thereto, and optionally comprises one or more consecutive nucleotides starting with the 3' end of the flanking nucleotides of the RT template sequence, or wherein the heterologous object sequence comprises the sequence of an RT template sequence of Table A, Table AA, Table B, Table B1, Tables 5A-5D, Table X4, or Table X4A.

3. The template RNA of embodiment 1, wherein the heterologous object sequence comprises the core nucleotides of an RT template sequence of Table 3 that corresponds to the gRNA spacer sequence, or a sequence having 1, 2, or 3 substitutions relative thereto, and optionally comprises one or more consecutive nucleotides starting with the 3' end of the flanking nucleotides of the RT template sequence (e.g., comprising one or more flanking nucleotides adjacent to the core nucleotides), or wherein the heterologous object sequence comprises the sequence of an RT template sequence of Table A, Table AA, Table B, Table B1, Tables 5A-5D, Table X4, or Table X4A that corresponds to the gRNA spacer sequence.

4. The template RNA of any one of embodiments 1-3, wherein the PBS sequence has a sequence comprising the core nucleotides of a PBS sequence from the same row of Table 3 as the RT template sequence, or a sequence having 1, 2, or 3 substitutions relative thereto, and optionally comprises one or more consecutive nucleotides starting with the 5' end of the flanking nucleotides of the PBS sequence (e.g., comprising one or more flanking nucleotides adjacent to the core nucleotides).

5. The template RNA of any one of embodiments 1-3, wherein the PBS sequence has a sequence comprising the core nucleotides of a PBS sequence of Table 3 that corresponds to the RT template sequence (or a sequence having 1, 2, or 3 substitutions relative thereto), the gRNA spacer sequence, or both, and optionally comprises one or more consecutive nucleotides starting with the 5' end of the flanking nucleotides of the PBS sequence, or wherein the PBS sequence has a sequence comprising a PBS sequence from Table A, Table AA, Table B, Table B1, Tables 5A-5D, Table X4, or Table X4A that corresponds to the RT template sequence (or a sequence having 1, 2, or 3 substitutions relative thereto), the gRNA spacer sequence, or both.

6. The template RNA of any one of embodiments 1-5, wherein the gRNA scaffold comprises the sequence of a gRNA scaffold of Table 12, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto.

7. The template RNA of any one of embodiments 1-5, wherein the gRNA scaffold comprises the sequence of a gRNA scaffold of Table 12 that corresponds to the RT template sequence, the gRNA spacer sequence, or both, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto.

8. A template RNA comprising, e.g., from 5' to 3':

(i) a gRNA spacer that is complementary to a first portion of the human HBB gene,

(iii) a heterologous object sequence comprising a mutation region to introduce a mutation into a second portion of the human HBB gene (e.g., to correct a mutation therein), wherein the heterologous object sequence comprises the core nucleotides of an RT template sequence of Table 3, or a sequence having 1, 2, or 3 substitutions relative thereto, and optionally comprises one or more consecutive nucleotides starting with the 3' end of the flanking nucleotides of the RT template sequence, or wherein the heterologous object sequence comprises an RT template sequence of Table A, Table AA, Table B, Table B1, Tables 5A-5D, Table X4, or Table X4A; and

(iv) a PBS sequence comprising at least 3, 4, 5, 6, 7, or 8 bases having 100% identity to a third portion of the human HBB gene.

9. The template RNA of embodiment 8, wherein the gRNA spacer comprises the core nucleotides of a gRNA spacer sequence of Table 1, or a sequence having 1, 2, or 3 substitutions relative thereto, and optionally comprises one or more consecutive nucleotides starting with the 3' end of the flanking nucleotides of the gRNA spacer sequence, or wherein the gRNA spacer comprises a gRNA spacer sequence of Table A, Table AA, Table B, Table B1, Tables 5A-5D, Table X4, or Table X4A.

10. The template RNA of any one of embodiments 1-9, wherein the gRNA spacer comprises CATGGTGCATCTGACTCCTG (SEQ ID NO: 21668) or CATGGTGCACCTGACTCCTG (SEQ ID NO: 19249), or a sequence having 1, 2, or 3 substitutions relative thereto.

11. The template RNA of any one of embodiments 1-9, wherein the gRNA spacer comprises GTAACGGCAGACTTCTCCAC (SEQ ID NO: 19971), or a sequence having 1, 2, or 3 substitutions relative thereto.

12. The template RNA of embodiment 8, wherein the heterologous object sequence comprises the core nucleotides of a gRNA spacer sequence of Table 1 that corresponds to the RT template sequence (or a sequence having 1, 2, or 3 substitutions relative thereto), and optionally comprises one or more consecutive nucleotides starting with the 3' end of the flanking nucleotides of the gRNA spacer sequence, or wherein the heterologous object sequence comprises the nucleotides of a gRNA spacer sequence of Table A, Table AA, Table B, Table B1, Tables 5A-5D, Table X4, or Table X4A that corresponds to the RT template sequence (or a sequence having 1, 2, or 3 substitutions relative thereto).

13. The template RNA of any one of embodiments 8-12, wherein the PBS sequence has a sequence comprising the core nucleotides of a PBS sequence from the same row of Table 3 as the RT template sequence, or a sequence having 1, 2, or 3 substitutions relative thereto, and optionally comprises one or more consecutive nucleotides starting with the 5' end of the flanking nucleotides of the PBS sequence.

14. The template RNA of any one of embodiments 8-12, wherein the PBS sequence has a sequence comprising the core nucleotides of a PBS sequence of Table 3 that corresponds to the RT template sequence (or a sequence having 1, 2, or 3 substitutions relative thereto), the gRNA spacer sequence, or both, and optionally comprises one or more consecutive nucleotides starting with the 5' end of the flanking nucleotides of the PBS sequence, or wherein the PBS sequence has a sequence comprising the core nucleotides of a PBS sequence of Table A, Table AA, Table B, Table B1, Tables 5A-5D, Table X4, or Table X4A that corresponds to the RT template sequence, the gRNA spacer sequence, or both.

15. The template RNA of any one of embodiments 8-14, wherein the gRNA scaffold comprises the sequence of a gRNA scaffold of Table 12, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto.

16. The template RNA of any one of embodiments 8-14, wherein the gRNA scaffold comprises the sequence of a gRNA scaffold of Table 12 that corresponds to the RT template sequence, the gRNA spacer sequence, or both, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto.

17. The template RNA of any one of the preceding embodiments, wherein the gRNA spacer has the sequence of a gRNA spacer sequence of Table A or Table B, or a sequence having 1, 2, or 3 substitutions relative thereto.

18. The template RNA of embodiment 17, wherein the gRNA spacer has the sequence of SEQ ID NO: 21668.

19. The template RNA of embodiment 17 or 18, wherein the PBS sequence has the sequence of a PBS sequence from the same row of Table A or Table B as the gRNA spacer sequence, or a sequence having 1, 2, or 3 substitutions relative thereto.

20. The template RNA of any one of embodiments 17-19, wherein the PBS sequence has a sequence comprising the core nucleotides of the PBS sequence of SEQ ID NO: 21669, and optionally comprises one or more consecutive nucleotides starting with the 5' end of the flanking nucleotides of the PBS sequence.

21. The template RNA of any one of embodiments 17-19, wherein the gRNA scaffold has the sequence of a gRNA scaffold from the same row of Table A or Table B as the gRNA spacer sequence, or a sequence having 1, 2, or 3 substitutions relative thereto.

22. The template RNA of any one of embodiments 17-20, wherein the heterologous object sequence has the sequence of an RT template sequence from the same row of Table A or Table B as the gRNA spacer sequence, or a sequence having 1, 2, or 3 substitutions relative thereto, wherein optionally the bolded T shown in the RT template sequence of Table A is replaced with G (e.g., a sequence without the PAM-kill mutation), or wherein further optionally the bolded C shown in the RT template sequence of Table B is replaced with T or U (e.g., a sequence without a SNP that is present in HEK293T cells but absent from the hg38 human reference genome).

23. The template RNA of any one of embodiments 17-22, wherein the heterologous object sequence has a sequence comprising the core nucleotides of the RT template sequence of SEQ ID NO: 21670, and optionally comprises one or more consecutive nucleotides starting with the 3' end of the flanking nucleotides of the RT template sequence.

24. The template RNA of any one of embodiments 17-23, wherein the heterologous object sequence has a sequence comprising the core nucleotides of the RT template sequence of SEQ ID NO: 21671, and optionally comprises one or more consecutive nucleotides starting with the 3' end of the flanking nucleotides of the RT template sequence.

25. The template RNA of any one of embodiments 17-24, wherein the template RNA has a template RNA sequence of Table A or Table B, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto, wherein optionally the template RNA comprises one or more (e.g., all) of the chemical modifications shown in the sequence of Table A or Table B.

26. A gene modification system for modifying DNA, comprising:

(a) a first RNA comprising, from 5' to 3': (i) a guide RNA sequence that is complementary to a first portion of the human HBB gene, wherein the guide RNA sequence has a sequence comprising the core nucleotides of a spacer sequence of Table 1, or a sequence having 1, 2, or 3 substitutions relative thereto, and optionally comprises one or more consecutive nucleotides starting with the 3' end of the flanking nucleotides of the guide RNA sequence, or wherein the guide RNA sequence has a sequence comprising a spacer from Table A, Table AA, Table B, Table B1, Tables 5A-5D, Table X4, or Table X4A; and (ii) a sequence (e.g., a scaffold region) that binds a gene modifying polypeptide (e.g., binds the Cas domain of the gene modifying polypeptide), and

(b) a second RNA comprising: (iii) a heterologous object sequence comprising a nucleotide substitution to introduce a mutation into a second portion of the human HBB gene (wherein optionally the heterologous object sequence comprises, from 5' to 3', a post-edit homology region, a mutation region, and a pre-edit homology region); (iv) a primer region comprising at least 5, 6, 7, or 8 bases having 100% identity to a third portion of the human HBB gene; and (v) an RRS (RNA-binding protein recognition sequence) that binds the gene modifying protein.

27. The gene modification system of embodiment 26, wherein the heterologous object sequence comprises the core nucleotides of an RT template sequence of Table 3, or a sequence having 1, 2, or 3 substitutions relative thereto, and optionally comprises one or more consecutive nucleotides starting with the 3' end of the flanking nucleotides of the RT template sequence, or wherein the heterologous object sequence comprises the sequence of an RT template sequence of Table A, Table AA, Table B, Table B1, Tables 5A-5D, Table X4, or Table X4A, or a sequence having 1, 2, or 3 substitutions relative thereto.

28. The gene modification system of embodiment 26, wherein the heterologous object sequence comprises the core nucleotides of an RT template sequence of Table 3 that corresponds to the gRNA spacer sequence, or a sequence having 1, 2, or 3 substitutions relative thereto, and optionally comprises one or more consecutive nucleotides starting with the 3' end of the flanking nucleotides of the RT template sequence, or wherein the heterologous object sequence comprises the sequence of an RT template sequence of Table A, Table AA, Table B, Table B1, Tables 5A-5D, Table X4, or Table X4A that corresponds to the gRNA spacer sequence, or a sequence having 1, 2, or 3 substitutions relative thereto.

29.如实施例26-28中任一项所述的基因修饰系统，其中该PBS序列具有包含来自表3中与该RT模板序列同一行的PBS序列的核心核苷酸的序列，或相对于其具有1、2或3个取代的序列，并且任选地包含从该PBS序列的侧翼核苷酸的5’端开始的一个或多个连续核苷酸。29. A gene modification system as described in any of embodiments 26-28, wherein the PBS sequence has a sequence comprising core nucleotides from a PBS sequence in Table 3 in the same row as the RT template sequence, or a sequence having 1, 2 or 3 substitutions relative thereto, and optionally comprises one or more consecutive nucleotides starting from the 5' end of the flanking nucleotides of the PBS sequence.

30.如实施例26-28中任一项所述的基因修饰系统，其中该PBS序列具有包含对应于该RT模板序列、或相对于其具有1、2或3个取代的序列、该gRNA间隔子序列或两者的表3中PBS序列的核心核苷酸的序列，并且任选地包含从该PBS序列的侧翼核苷酸的5’端开始的一个或多个连续核苷酸，或其中该PBS序列包含对应于该RT模板序列或相对于其具有1、2或3个取代的序列、该gRNA间隔子序列或两者的表A、表AA、表B、表B1、表5A-5D、表X4或表X4A中的PBS序列。30. The gene modification system of any one of embodiments 26-28, wherein the PBS sequence has a sequence comprising core nucleotides of a PBS sequence in Table 3 corresponding to the RT template sequence (or a sequence having 1, 2 or 3 substitutions relative thereto), the gRNA spacer sequence, or both, and optionally comprises one or more consecutive nucleotides starting from the 5' end of the flanking nucleotides of the PBS sequence, or wherein the PBS sequence comprises a PBS sequence in Table A, Table AA, Table B, Table B1, Table 5A-5D, Table X4 or Table X4A corresponding to the RT template sequence (or a sequence having 1, 2 or 3 substitutions relative thereto), the gRNA spacer sequence, or both.

31.如实施例26-30中任一项所述的基因修饰系统，其中该gRNA支架包含表12中gRNA支架的序列，或与其具有至少70％、75％、80％、85％、90％、95％、96％、97％、98％或99％同一性的序列。31. The gene modification system of any one of embodiments 26-30, wherein the gRNA scaffold comprises the sequence of a gRNA scaffold in Table 12, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98% or 99% identity thereto.

32.如实施例26-30中任一项所述的基因修饰系统，其中该gRNA支架包含对应于该RT模板序列、该gRNA间隔子序列、或两者的表12中gRNA支架的序列，或与其具有至少70％、75％、80％、85％、90％、95％、96％、97％、98％或99％同一性的序列。32. The gene modification system of any one of embodiments 26-30, wherein the gRNA scaffold comprises the sequence of a gRNA scaffold in Table 12 corresponding to the RT template sequence, the gRNA spacer sequence, or both, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98% or 99% identity thereto.

33.一种用于修饰DNA的基因修饰系统，其包含：33. A gene modification system for modifying DNA, comprising:

(a)第一RNA，其从5’至3’包含：(i)与人类HBB基因的第一部分互补的指导RNA序列，和(ii)结合基因修饰多肽(例如，结合该基因修饰多肽的Cas结构域)的序列(例如，支架区)，以及(a) a first RNA comprising, from 5' to 3', (i) a guide RNA sequence complementary to a first portion of a human HBB gene, and (ii) a sequence (e.g., a scaffold region) that binds a gene modifying polypeptide (e.g., that binds a Cas domain of the gene modifying polypeptide), and

(b)第二RNA，其包含：(iii)包含核苷酸取代的异源对象序列，用于将突变引入该人类HBB基因的第二部分，其中该异源对象序列包含以下的核心核苷酸：表3中的RT模板序列，或相对于其具有1、2或3个取代的序列，并且任选地包含从该RT模板序列的侧翼核苷酸的3’端开始的一个或多个连续核苷酸，或者其中该异源对象序列包含表A、表AA、表B、表B1、表5A-5D、表X4或表X4A中的RT序列，或相对于其具有1、2或3个取代的序列，以及(iv)引物区，其包含与该人类HBB基因的第三部分具有100％同源性的至少5、6、7或8个碱基，以及(v)结合基因修饰蛋白的RRS(RNA结合蛋白识别序列)。(b) a second RNA comprising: (iii) a heterologous subject sequence comprising nucleotide substitutions for introducing mutations into the second portion of the human HBB gene, wherein the heterologous subject sequence comprises the following core nucleotides: the RT template sequence in Table 3, or a sequence having 1, 2 or 3 substitutions relative thereto, and optionally comprises one or more consecutive nucleotides starting from the 3' end of the flanking nucleotides of the RT template sequence, or wherein the heterologous subject sequence comprises the RT sequence in Table A, Table AA, Table B, Table B1, Table 5A-5D, Table X4 or Table X4A, or a sequence having 1, 2 or 3 substitutions relative thereto, and (iv) a primer region comprising at least 5, 6, 7 or 8 bases having 100% homology to the third portion of the human HBB gene, and (v) an RRS (RNA binding protein recognition sequence) that binds to a gene modifying protein.

34.如实施例33所述的基因修饰系统，其中该gRNA间隔子包含以下的核心核苷酸：表1中的gRNA间隔子序列，或相对于其具有1、2或3个取代的序列，并且任选地包含从该gRNA间隔子序列的侧翼核苷酸的3’端开始的一个或多个连续核苷酸，或者其中该gRNA间隔子包含表A、表AA、表B、表B1、表5A-5D、表X4或表X4A中的gRNA间隔子序列。34. The gene modification system of embodiment 33, wherein the gRNA spacer comprises the following core nucleotides: a gRNA spacer sequence in Table 1, or a sequence having 1, 2 or 3 substitutions relative thereto, and optionally comprises one or more consecutive nucleotides starting from the 3' end of the flanking nucleotides of the gRNA spacer sequence, or wherein the gRNA spacer comprises a gRNA spacer sequence in Table A, Table AA, Table B, Table B1, Table 5A-5D, Table X4 or Table X4A.

35.如实施例33所述的基因修饰系统，其中该异源对象序列包含对应于该RT模板序列或相对于其具有1、2或3个取代的序列的表1中gRNA间隔子序列的核心核苷酸，并且任选地包含从该gRNA间隔子序列的侧翼核苷酸的3’端开始的一个或多个连续核苷酸，或者其中该gRNA间隔子包含对应于该RT模板序列或相对于其具有1、2或3个取代的序列的表A、表AA、表B、表B1、表5A-5D、表X4或表X4A中的gRNA间隔子序列。35. The gene modification system of embodiment 33, wherein the heterologous subject sequence comprises core nucleotides of a gRNA spacer sequence in Table 1 corresponding to the RT template sequence or a sequence having 1, 2, or 3 substitutions thereto, and optionally comprises one or more consecutive nucleotides starting from the 3' end of the flanking nucleotides of the gRNA spacer sequence, or wherein the gRNA spacer comprises a gRNA spacer sequence in Table A, Table AA, Table B, Table B1, Table 5A-5D, Table X4, or Table X4A corresponding to the RT template sequence or a sequence having 1, 2, or 3 substitutions thereto.

36.如实施例33-35中任一项所述的基因修饰系统，其中该PBS序列具有包含来自表3中与该RT模板序列同一行的PBS序列的核心核苷酸的序列，或相对于其具有1、2或3个取代的序列，并且任选地包含从该PBS序列的侧翼核苷酸的5’端开始的一个或多个连续核苷酸。36. A gene modification system as described in any of embodiments 33-35, wherein the PBS sequence has a sequence comprising core nucleotides from a PBS sequence in Table 3 in the same row as the RT template sequence, or a sequence having 1, 2 or 3 substitutions relative thereto, and optionally comprises one or more consecutive nucleotides starting from the 5' end of the flanking nucleotides of the PBS sequence.

37.如实施例33-35中任一项所述的基因修饰系统，其中该PBS序列具有包含对应于该RT模板序列、该gRNA间隔子序列或两者的表3中PBS序列的核心核苷酸的序列，或相对于其具有1、2或3个取代的序列，并且任选地包含从该PBS序列的侧翼核苷酸的5’端开始的一个或多个连续核苷酸，或其中该PBS序列包含对应于该RT模板序列、该gRNA间隔子序列或两者的表A、表AA、表B、表B1、表5A-5D、表X4或表X4A中的PBS序列，或相对于其具有1、2或3个取代的序列。37. A gene modification system as described in any of embodiments 33-35, wherein the PBS sequence has a sequence comprising core nucleotides of a PBS sequence in Table 3 corresponding to the RT template sequence, the gRNA spacer sequence, or both, or a sequence having 1, 2, or 3 substitutions relative thereto, and optionally comprises one or more consecutive nucleotides starting from the 5' end of the flanking nucleotides of the PBS sequence, or wherein the PBS sequence comprises a PBS sequence in Table A, Table AA, Table B, Table B1, Table 5A-5D, Table X4, or Table X4A corresponding to the RT template sequence, the gRNA spacer sequence, or both, or a sequence having 1, 2, or 3 substitutions relative thereto.

38.如实施例33-37中任一项所述的基因修饰系统，其中该gRNA支架包含表12中gRNA支架的序列，或与其具有至少70％、75％、80％、85％、90％、95％、96％、97％、98％或99％同一性的序列。38. The gene modification system of any one of embodiments 33-37, wherein the gRNA scaffold comprises the sequence of a gRNA scaffold in Table 12, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98% or 99% identity thereto.

39.如实施例33-37中任一项所述的基因修饰系统，其中该gRNA支架包含对应于该RT模板序列、该gRNA间隔子序列、或两者的表12中gRNA支架的序列，或与其具有至少70％、75％、80％、85％、90％、95％、96％、97％、98％或99％同一性的序列。39. The gene modification system of any one of embodiments 33-37, wherein the gRNA scaffold comprises the sequence of a gRNA scaffold in Table 12 corresponding to the RT template sequence, the gRNA spacer sequence, or both, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98% or 99% identity thereto.

40.一种gRNA，其包含(i)与人类HBB基因的第一部分互补的gRNA间隔子序列，其中该gRNA间隔子具有包含表1、表2或表4的gRNA间隔子序列的核心核苷酸的序列，或相对于其具有1、2或3个取代的序列，并且任选地包含从该gRNA间隔子序列的侧翼核苷酸的3’端开始的一个或多个连续核苷酸；以及(ii)gRNA支架，或其中该gRNA间隔子具有来自表A、表AA、表B、表B1、表5A-5D、表X4或表X4A的gRNA间隔子序列的序列，或相对于其具有1、2或3个取代的序列。40. A gRNA comprising (i) a gRNA spacer sequence complementary to a first portion of a human HBB gene, wherein the gRNA spacer has a sequence comprising core nucleotides of a gRNA spacer sequence of Table 1, Table 2 or Table 4, or a sequence having 1, 2 or 3 substitutions thereto, and optionally comprises one or more consecutive nucleotides starting from the 3' end of the flanking nucleotides of the gRNA spacer sequence; and (ii) a gRNA scaffold, or wherein the gRNA spacer has a sequence of a gRNA spacer sequence from Table A, Table AA, Table B, Table B1, Table 5A-5D, Table X4 or Table X4A, or a sequence having 1, 2 or 3 substitutions thereto.

41.如实施例40所述的gRNA，其中该gRNA支架包含表12中gRNA支架的序列，或与其具有至少70％、75％、80％、85％、90％、95％、96％、97％、98％或99％同一性的序列。41. The gRNA of embodiment 40, wherein the gRNA scaffold comprises the sequence of the gRNA scaffold in Table 12, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98% or 99% identity thereto.

42.如实施例40所述的gRNA，其中该gRNA支架包含对应于该gRNA间隔子序列的表12中gRNA支架的序列，或与其具有至少70％、75％、80％、85％、90％、95％、96％、97％、98％或99％同一性的序列。42. The gRNA of embodiment 40, wherein the gRNA scaffold comprises a sequence of a gRNA scaffold in Table 12 corresponding to the gRNA spacer sequence, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98% or 99% identity thereto.
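Several of the embodiments above recite a sequence having at least a stated percent identity (70% to 99%) to a reference sequence such as a gRNA scaffold. The following is a minimal sketch of such a threshold check, assuming two pre-aligned sequences of equal length with no gaps; real identity comparisons are normally made on an alignment, which this sketch does not perform. The function names and the example sequences are hypothetical and are not taken from Table 12.

```python
def percent_identity(candidate: str, reference: str) -> float:
    """Ungapped percent identity between two equal-length sequences."""
    if not reference or len(candidate) != len(reference):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(c == r for c, r in zip(candidate.upper(), reference.upper()))
    return 100.0 * matches / len(reference)


def meets_identity_threshold(candidate: str, reference: str, threshold: float) -> bool:
    """True if the candidate is at least `threshold` percent identical."""
    return percent_identity(candidate, reference) >= threshold


# Hypothetical 20-nt sequences for illustration only (not from the tables).
reference = "GUUUUAGAGCUAGAAAUAGC"
candidate = "GUUUUAGAGCUAGAAAUAGG"  # differs at the last position

identity = percent_identity(candidate, reference)
print(identity, meets_identity_threshold(candidate, reference, 90.0))
```

With one mismatch in 20 positions, the candidate is 95% identical and would satisfy the 70% through 95% thresholds but not 96% or higher.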

43.一种模板RNA，其包含：(iii)包含突变区的异源对象序列，用于将突变引入人类HBB基因的第二部分，其中该异源对象序列包含以下的核心核苷酸：表3中的RT模板序列，或相对于其具有1、2或3个取代的序列，并且任选地包含从该RT模板序列的侧翼核苷酸的3’端开始的一个或多个连续核苷酸，或者其中该异源对象序列包含表A、表AA、表B、表B1、表5A-5D、表X4或表X4A中的RT序列或相对于其具有1、2或3个取代的序列，以及(iv)PBS序列，其包含与该人类HBB基因的第三部分具有100％同源性的至少5、6、7或8个碱基。43. A template RNA comprising: (iii) a heterologous subject sequence comprising a mutation region for introducing a mutation into a second portion of a human HBB gene, wherein the heterologous subject sequence comprises the following core nucleotides: an RT template sequence in Table 3, or a sequence having 1, 2 or 3 substitutions thereto, and optionally comprises one or more consecutive nucleotides starting from the 3' end of the flanking nucleotides of the RT template sequence, or wherein the heterologous subject sequence comprises an RT sequence in Table A, Table AA, Table B, Table B1, Table 5A-5D, Table X4 or Table X4A, or a sequence having 1, 2 or 3 substitutions thereto, and (iv) a PBS sequence comprising at least 5, 6, 7 or 8 bases having 100% homology to a third portion of the human HBB gene.

44.如实施例43所述的模板RNA，其中该PBS序列具有包含来自表3中与该RT模板序列同一行的PBS序列的核心核苷酸的序列，或相对于其具有1、2或3个取代的序列，并且任选地包含从该PBS序列的侧翼核苷酸的5’端开始的一个或多个连续核苷酸。44. The template RNA of embodiment 43, wherein the PBS sequence has a sequence comprising core nucleotides from a PBS sequence in the same row as the RT template sequence in Table 3, or a sequence having 1, 2 or 3 substitutions relative thereto, and optionally comprises one or more consecutive nucleotides starting from the 5' end of the flanking nucleotides of the PBS sequence.

45.如实施例43所述的模板RNA，其中该PBS序列具有包含对应于该RT模板序列、或相对于其具有1、2或3个取代的序列的表3中PBS序列的核心核苷酸的序列，并且任选地包含从该PBS序列的侧翼核苷酸的5’端开始的一个或多个连续核苷酸，或其中该PBS序列具有包含来自表A、表AA、表B、表B1、表5A-5D、表X4或表X4A中的PBS序列的序列或相对于其具有1、2或3个取代的序列。45. The template RNA of embodiment 43, wherein the PBS sequence has a sequence comprising core nucleotides of a PBS sequence in Table 3 corresponding to the RT template sequence, or a sequence having 1, 2 or 3 substitutions relative thereto, and optionally comprises one or more consecutive nucleotides starting from the 5' end of the flanking nucleotides of the PBS sequence, or wherein the PBS sequence has a sequence comprising a PBS sequence from Table A, Table AA, Table B, Table B1, Table 5A-5D, Table X4 or Table X4A, or a sequence having 1, 2 or 3 substitutions relative thereto.
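The recurring phrase "a sequence having 1, 2 or 3 substitutions relative thereto" can be read as a bound on the number of mismatched positions between a candidate and a listed sequence of the same length. A minimal sketch of that reading follows; the helper names and the example sequences are hypothetical, and insertions or deletions are deliberately not handled because the text speaks only of substitutions.

```python
def count_substitutions(candidate: str, reference: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(candidate) != len(reference):
        raise ValueError("substitution counting requires equal-length sequences")
    return sum(c != r for c, r in zip(candidate.upper(), reference.upper()))


def is_listed_or_variant(candidate: str, reference: str, max_subs: int = 3) -> bool:
    """True if the candidate equals the reference or differs by at most max_subs substitutions."""
    return count_substitutions(candidate, reference) <= max_subs


# Hypothetical PBS-like sequences for illustration only (not from Table 3).
listed = "ACUCCUGAGGAG"
variant_2 = "ACUCCUGUGGAC"   # two substitutions
variant_4 = "UGUCCUGUGGAC"   # four substitutions

print(count_substitutions(variant_2, listed), is_listed_or_variant(variant_2, listed))
print(count_substitutions(variant_4, listed), is_listed_or_variant(variant_4, listed))
```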

46.如实施例1-16或43-45中任一项所述的模板RNA、如实施例26-39中任一项所述的基因修饰系统、或如实施例31-33中任一项所述的gRNA，其中由该系统引入的突变是该HBB基因的V6E突变(例如，以纠正致病性E6V突变)。46. The template RNA of any one of embodiments 1-16 or 43-45, the gene modification system of any one of embodiments 26-39, or the gRNA of any one of embodiments 31-33, wherein the mutation introduced by the system is a V6E mutation of the HBB gene (e.g., to correct a pathogenic E6V mutation).

47.如实施例1-16或43-46中任一项所述的模板RNA，或如实施例36-39或46中任一项所述的基因修饰系统，其中该编辑前序列长度包含约1个核苷酸至约35个核苷酸(例如，包含约1-5、5-10、10-15、15-20、20-25、25-30或30-35个核苷酸)。47. The template RNA of any one of embodiments 1-16 or 43-46, or the gene modification system of any one of embodiments 36-39 or 46, wherein the pre-editing sequence length comprises about 1 nucleotide to about 35 nucleotides (e.g., comprises about 1-5, 5-10, 10-15, 15-20, 20-25, 25-30 or 30-35 nucleotides).

48.如实施例1-16或43-47中任一项所述的模板RNA，或如实施例36-39、46或47中任一项所述的基因修饰系统，其中该突变区包含单核苷酸。48. The template RNA of any one of embodiments 1-16 or 43-47, or the gene modification system of any one of embodiments 36-39, 46 or 47, wherein the mutation region comprises a single nucleotide.

49.如实施例1-16或43-47中任一项所述的模板RNA，或如实施例26-39、46或47中任一项所述的基因修饰系统，其中该突变区长度为至少两个核苷酸。49. The template RNA of any one of embodiments 1-16 or 43-47, or the gene modification system of any one of embodiments 26-39, 46 or 47, wherein the mutation region is at least two nucleotides in length.

50.如实施例1-14、41-45或47中任一项所述的模板RNA，或如实施例24-37、44-45或47中任一项所述的基因修饰系统，其中该突变区长度为高达32个(例如，高达5、10、15、20、25、30或32个)核苷酸，并且相对于该人类HBB基因的第二部分包含一个、两个或三个序列差异。50. The template RNA of any one of embodiments 1-14, 41-45 or 47, or the gene modification system of any one of embodiments 24-37, 44-45 or 47, wherein the mutation region is up to 32 (e.g., up to 5, 10, 15, 20, 25, 30 or 32) nucleotides in length and comprises one, two or three sequence differences relative to the second portion of the human HBB gene.

51.如实施例1-16、43-47、49或50中任一项所述的模板RNA，或如实施例26-39、46、47、49或50中任一项所述的基因修饰系统，其中该突变区相对于该人类HBB基因的第二部分包含两个序列差异。51. The template RNA of any one of embodiments 1-16, 43-47, 49 or 50, or the gene modification system of any one of embodiments 26-39, 46, 47, 49 or 50, wherein the mutation region comprises two sequence differences relative to the second portion of the human HBB gene.

52.如实施例1-16、43-47或49-51中任一项所述的模板RNA，或如实施例26-39、46、47或49-51中任一项所述的基因修饰系统，其中该突变区包含设计用于纠正该HBB基因中的致病性突变的第一区(例如，第一核苷酸)和设计用于灭活PAM序列(例如，表A、AA、B或B1中所例示的“PAM-杀灭”突变)的第二区(例如，第二核苷酸)。52. A template RNA as described in any one of embodiments 1-16, 43-47 or 49-51, or a gene modification system as described in any one of embodiments 26-39, 46, 47 or 49-51, wherein the mutation region comprises a first region (e.g., a first nucleotide) designed to correct a pathogenic mutation in the HBB gene and a second region (e.g., a second nucleotide) designed to inactivate a PAM sequence (e.g., a "PAM-killing" mutation exemplified in Table A, AA, B or B1).

53.如实施例1-16、43-51中任一项所述的模板RNA，或如实施例26-39或46-51中任一项所述的基因修饰系统，其中该突变区与该人类HBB基因的对应部分包含小于80％、70％、60％、50％、40％或30％同一性。53. The template RNA of any one of embodiments 1-16, 43-51, or the gene modification system of any one of embodiments 26-39 or 46-51, wherein the mutation region comprises less than 80%, 70%, 60%, 50%, 40% or 30% identity with the corresponding portion of the human HBB gene.

54.如前述实施例中任一项所述的模板RNA，其中该模板RNA包含一种或多种沉默突变(例如，沉默取代)，例如，如表7A、X4或X4A中所例示的。54. The template RNA of any one of the preceding embodiments, wherein the template RNA comprises one or more silent mutations (e.g., silent substitutions), e.g., as exemplified in Table 7A, X4 or X4A.

55.如实施例54所述的模板RNA，其中该一种或多种沉默突变包含在编码该HBB基因的计入初始甲硫氨酸第6个氨基酸(脯氨酸)的密码子处的沉默取代，例如取代为CCC或CCG。55. The template RNA of embodiment 54, wherein the one or more silent mutations comprise a silent substitution at the codon encoding the 6th amino acid (proline) of the HBB gene counting the initial methionine, for example, a substitution to CCC or CCG.
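The numbering in embodiments 46 and 55 can be illustrated with a short sketch. It assumes the commonly reported first nine codons of the human HBB coding sequence (an assumption not stated in this section) and uses a codon table restricted to the codons that appear. Counting the initial methionine, proline is residue 6 and the sickle residue is residue 7; the conventional "E6V" name omits the methionine. The helper names are hypothetical.

```python
# Partial codon table: only the codons used in this illustration.
CODONS = {
    "ATG": "M", "GTG": "V", "CAT": "H", "CTG": "L", "ACT": "T",
    "CCT": "P", "CCC": "P", "CCG": "P", "GAG": "E", "AAG": "K",
}

# Assumed 5' end of the HBB coding sequence (first nine codons), wild type and sickle.
WILD_TYPE = "ATGGTGCATCTGACTCCTGAGGAGAAG"
SICKLE = "ATGGTGCATCTGACTCCTGTGGAGAAG"


def translate(dna: str) -> str:
    """Translate a coding sequence whose length is a multiple of three."""
    return "".join(CODONS[dna[i:i + 3]] for i in range(0, len(dna), 3))


def set_codon(dna: str, position: int, codon: str) -> str:
    """Replace the codon at a 1-based position that counts the initial methionine."""
    start = 3 * (position - 1)
    return dna[:start] + codon + dna[start + 3:]


corrected = set_codon(SICKLE, 7, "GAG")            # V -> E correction
corrected_silent = set_codon(corrected, 6, "CCC")  # silent CCT -> CCC at proline

print(translate(SICKLE))            # protein encoded by the sickle allele
print(translate(corrected_silent))  # protein after correction plus silent substitution
```

In this sketch the corrected sequence with the silent proline substitution encodes the same peptide as the wild-type sequence while differing from it at the DNA level.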

56.如前述实施例中任一项所述的模板RNA，其中该突变区包含设计用于纠正该HBB基因中的致病性突变的第一区和设计用于引入沉默取代的第二区。56. The template RNA of any one of the preceding embodiments, wherein the mutation region comprises a first region designed to correct a pathogenic mutation in the HBB gene and a second region designed to introduce a silent substitution.

57.如前述实施例中任一项所述的模板RNA，其包含一个或多个化学修饰的核苷酸。57. The template RNA of any one of the preceding embodiments, comprising one or more chemically modified nucleotides.

58.一种基因修饰系统，其包含：58. A gene modification system comprising:

如实施例1-16、43-57中任一项所述的模板RNA，或如实施例26-39或46-57中任一项所述的系统，以及A template RNA as described in any one of embodiments 1-16, 43-57, or a system as described in any one of embodiments 26-39 or 46-57, and

基因修饰多肽或编码该基因修饰多肽的核酸(例如RNA)。a gene modifying polypeptide or a nucleic acid (e.g., RNA) encoding the gene modifying polypeptide.

59.如实施例58所述的基因修饰系统，其中该基因修饰多肽包含：59. The gene modification system of embodiment 58, wherein the gene modification polypeptide comprises:

逆转录酶(RT)结构域(例如，来自逆转录病毒的RT结构域，或与其具有至少75％、80％、85％、90％、95％、96％、97％、98％或99％氨基酸序列同一性的多肽结构域)；以及a reverse transcriptase (RT) domain (e.g., an RT domain from a retrovirus, or a polypeptide domain having at least 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% amino acid sequence identity thereto); and

与靶DNA分子结合且与该RT结构域异源的Cas结构域(例如，Cas9结构域)；以及a Cas domain (e.g., a Cas9 domain) that binds to the target DNA molecule and is heterologous to the RT domain; and

任选地，布置在该RT结构域和该Cas结构域之间的接头。optionally, a linker disposed between the RT domain and the Cas domain.

60.如实施例59所述的基因修饰系统，其中：60. The gene modification system of embodiment 59, wherein:

(a)该RT结构域包含：(a) The RT domain comprises:

(i)表6的RT结构域，或(i) the RT domain of Table 6, or

(ii)来自以下的RT结构域：鼠白血病病毒(MMLV)、猪内源逆转录病毒(PERV)、禽网状内皮组织增生病病毒(AVIRE)、猫白血病病毒(FLV)、猿泡沫病毒(SFV)(例如SFV3L)、牛白血病病毒(BLV)、梅森-辉瑞猴病毒(MPMV)、人泡沫病毒(HFV)、或牛泡沫/合胞病毒(BFV/BSV)；或(ii) an RT domain from murine leukemia virus (MMLV), porcine endogenous retrovirus (PERV), avian reticuloendotheliosis virus (AVIRE), feline leukemia virus (FLV), simian foamy virus (SFV) (e.g., SFV3L), bovine leukemia virus (BLV), Mason-Pfizer monkey virus (MPMV), human foamy virus (HFV), or bovine foamy/syncytial virus (BFV/BSV); or

(b)该基因修饰多肽包含根据表C的氨基酸序列，或与其具有至少70％、75％、80％、85％、90％、95％、96％、97％、98％或99％同一性的序列。(b) the gene modifying polypeptide comprises an amino acid sequence according to Table C, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98% or 99% identity thereto.

61.如实施例59或60所述的基因修饰系统，其中该Cas结构域包括表7或表8的Cas结构域。61. The gene modification system of Embodiment 59 or 60, wherein the Cas domain comprises the Cas domain of Table 7 or Table 8.

62.如实施例59-61中任一项所述的基因修饰系统，其中该Cas结构域：62. The gene modification system of any one of embodiments 59-61, wherein the Cas domain:

(a)是Cas9结构域；(a) is the Cas9 domain;

(b)是SpCas9结构域、BlatCas9结构域、Nme2Cas9结构域、PnpCas9结构域、SauCas9结构域、SauCas9-KKH结构域、SauriCas9结构域、SauriCas9-KKH结构域、ScaCas9-Sc++结构域、SpyCas9结构域、SpyCas9-NG结构域、SpyCas9-SpRY结构域或St1Cas9结构域；和/或(b) is a SpCas9 domain, a BlatCas9 domain, a Nme2Cas9 domain, a PnpCas9 domain, a SauCas9 domain, a SauCas9-KKH domain, a SauriCas9 domain, a SauriCas9-KKH domain, a ScaCas9-Sc++ domain, a SpyCas9 domain, a SpyCas9-NG domain, a SpyCas9-SpRY domain or a St1Cas9 domain; and/or

(c)是包含N670A突变、N611A突变、N605A突变、N580A突变、N588A突变、N872A突变、N863A突变、N622A突变或H840A突变的Cas9结构域。(c) is a Cas9 domain comprising an N670A mutation, an N611A mutation, an N605A mutation, an N580A mutation, an N588A mutation, an N872A mutation, an N863A mutation, an N622A mutation or an H840A mutation.

63.如实施例62所述的基因修饰系统，其中该Cas9结构域结合表7或表12中列出的PAM序列。63. The gene modification system of embodiment 62, wherein the Cas9 domain binds to a PAM sequence listed in Table 7 or Table 12.

64.如实施例63所述的基因修饰系统，其中该人类HBB基因的第二部分与该Cas结构域识别的PAM重叠，例如，其中该人类HBB基因的第二部分在该PAM内或其中该PAM在该人类HBB基因的第二部分内。64. The gene modification system of embodiment 63, wherein the second portion of the human HBB gene overlaps with the PAM recognized by the Cas domain, for example, wherein the second portion of the human HBB gene is within the PAM or wherein the PAM is within the second portion of the human HBB gene.

65.如实施例58-64中任一项所述的基因修饰系统，其中该gRNA间隔子为根据表1的gRNA间隔子，并且该Cas结构域包含表1同一行中列出的Cas结构域，或与其具有至少70％、75％、80％、85％、90％、95％、96％、97％、98％或99％同一性的序列。65. The gene modification system of any one of embodiments 58-64, wherein the gRNA spacer is a gRNA spacer according to Table 1, and the Cas domain comprises a Cas domain listed in the same row of Table 1, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98% or 99% identity thereto.

66.如实施例58-64中任一项所述的基因修饰系统，其中该模板RNA包含表3、表A、表AA、表B、表B1、表5A-5D、表X4或表X4A中模板RNA序列的序列，或与其具有至少70％、75％、80％、85％、90％、95％、96％、97％、98％或99％同一性的序列。66. The gene modification system of any one of embodiments 58-64, wherein the template RNA comprises the sequence of a template RNA in Table 3, Table A, Table AA, Table B, Table B1, Table 5A-5D, Table X4, or Table X4A, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98% or 99% identity thereto.

67.如实施例58-66中任一项所述的基因修饰系统，其中：67. The gene modification system of any one of embodiments 58-66, wherein:

(a)该模板RNA包含表3、表A、表AA、表B、表B1、表5A-5D、表X4或表X4A中模板RNA序列的序列；(a) the template RNA comprises the sequence of a template RNA in Table 3, Table A, Table AA, Table B, Table B1, Table 5A-5D, Table X4, or Table X4A;

(b)该Cas结构域包括表7或表8的Cas结构域；(b) the Cas domain comprises the Cas domain of Table 7 or Table 8;

(c)该接头包含表10的接头序列(例如SEQ ID NO：5217、5106、5190和5218中任一个的接头序列)；并且(c) the linker comprises a linker sequence of Table 10 (e.g., a linker sequence of any one of SEQ ID NOs: 5217, 5106, 5190, and 5218); and

(d)该基因修饰多肽包含来自表11的一个或两个NLS序列(例如SEQ ID NO：5245、5290、5323、5330、5349、5350、5351和4001中任一个的NLS序列)。(d) the gene modifying polypeptide comprises one or two NLS sequences from Table 11 (e.g., an NLS sequence of any one of SEQ ID NOs: 5245, 5290, 5323, 5330, 5349, 5350, 5351 and 4001).

68.如实施例58-67中任一项所述的基因修饰系统，其在该人类HBB基因的第一链中产生第一切口。68. The gene modification system of any one of embodiments 58-67, which produces a first nick in the first strand of the human HBB gene.

69.如实施例68所述的基因修饰系统，其进一步包含第二链靶向性gRNA，该第二链靶向性gRNA将第二切口引导至该人类HBB基因的第二链。69. The gene modification system of embodiment 68, further comprising a second-strand targeting gRNA that directs a second nick to the second strand of the human HBB gene.

70.如实施例69所述的基因修饰系统，其中该第二链靶向性gRNA包含：70. The gene modification system of embodiment 69, wherein the second strand targeting gRNA comprises:

(i)包含来自表2的左gRNA间隔子序列或右gRNA间隔子序列的核心核苷酸的序列，并且任选地包含从该左gRNA间隔子序列或右gRNA间隔子序列的侧翼核苷酸的3’端开始的一个或多个连续核苷酸；或(i) a sequence comprising the core nucleotides of a left gRNA spacer sequence or a right gRNA spacer sequence from Table 2, and optionally comprising one or more consecutive nucleotides starting from the 3' end of the flanking nucleotides of the left gRNA spacer sequence or the right gRNA spacer sequence; or

(ii)包含表6A的间隔子序列或相对于其具有1、2或3个取代的间隔子序列的第二链靶向性gRNA。(ii) a second-strand targeting gRNA comprising a spacer sequence of Table 6A or a spacer sequence having 1, 2 or 3 substitutions therewith.

71.如实施例69所述的基因修饰系统，其中该第二链靶向性gRNA包含含有对应于(i)的gRNA间隔子序列的来自表2的左gRNA间隔子序列或右gRNA间隔子序列的核心核苷酸的序列，并且任选地包含从该左gRNA间隔子序列或右gRNA间隔子序列的侧翼核苷酸的3’端开始的一个或多个连续核苷酸。71. The gene modification system of embodiment 69, wherein the second-strand targeting gRNA comprises a sequence comprising the core nucleotides of a left gRNA spacer sequence or a right gRNA spacer sequence from Table 2 that corresponds to the gRNA spacer sequence of (i), and optionally comprises one or more consecutive nucleotides starting from the 3' end of the flanking nucleotides of the left gRNA spacer sequence or the right gRNA spacer sequence.

72.如实施例69所述的基因修饰系统，其中该第二链靶向性gRNA包含：72. The gene modification system of embodiment 69, wherein the second strand targeting gRNA comprises:

(i)含有来自表4的第二切口gRNA序列的核心核苷酸的序列，或与其具有至少70％、75％、80％、85％、90％、95％、96％、97％、98％或99％同一性的序列，并且任选地包含从该第二切口gRNA序列的侧翼核苷酸的3’端开始的一个或多个连续核苷酸；或(i) a sequence containing the core nucleotides of the second nicking gRNA sequence from Table 4, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98% or 99% identity thereto, and optionally comprising one or more consecutive nucleotides starting from the 3' end of the flanking nucleotides of the second nicking gRNA sequence; or

(ii)包含来自表6A的间隔子序列或与其具有至少70％、75％、80％、85％、90％、95％、96％、97％、98％或99％同一性的序列的第二链靶向性gRNA。(ii) a second strand targeting gRNA comprising a spacer sequence from Table 6A, or a sequence that is at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98% or 99% identical thereto.

73.如实施例69所述的基因修饰系统，其中该第二链靶向性gRNA包含含有对应于(i)的gRNA间隔子序列的来自表4的第二切口gRNA序列的核心核苷酸的序列，或与其具有至少70％、75％、80％、85％、90％、95％、96％、97％、98％或99％同一性的序列，并且任选地包含从该第二切口gRNA序列的侧翼核苷酸的3’端开始的一个或多个连续核苷酸。73. The gene modification system of embodiment 69, wherein the second-strand targeting gRNA comprises a sequence comprising the core nucleotides of a second nicking gRNA sequence from Table 4 that corresponds to the gRNA spacer sequence of (i), or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98% or 99% identity thereto, and optionally comprises one or more consecutive nucleotides starting from the 3' end of the flanking nucleotides of the second nicking gRNA sequence.

74.如实施例58-73中任一项所述的基因修饰系统，其中该第二链靶向性gRNA与该基因修饰系统的模板RNA具有“PAM-在内取向”，例如，如表4、6A、X4或X4A中所例示的。74. The gene modification system of any one of embodiments 58-73, wherein the second-strand targeting gRNA has a "PAM-in orientation" with respect to the template RNA of the gene modification system, e.g., as exemplified in Table 4, 6A, X4 or X4A.

75.如实施例58-63中任一项所述的基因修饰系统，该第二链靶向性gRNA靶向与该模板RNA的靶突变重叠的序列。75. The gene modification system of any one of embodiments 58-63, wherein the second-strand targeting gRNA targets a sequence that overlaps the target mutation of the template RNA.

76.如实施例75所述的基因修饰系统，其中第二链靶向性gRNA包含：76. The gene modification system of embodiment 75, wherein the second strand targeting gRNA comprises:

(i)与镰状细胞突变互补的序列(例如，间隔子序列)；(i) a sequence complementary to a sickle cell mutation (e.g., a spacer sequence);

(ii)与镰状细胞基因座处的野生型序列互补的序列(例如，间隔子序列)；(ii) a sequence that is complementary to the wild-type sequence at the sickle cell locus (e.g., a spacer sequence);

(iii)与镰状细胞基因座处的Makassar序列互补的序列(例如，间隔子序列)；(iii) a sequence complementary to the Makassar sequence at the sickle cell locus (e.g., a spacer sequence);

(iv)与接近该镰状细胞基因座的SNP、例如受试者(例如，患者)的基因组DNA中包含的SNP互补的序列(例如，间隔子序列)；(iv) a sequence (e.g., a spacer sequence) that is complementary to a SNP proximal to the sickle cell locus, e.g., a SNP contained in genomic DNA of a subject (e.g., a patient);

(v)与接近该镰状细胞基因座的一个或多个沉默取代互补或包含该一个或多个沉默取代的序列(例如，间隔子序列)。(v) a sequence (e.g., a spacer sequence) that is complementary to or comprises one or more silent substitutions proximal to the sickle cell locus.

77.如前述实施例中任一项所述的模板RNA、基因修饰系统或gRNA，其中该gRNA间隔子包含该gRNA间隔子的约1、2、3或更多个侧翼核苷酸。77. The template RNA, gene modification system or gRNA of any of the preceding embodiments, wherein the gRNA spacer comprises approximately 1, 2, 3 or more flanking nucleotides of the gRNA spacer.

78.如前述实施例中任一项所述的模板RNA或基因修饰系统，其中该异源对象序列包含该RT模板序列的约2、3、4、5、10、20、30、40或更多个侧翼核苷酸。78. The template RNA or gene modification system of any of the preceding embodiments, wherein the heterologous subject sequence comprises about 2, 3, 4, 5, 10, 20, 30, 40 or more flanking nucleotides of the RT template sequence.

79.如前述实施例中任一项所述的模板RNA或基因修饰系统，其中该异源对象序列包含约8-30、9-25、10-20、11-16、或12-15个(例如，约11-16个)核苷酸。79. The template RNA or gene modification system of any of the preceding embodiments, wherein the heterologous subject sequence comprises about 8-30, 9-25, 10-20, 11-16, or 12-15 (e.g., about 11-16) nucleotides.

80.如前述实施例中任一项所述的模板RNA或基因修饰系统，其中相对于该人类HBB基因的对应部分，该突变区包含1、2或3个核苷酸位置的序列差异。80. The template RNA or gene modification system of any one of the preceding embodiments, wherein the mutation region comprises a sequence difference of 1, 2 or 3 nucleotide positions relative to the corresponding portion of the human HBB gene.

81.如前述实施例中任一项所述的模板RNA或基因修饰系统，其中相对于该人类HBB基因的对应部分，该突变区包含至少2个核苷酸位置的序列差异。81. The template RNA or gene modification system of any one of the preceding embodiments, wherein the mutation region comprises a sequence difference of at least 2 nucleotide positions relative to the corresponding portion of the human HBB gene.

82.如前述实施例中任一项所述的模板RNA或基因修饰系统，其中该编辑后同源区和/或编辑前同源区与该HBB基因包含100％同一性。82. The template RNA or gene modification system of any one of the preceding embodiments, wherein the post-editing homology region and/or the pre-editing homology region comprises 100% identity with the HBB gene.

83.如前述实施例中任一项所述的模板RNA或基因修饰系统，其中该PBS序列另外包含约1、2、3、4、5、6、7或更多个侧翼核苷酸。83. The template RNA or gene modification system of any one of the preceding embodiments, wherein the PBS sequence further comprises about 1, 2, 3, 4, 5, 6, 7 or more flanking nucleotides.

84.如前述实施例中任一项所述的模板RNA或基因修饰系统，其中该PBS序列包含约5-20、8-16、8-14、8-13、9-13、9-12或10-12个(例如，约9-12个)核苷酸。84. The template RNA or gene modification system of any of the preceding embodiments, wherein the PBS sequence comprises about 5-20, 8-16, 8-14, 8-13, 9-13, 9-12 or 10-12 (e.g., about 9-12) nucleotides.
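Embodiments 79 and 84 give length ranges for the heterologous subject sequence (about 8-30 nucleotides) and the PBS sequence (about 5-20 nucleotides). The sketch below checks two candidate sequences against the broadest of those ranges. Treating "about" as exact integer bounds is a simplifying assumption, and the names and example sequences are hypothetical.

```python
# Broadest ranges recited in embodiments 79 and 84, treated here as inclusive bounds.
SUBJECT_SEQUENCE_RANGE = (8, 30)
PBS_RANGE = (5, 20)


def length_in_range(sequence: str, bounds: tuple[int, int]) -> bool:
    """True if the sequence length lies within the inclusive bounds."""
    low, high = bounds
    return low <= len(sequence) <= high


# Hypothetical sequences for illustration only (not from the tables).
subject_sequence = "GACUCCUGAGGAG"  # 13 nt
pbs_sequence = "AAGUCUGCCGU"        # 11 nt

print(length_in_range(subject_sequence, SUBJECT_SEQUENCE_RANGE))
print(length_in_range(pbs_sequence, PBS_RANGE))
```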

85.如前述实施例中任一项所述的模板RNA或基因修饰系统，其中该PBS序列在该HBB基因切口位点的1、2、3、4、5、6、7、8、9或10个核苷酸内结合。85. The template RNA or gene modification system of any of the preceding embodiments, wherein the PBS sequence binds within 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 nucleotides of the HBB gene nicking site.

86.如前述实施例中任一项所述的基因修饰系统，其中该基因修饰多肽的结构域通过肽接头连接。86. The gene modification system of any one of the preceding embodiments, wherein the domains of the gene modifying polypeptide are connected by a peptide linker.

87.如实施例86所述的基因修饰系统，其中该接头包含表10的接头序列(例如SEQ ID NO：5217、5106、5190和5218中任一个的接头序列)。87. The gene modification system of embodiment 86, wherein the linker comprises a linker sequence of Table 10 (e.g., a linker sequence of any one of SEQ ID NOs: 5217, 5106, 5190, and 5218).

88.如前述实施例中任一项所述的基因修饰系统，其中该基因修饰多肽进一步包含一个或多个核定位序列(NLS)。88. The gene modification system of any one of the preceding embodiments, wherein the gene modifying polypeptide further comprises one or more nuclear localization sequences (NLS).

89.如实施例88所述的基因修饰系统，其中该基因修饰多肽包含第一NLS和第二NLS。89. The gene modification system of embodiment 88, wherein the gene modification polypeptide comprises a first NLS and a second NLS.

90.如实施例88或89所述的基因修饰系统，其中该NLS包含表11的NLS序列(例如SEQ ID NO：5245、5290、5323、5330、5349、5350、5351和4001中任一个的序列)。90. The gene modification system of embodiment 88 or 89, wherein the NLS comprises the NLS sequence of Table 11 (e.g., any one of SEQ ID NOs: 5245, 5290, 5323, 5330, 5349, 5350, 5351 and 4001).

91.一种模板RNA，其包含表A、表AA、表B、表B1、表5A-5D、表X4或表X4A中模板RNA的序列，或与其具有至少70％、75％、80％、85％、90％、95％、96％、97％、98％或99％同一性的序列。91. A template RNA comprising the sequence of a template RNA in Table A, Table AA, Table B, Table B1, Table 5A-5D, Table X4, or Table X4A, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto.

92.一种模板RNA，其包含表A、表AA、表B、表B1、表5A-5D、表X4或表X4A中模板RNA的序列。92. A template RNA comprising the sequence of a template RNA in Table A, Table AA, Table B, Table B1, Table 5A-5D, Table X4, or Table X4A.

93.一种基因修饰系统，其包含：93. A gene modification system comprising:

(i)模板RNA，其包含表4中模板RNA的序列，或与其具有至少70％、75％、80％、85％、90％、95％、96％、97％、98％或99％同一性的序列；以及(i) a template RNA comprising the sequence of a template RNA in Table 4, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98% or 99% identity thereto; and

(ii)来自表4中与(i)同一行的第二切口gRNA序列，或与其具有至少70％、75％、80％、85％、90％、95％、96％、97％、98％或99％同一性的序列。(ii) a second nicking gRNA sequence from the same row of Table 4 as (i), or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98% or 99% identity thereto.

94.一种基因修饰系统，其包含：94. A gene modification system comprising:

(i)模板RNA，其包含表4的模板RNA的序列；以及(i) a template RNA comprising the sequence of the template RNA of Table 4; and

(ii)来自表4中与(i)同一行的第二切口gRNA序列。(ii) a second nicking gRNA sequence from the same row of Table 4 as (i).

95.一种DNA，其编码如实施例1-16、43-53、77-85、91或92中任一项所述的模板RNA，或如实施例40-42中任一项所述的gRNA。95. A DNA encoding the template RNA of any one of embodiments 1-16, 43-53, 77-85, 91 or 92, or the gRNA of any one of embodiments 40-42.

96.一种药物组合物，其包含如实施例58-90、93或94中任一项所述的系统，或编码该系统的一种或多种核酸，以及药学上可接受的赋形剂或载剂。96. A pharmaceutical composition comprising the system of any one of embodiments 58-90, 93 or 94, or one or more nucleic acids encoding the system, and a pharmaceutically acceptable excipient or carrier.

97.如实施例96所述的药物组合物，其中该药学上可接受的赋形剂或载剂选自由以下组成的组：质粒载体、病毒载体、囊泡和脂质纳米颗粒。97. The pharmaceutical composition of embodiment 96, wherein the pharmaceutically acceptable excipient or carrier is selected from the group consisting of a plasmid vector, a viral vector, a vesicle, and a lipid nanoparticle.

98. The pharmaceutical composition of embodiment 97, wherein the viral vector is an adeno-associated virus.

99. A host cell (e.g., a mammalian cell, e.g., a human cell) comprising the template RNA or gene modification system of any one of the preceding embodiments.

100.一种制备如实施例1-16、43-53、77-85、91或92中任一项所述的模板RNA的方法，该方法包括通过以下来合成该模板RNA：通过体外转录(例如固态合成)或通过在允许产生该模板RNA的条件下将编码该模板RNA的DNA引入宿主细胞。100. A method for preparing a template RNA as described in any one of embodiments 1-16, 43-53, 77-85, 91 or 92, the method comprising synthesizing the template RNA by in vitro transcription (e.g., solid-state synthesis) or by introducing DNA encoding the template RNA into a host cell under conditions that allow production of the template RNA.

101.一种用于修饰细胞中人类HBB基因中的靶位点的方法，该方法包括使该细胞与如实施例58-90、93或94中任一项所述的基因修饰系统或编码该基因修饰系统的DNA接触，从而修饰该细胞中人类HBB基因中的靶位点。101. A method for modifying a target site in a human HBB gene in a cell, the method comprising contacting the cell with a gene modification system as described in any one of embodiments 58-90, 93 or 94, or a DNA encoding the gene modification system, thereby modifying the target site in the human HBB gene in the cell.

102.一种用于修饰细胞中人类HBB基因中的靶位点的方法，该方法包括使该细胞与以下接触：(i)如实施例58-90、93或94中任一项所述的模板RNA，或编码其的DNA；以及(ii)基因修饰多肽或编码基因修饰多肽的核酸，从而修饰细胞中该人类HBB基因中的靶位点。102. A method for modifying a target site in a human HBB gene in a cell, the method comprising contacting the cell with: (i) a template RNA as described in any one of embodiments 58-90, 93 or 94, or a DNA encoding the same; and (ii) a gene modifying polypeptide or a nucleic acid encoding a gene modifying polypeptide, thereby modifying the target site in the human HBB gene in the cell.

103. A method for treating a subject having a disease or condition associated with a mutation in the human HBB gene, the method comprising administering to the subject the gene modification system of any one of embodiments 58-90, 93 or 94, or a DNA encoding the gene modification system, thereby treating the subject having the disease or condition associated with the mutation in the human HBB gene.

104. A method for treating a subject having a disease or condition associated with a mutation in the human HBB gene, the method comprising administering to the subject (i) the template RNA of any one of embodiments 58-90, 93 or 94, or a DNA encoding the same; and (ii) a gene modifying polypeptide or a nucleic acid encoding the gene modifying polypeptide, thereby treating the subject having the disease or condition associated with the mutation in the human HBB gene.

105. The method of embodiment 103 or 104, wherein the disease or condition is sickle cell disease (SCD) (e.g., sickle cell anemia).

106. The method of any one of embodiments 103-105, wherein the subject has a pathogenic E6V mutation.

107. A method for treating a subject having SCD, the method comprising administering to the subject the gene modification system of any one of embodiments 58-90, 93 or 94, or a DNA encoding the gene modification system, thereby treating the subject having SCD.

108. A method for treating a subject having SCD, the method comprising administering to the subject (i) the template RNA of any one of embodiments 58-90, 93 or 94, or a DNA encoding the same, and (ii) a gene modifying polypeptide or a nucleic acid encoding the gene modifying polypeptide, thereby treating the subject having SCD.

109.如前述实施例中任一项所述的基因修饰系统或方法，其中将该系统引入靶细胞可纠正该HBB基因的致病性突变。109. The gene modification system or method of any preceding embodiment, wherein introducing the system into a target cell can correct a pathogenic mutation in the HBB gene.

110.如前述实施例中任一项所述的基因修饰系统或方法，其中该致病性突变是E6V突变，并且其中该纠正包括V6E的氨基酸取代。110. The gene modification system or method of any of the preceding embodiments, wherein the pathogenic mutation is an E6V mutation, and wherein the correction comprises an amino acid substitution of V6E.

111.如前述实施例中任一项所述的基因修饰系统或方法，其中该突变的纠正发生在至少30％(例如，30％、40％、50％、60％、70％或更多)的靶核酸中。111. The gene modification system or method of any of the preceding embodiments, wherein correction of the mutation occurs in at least 30% (e.g., 30%, 40%, 50%, 60%, 70% or more) of the target nucleic acid.

112.如前述实施例中任一项所述的基因修饰系统或方法，其中该突变的纠正发生在至少30％(例如，30％、40％、50％、60％、70％或更多)的靶细胞中。112. The gene modification system or method of any of the preceding embodiments, wherein correction of the mutation occurs in at least 30% (e.g., 30%, 40%, 50%, 60%, 70% or more) of the target cells.

113.如前述实施例中任一项所述的基因修饰系统或方法，其中该基因修饰系统包含第二链靶向性gRNA，并且其中相对于用包含模板RNA而无第二链靶向性gRNA的基因修饰系统处理的靶细胞群体，靶细胞群体中突变的纠正增加。113. A gene modification system or method as described in any of the preceding embodiments, wherein the gene modification system comprises a second strand targeting gRNA, and wherein correction of mutations in the target cell population is increased relative to a target cell population treated with a gene modification system comprising a template RNA but without a second strand targeting gRNA.

114.如前述实施例中任一项所述的基因修饰系统或方法，其中该模板RNA包含一个或多个沉默取代(例如，如表7A、X4和X4A中例示的)，并且其中相对于用包含不含有一个或多个沉默取代的模板RNA的基因修饰系统处理的靶细胞群体，靶细胞群体中突变的纠正增加。114. The gene modification system or method of any of the preceding embodiments, wherein the template RNA comprises one or more silent substitutions (e.g., as exemplified in Tables 7A, X4, and X4A), and wherein correction of mutations in the target cell population is increased relative to a target cell population treated with a gene modification system comprising a template RNA that does not contain the one or more silent substitutions.

115. The method of any one of the preceding embodiments, wherein the cell is a mammalian cell, e.g., a human cell.

116.如前述实施例中任一项所述的方法，其中该受试者是人。116. The method of any one of the preceding embodiments, wherein the subject is a human.

117.如前述实施例中任一项所述的方法，其中该接触离体发生，例如，其中该细胞或受试者的DNA离体修饰。117. The method of any one of the preceding embodiments, wherein the contacting occurs ex vivo, e.g., wherein the DNA of the cell or subject is modified ex vivo.

118.如前述实施例中任一项所述的方法，其中该接触在体内发生，例如，其中该细胞或受试者的DNA在体内修饰。118. The method of any preceding embodiment, wherein the contacting occurs in vivo, e.g., wherein the DNA of the cell or subject is modified in vivo.

119. The method of any one of the preceding embodiments, wherein contacting the cell or the subject with the system comprises contacting the cell, or a cell in the subject, with a nucleic acid (e.g., DNA or RNA) encoding the gene modifying polypeptide under conditions that allow production of the gene modifying polypeptide.

120.如前述实施例中任一项所述的方法，其中该gRNA间隔子在所有核苷酸位置处与该细胞中人类HBB基因的第一部分完全互补，其中该第一部分位于该HBB基因的第二链上。120. A method as described in any of the preceding embodiments, wherein the gRNA spacer is fully complementary to the first portion of the human HBB gene in the cell at all nucleotide positions, wherein the first portion is located on the second strand of the HBB gene.

121. The method of any one of the preceding embodiments, wherein the heterologous object sequence is fully complementary to a second portion of the human HBB gene in the cell at all nucleotide positions except the mutation region, wherein the second portion is located on the first strand of the HBB gene.

122.如前述实施例中任一项所述的方法，其中该PBS序列与该人类HBB基因的第三部分完全互补，其中该第三部分位于该HBB基因的第一链上。122. The method of any one of the preceding embodiments, wherein the PBS sequence is fully complementary to a third portion of the human HBB gene, wherein the third portion is located on the first strand of the HBB gene.

Further Enumerated Embodiments

A1.一种模板RNA，其从5’至3’包含：A1. A template RNA comprising from 5' to 3':

(i)与人类HBB基因的第一部分互补的gRNA间隔子，其中该gRNA间隔子具有包含CATGGTGCATCTGACTCCTG(SEQ ID NO：21668)的核苷酸序列，或相对于其具有1个取代的核苷酸序列；(i) a gRNA spacer complementary to the first part of the human HBB gene, wherein the gRNA spacer has a nucleotide sequence comprising CATGGTGCATCTGACTCCTG (SEQ ID NO: 21668), or a nucleotide sequence having 1 substitution relative thereto;

(ii)与基因修饰多肽的Cas结构域结合的gRNA支架；(ii) a gRNA scaffold that binds to the Cas domain of the gene modifying polypeptide;

(iii) a heterologous object sequence comprising a mutation region for correcting a mutation in a second portion of the human HBB gene, and

(iv)引物结合位点(PBS)序列，其包含与该人类HBB基因的第三部分具有100％同一性的至少5个碱基。(iv) a primer binding site (PBS) sequence comprising at least 5 bases having 100% identity with the third portion of the human HBB gene.
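The 5' to 3' arrangement recited in embodiment A1 can be sketched as a simple concatenation of the four components. This is an illustrative sketch only: the class name `TemplateRNA` and its fields are hypothetical, and the choice of the full-length heterologous object sequence (SEQ ID NO: 20954) and full-length PBS sequence (SEQ ID NO: 19431) is an assumption made for the example, since the embodiments also permit shorter portions and substituted variants.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TemplateRNA:
    """Hypothetical container for the four components, listed 5' to 3'."""
    spacer: str
    scaffold: str
    heterologous_object: str
    pbs: str

    def sequence(self, as_rna: bool = True) -> str:
        # Concatenate in the recited order: spacer, scaffold,
        # heterologous object sequence, PBS sequence.
        seq = self.spacer + self.scaffold + self.heterologous_object + self.pbs
        # The disclosure writes sequences with T; convert to U for an RNA view.
        return seq.replace("T", "U") if as_rna else seq


example = TemplateRNA(
    spacer="CATGGTGCATCTGACTCCTG",  # SEQ ID NO: 21668
    scaffold=(
        "GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTG"
        "AAAAAGTGGCACCGAGTCGGTGC"
    ),  # SEQ ID NO: 11,012
    heterologous_object="AGTAACGGCAGACTTCTCTTCAG",  # SEQ ID NO: 20954 (assumed full length)
    pbs="GAGTCAGGTGCACCATG",  # SEQ ID NO: 19431 (assumed full length)
)

dna_view = example.sequence(as_rna=False)
print(len(dna_view), dna_view[:20], dna_view[-17:])
```

Keeping the components as separate fields makes it straightforward to swap in a truncated PBS sequence or an alternative spacer without re-deriving the whole molecule.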

A2. The template RNA of embodiment A1, wherein the gRNA spacer has a nucleotide sequence comprising CATGGTGCATCTGACTCCTG (SEQ ID NO: 21668) or CATGGTGCACCTGACTCCTG (SEQ ID NO: 19249).

A3. The template RNA of embodiment A1 or A2, wherein the gRNA spacer has a nucleotide sequence consisting of CATGGTGCATCTGACTCCTG (SEQ ID NO: 21668) or CATGGTGCACCTGACTCCTG (SEQ ID NO: 19249).

A4.如前述实施例中任一项所述的模板RNA，其中该gRNA间隔子的长度为20个核苷酸。A4. The template RNA as described in any of the preceding embodiments, wherein the gRNA spacer has a length of 20 nucleotides.

A5. The template RNA of embodiment A1, wherein the gRNA scaffold has a sequence according to GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC (SEQ ID NO: 11,012), or a sequence having at least 90% identity thereto.

A6. The template RNA of embodiment A1, wherein the gRNA scaffold has a sequence according to GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC (SEQ ID NO: 11,012).

A7. The template RNA of embodiment A1, wherein the heterologous object sequence comprises a sequence of at least 8 nucleotides from the 3' end of the sequence according to AGTAACGGCAGACTTCTCTTCAG (SEQ ID NO: 20954), or a sequence having 1, 2 or 3 substitutions relative thereto.

A8. The template RNA of embodiment A1, wherein the heterologous object sequence comprises a sequence of 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22 or 23 nucleotides from the 3' end of the sequence according to AGTAACGGCAGACTTCTCTTCAG (SEQ ID NO: 20954), or a sequence having 1, 2 or 3 substitutions relative thereto.

A9. The template RNA of embodiment A1, wherein the heterologous object sequence comprises a sequence of at least 8 nucleotides from the 3' end of the sequence according to AGTAACGGCAGACTTCTCTTCAG (SEQ ID NO: 20954).

A10.如实施例A1所述的模板RNA，其中该异源对象序列包含来自根据AGTAACGGCAGACTTCTCTGCAG(SEQ ID NO：20955)的序列的3’端的至少8个核苷酸的序列。A10. The template RNA as described in embodiment A1, wherein the heterologous object sequence comprises a sequence of at least 8 nucleotides from the 3' end of the sequence according to AGTAACGGCAGACTTCTCTGCAG (SEQ ID NO: 20955).
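The phrase "a sequence of at least 8 nucleotides from the 3' end" in the embodiments above can be read as any 3'-terminal portion of the reference sequence that is 8 or more nucleotides long. The following sketch enumerates those portions for SEQ ID NO: 20954. The function name `three_prime_fragments` is hypothetical, and the sketch does not cover the variants with 1, 2 or 3 substitutions.

```python
def three_prime_fragments(seq: str, min_len: int = 8) -> list[str]:
    """Return every 3'-terminal fragment of seq with length min_len..len(seq)."""
    if min_len < 1 or min_len > len(seq):
        raise ValueError("min_len must be between 1 and the sequence length")
    # seq[-n:] keeps the last n nucleotides, i.e. the 3' end as written 5' to 3'.
    return [seq[-n:] for n in range(min_len, len(seq) + 1)]


HOS_20954 = "AGTAACGGCAGACTTCTCTTCAG"  # heterologous object sequence, SEQ ID NO: 20954

fragments = three_prime_fragments(HOS_20954)
for frag in fragments:
    print(len(frag), frag)
```

For the 23-nucleotide reference sequence this yields one fragment for each length from 8 through 23, which corresponds to the lengths enumerated in embodiments A7 and A8.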

A11. The template RNA of embodiment A1, wherein the PBS sequence comprises a sequence of at least 8 nucleotides from the 5' end of the sequence according to GAGTCAGGTGCACCATG (SEQ ID NO: 19431), or a sequence having 1 substitution relative thereto.

A12. The template RNA of embodiment A1, wherein the PBS sequence comprises a sequence of at least 8 nucleotides from the 5' end of the sequence according to GAGTCAGGTGCACCATG (SEQ ID NO: 19431).

A13. The template RNA of embodiment A1, wherein the PBS sequence comprises a sequence of 9, 10, 11, 12, 13, 14, 15, 16 or 17 nucleotides from the 5' end of the sequence according to GAGTCAGGTGCACCATG (SEQ ID NO: 19431), or a sequence having 1 substitution relative thereto.
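The relationship between the PBS sequence and the gRNA spacer can be checked directly from the sequences as written. The sketch below computes the reverse complement of the PBS sequence (SEQ ID NO: 19431) and compares it with the first 17 nucleotides of each recited spacer. The helper names are hypothetical, and the comparison is only an observation about the listed sequences, not a statement of how they were designed.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement of a DNA-alphabet sequence (A, C, G, T only)."""
    return dna.upper().translate(_COMPLEMENT)[::-1]


def mismatches(a: str, b: str) -> int:
    """Count differing positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


PBS_19431 = "GAGTCAGGTGCACCATG"
SPACER_19249 = "CATGGTGCACCTGACTCCTG"
SPACER_21668 = "CATGGTGCATCTGACTCCTG"

pbs_target = reverse_complement(PBS_19431)  # strand the PBS would pair with
n = len(pbs_target)

print(pbs_target)
print("vs SEQ ID NO: 19249:", mismatches(pbs_target, SPACER_19249[:n]))
print("vs SEQ ID NO: 21668:", mismatches(pbs_target, SPACER_21668[:n]))
```

As written, the reverse complement of the PBS sequence matches the first 17 nucleotides of SEQ ID NO: 19249 exactly and differs from the corresponding portion of SEQ ID NO: 21668 at one position.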

A14. The template RNA of embodiment A1, wherein:

the gRNA scaffold has a sequence according to GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC (SEQ ID NO: 11,012), or a sequence having at least 90% identity thereto;

the heterologous object sequence comprises a sequence of at least 8 nucleotides from the 3' end of the sequence according to AGTAACGGCAGACTTCTCTTCAG (SEQ ID NO: 20954), or a sequence having 1, 2 or 3 substitutions relative thereto; and

the PBS sequence comprises a sequence of at least 8 nucleotides from the 5' end of the sequence according to GAGTCAGGTGCACCATG (SEQ ID NO: 19431), or a sequence having 1 substitution relative thereto.

A15. The template RNA of embodiment A1, wherein:

the gRNA scaffold has a sequence according to GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC (SEQ ID NO: 11,012);

the heterologous object sequence comprises a sequence of at least 8 nucleotides from the 3' end of the sequence according to AGTAACGGCAGACTTCTCTTCAG (SEQ ID NO: 20954); and

the PBS sequence comprises a sequence of at least 8 nucleotides from the 5' end of the sequence according to GAGTCAGGTGCACCATG (SEQ ID NO: 19431).

A16.如前述实施例中任一项所述的模板RNA，其不包含根据GCATGGTGCACCTGACTCCTGGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCAGACTTCTCCACAGGAGTCAGGTGCAC(SEQ ID NO：XXX)的序列。A16. The template RNA as described in any of the preceding embodiments, which does not comprise a sequence according to GCATGGTGCACCTGACTCCTGGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCAGACTTCTCCACAGGAGTCAGGTGCAC (SEQ ID NO: XXX).

A17.一种模板RNA，其从5’至3’包含：A17. A template RNA comprising from 5' to 3':

(i)与人类HBB基因的第一部分互补的gRNA间隔子，其中该gRNA间隔子具有包含GTAACGGCAGACTTCTCCAC(SEQ ID NO：19971)的核苷酸序列，或相对于其具有1个取代的核苷酸序列；(i) a gRNA spacer complementary to the first part of the human HBB gene, wherein the gRNA spacer has a nucleotide sequence comprising GTAACGGCAGACTTCTCCAC (SEQ ID NO: 19971), or a nucleotide sequence having 1 substitution relative thereto;

(ii)与基因修饰多肽的Cas结构域结合的gRNA支架；(ii) a gRNA scaffold that binds to the Cas domain of a gene modifying polypeptide;

(iii) a heterologous object sequence comprising a mutation region for introducing a mutation into a second portion of the human HBB gene, and

A18. The template RNA of embodiment A17, wherein the gRNA spacer has a nucleotide sequence comprising GTAACGGCAGACTTCTCCAC (SEQ ID NO: 19971).

A19.如实施例A17或A18所述的模板RNA，其中该gRNA支架具有根据GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC(SEQ ID NO：11,012)的序列，或与其具有至少90％同一性的序列。A19. The template RNA of embodiment A17 or A18, wherein the gRNA scaffold has a sequence according to GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC (SEQ ID NO: 11,012), or a sequence having at least 90% identity thereto.

A20. The template RNA of any one of embodiments A17-19, wherein the gRNA scaffold has a sequence according to GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC (SEQ ID NO: 11,012).

A21. The template RNA of any one of embodiments A17-20, wherein the heterologous object sequence comprises a sequence of at least 8 nucleotides from the 3' end of the sequence according to CCATGGTGCACCTGACTCCTGAG (SEQ ID NO: 20956) or CCATGGTGCACCTGACTCCTGCG (SEQ ID NO: 21906), or a sequence having 1, 2 or 3 substitutions relative thereto.

A22. The template RNA of any one of embodiments A17-21, wherein the heterologous object sequence comprises a sequence of 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22 or 23 nucleotides from the 3' end of the sequence according to CCATGGTGCACCTGACTCCTGAG (SEQ ID NO: 20956) or CCATGGTGCACCTGACTCCTGCG (SEQ ID NO: 21906), or a sequence having 1, 2 or 3 substitutions relative thereto.

A23.如实施例A17-22中任一项所述的模板RNA，其中该异源对象序列包含来自根据CCATGGTGCACCTGACTCCTGAG(SEQ ID NO：20956)或CCATGGTGCACCTGACTCCTGCG(SEQ IDNO：21906)的序列的3’端的至少8个核苷酸的序列。A23. The template RNA of any one of embodiments A17-22, wherein the heterologous object sequence comprises a sequence of at least 8 nucleotides from the 3' end of the sequence according to CCATGGTGCACCTGACTCCTGAG (SEQ ID NO: 20956) or CCATGGTGCACCTGACTCCTGCG (SEQ ID NO: 21906).

A24.如实施例A17-23中任一项所述的模板RNA，其中该异源对象序列包含来自根据CCATGGTGCACCTGACTCCTGAG(SEQ ID NO：20956)或CCATGGTGCACCTGACTCCTGCG(SEQ IDNO：21906)的序列的3’端的9、10、11、12、13、14、15、16、17、18、19、20、21、22或23个核苷酸的序列。A24. The template RNA of any one of embodiments A17-23, wherein the heterologous object sequence comprises a sequence of 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22 or 23 nucleotides from the 3' end of the sequence according to CCATGGTGCACCTGACTCCTGAG (SEQ ID NO: 20956) or CCATGGTGCACCTGACTCCTGCG (SEQ ID NO: 21906).

A25.如实施例A17-24中任一项所述的模板RNA，其中该PBS序列包含来自根据GAGAAGTCTGCCGTTAC(SEQ ID NO：20957)的序列的5’端的至少8个核苷酸的序列，或相对于其具有1个取代的序列。A25. The template RNA of any one of embodiments A17-24, wherein the PBS sequence comprises a sequence of at least 8 nucleotides from the 5' end of the sequence according to GAGAAGTCTGCCGTTAC (SEQ ID NO: 20957), or a sequence having 1 substitution relative thereto.

A26.如实施例A17-25中任一项所述的模板RNA，其中该PBS序列包含来自根据GAGAAGTCTGCCGTTAC(SEQ ID NO：20957)的序列的5’端的至少8个核苷酸的序列。A26. The template RNA of any one of embodiments A17-25, wherein the PBS sequence comprises a sequence of at least 8 nucleotides from the 5' end of the sequence according to GAGAAGTCTGCCGTTAC (SEQ ID NO: 20957).

A27.如实施例A17-26中任一项所述的模板RNA，其中该PBS序列包含来自根据GAGAAGTCTGCCGTTAC(SEQ ID NO：20957)的序列的5’端的9、10、11、12、13、14、15、16或17个核苷酸的序列，或相对于其具有1个取代的序列。A27. The template RNA of any one of embodiments A17-26, wherein the PBS sequence comprises a sequence of 9, 10, 11, 12, 13, 14, 15, 16 or 17 nucleotides from the 5' end of the sequence according to GAGAAGTCTGCCGTTAC (SEQ ID NO: 20957), or a sequence having 1 substitution relative thereto.

A28.如实施例A17-27中任一项所述的模板RNA，其中该PBS序列包含来自根据GAGAAGTCTGCCGTTAC(SEQ ID NO：20957)的序列的5’端的9、10、11、12、13、14、15、16或17个核苷酸的序列。A28. The template RNA of any one of embodiments A17-27, wherein the PBS sequence comprises a sequence of 9, 10, 11, 12, 13, 14, 15, 16 or 17 nucleotides from the 5' end of the sequence according to GAGAAGTCTGCCGTTAC (SEQ ID NO: 20957).

A29.如实施例A17-28中任一项所述的模板RNA，其不包含根据GTAACGGCAGACTTCTCCACGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGGACCGAGTCGGTCCGACTCCTGaGGAGAAGTCTGCC(SEQ ID NO：YYY)的序列。A29. The template RNA of any one of embodiments A17-28, which does not comprise a sequence according to GTAACGGCAGACTTCTCCACGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGGACCGAGTCGGTCCGACTCCTGaGGAGAAGTCTGCC (SEQ ID NO: YYY).

A30.如前述实施例中任一项所述的模板RNA，其中该突变区包含单核苷酸。A30. The template RNA of any one of the preceding embodiments, wherein the mutation region comprises a single nucleotide.

A31.如前述实施例中任一项所述的模板RNA，其中该突变区长度为至少两个核苷酸。A31. The template RNA of any one of the preceding embodiments, wherein the mutation region is at least two nucleotides in length.

A32.如前述实施例中任一项所述的模板RNA，其中该突变区长度为高达20个核苷酸并且相对于该人类HBB基因的第二部分包含一个、两个或三个序列差异。A32. The template RNA of any of the preceding embodiments, wherein the mutation region is up to 20 nucleotides in length and comprises one, two or three sequence differences relative to the second part of the human HBB gene.

A33.如前述实施例中任一项所述的模板RNA，其中该突变区包含设计用于纠正该HBB基因中的致病性突变的第一区和设计用于灭活PAM序列的第二区。A33. The template RNA as described in any of the preceding embodiments, wherein the mutation region comprises a first region designed to correct the pathogenic mutation in the HBB gene and a second region designed to inactivate the PAM sequence.

A34.如前述实施例中任一项所述的模板RNA，其中该突变区包含设计用于纠正该HBB基因中的致病性突变的第一区和设计用于引入沉默取代的第二区。A34. The template RNA as described in any of the preceding embodiments, wherein the mutation region comprises a first region designed to correct a pathogenic mutation in the HBB gene and a second region designed to introduce a silent substitution.

A35.如前述实施例中任一项所述的模板RNA，其被配置为编辑该人类HBB基因中的E6V突变。A35. The template RNA of any of the preceding embodiments, which is configured to edit the E6V mutation in the human HBB gene.

A36.如实施例A35所述的模板RNA，其被配置为将E6V突变转化为谷氨酰胺或丙氨酸。A36. The template RNA of embodiment A35, which is configured to convert the E6V mutation to glutamine or alanine.

A37.如前述实施例中任一项所述的模板RNA，其包含一个或多个化学修饰的核苷酸。A37. The template RNA of any one of the preceding embodiments, comprising one or more chemically modified nucleotides.

A38.一种基因修饰系统，其包含：A38. A gene modification system comprising:

如前述实施例中任一项所述的模板RNA，以及The template RNA as described in any one of the preceding embodiments, and

基因修饰多肽或编码该基因修饰多肽的核酸。A gene-modified polypeptide or a nucleic acid encoding the gene-modified polypeptide.

A39.如实施例A38所述的基因修饰系统，其中该基因修饰多肽包含RT结构域，该结构域具有根据SEQ ID NO：8,003的序列，或与其具有至少70％、80％、85％、90％、95％、98％或99％同一性的序列。A39. A gene modification system as described in embodiment A38, wherein the gene modification polypeptide comprises an RT domain having a sequence according to SEQ ID NO: 8,003, or a sequence having at least 70%, 80%, 85%, 90%, 95%, 98% or 99% identity thereto.

A40.如实施例A38所述的基因修饰系统，其中该基因修饰多肽包含RT结构域，该结构域具有根据SEQ ID NO：8,020的序列，或与其具有至少70％、80％、85％、90％、95％、98％或99％同一性的序列。A40. The gene modification system of embodiment A38, wherein the gene modification polypeptide comprises an RT domain having a sequence according to SEQ ID NO: 8,020, or a sequence having at least 70%, 80%, 85%, 90%, 95%, 98% or 99% identity thereto.

A41.如实施例A38所述的基因修饰系统，其中该基因修饰多肽包含RT结构域，该结构域具有根据SEQ ID NO：8,074的序列，或与其具有至少70％、80％、85％、90％、95％、98％或99％同一性的序列。A41. A gene modification system as described in embodiment A38, wherein the gene modification polypeptide comprises an RT domain having a sequence according to SEQ ID NO: 8,074, or a sequence having at least 70%, 80%, 85%, 90%, 95%, 98% or 99% identity thereto.

A42.如实施例A38所述的基因修饰系统，其中该基因修饰多肽包含RT结构域，该结构域具有根据SEQ ID NO：8,113的序列，或与其具有至少70％、80％、85％、90％、95％、98％或99％同一性的序列。A42. A gene modification system as described in embodiment A38, wherein the gene modification polypeptide comprises an RT domain having a sequence according to SEQ ID NO: 8,113, or a sequence having at least 70%, 80%, 85%, 90%, 95%, 98% or 99% identity thereto.

A43.如实施例A38所述的基因修饰系统，其中该基因修饰多肽包含DNA结合结构域，该结构域具有包含N863A突变的Cas9切口酶的序列，例如根据SEQ ID NO：11,096的序列，或与其具有至少70％、80％、85％、90％、95％、98％或99％同一性的序列。A43. A gene modification system as described in embodiment A38, wherein the gene modification polypeptide comprises a DNA binding domain having the sequence of a Cas9 nickase comprising an N863A mutation, such as a sequence according to SEQ ID NO: 11,096, or a sequence having at least 70%, 80%, 85%, 90%, 95%, 98% or 99% identity thereto.

A44. The gene modification system of embodiment A38, which produces a first nick in the first strand of the human HBB gene.

A45. The gene modification system of embodiment A44, further comprising a second strand-targeting gRNA that directs a second nick to the second strand of the human HBB gene.

A46.如实施例A45所述的基因修饰系统，其中该第一切口和该第二切口相隔80-120个核苷酸。A46. The gene modification system of embodiment A45, wherein the first nick and the second nick are separated by 80-120 nucleotides.
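The spacing condition in embodiment A46 amounts to a simple range check on the distance between the two nick positions. The following is a minimal sketch; the function name `nicks_within_window` and the coordinates in the usage lines are hypothetical and are not taken from the disclosure.

```python
def nicks_within_window(first_nick: int, second_nick: int,
                        min_nt: int = 80, max_nt: int = 120) -> bool:
    """True if the two nick positions are separated by min_nt..max_nt nucleotides.

    Positions are assumed to be coordinates on the same reference sequence.
    """
    separation = abs(first_nick - second_nick)
    return min_nt <= separation <= max_nt


# Hypothetical coordinates, for illustration only.
print(nicks_within_window(1000, 1095))  # separation of 95 nt
print(nicks_within_window(1000, 1050))  # separation of 50 nt
```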

A47. The gene modification system of embodiment A45, wherein the template RNA and the second strand-targeting gRNA are configured to produce an outward nick orientation.

A48.如实施例A45所述的基因修饰系统，其中该第二链靶向性gRNA包含与具有镰状细胞病突变、野生型序列或Makassar变体的人类HBB基因互补的间隔子序列。A48. A gene modification system as described in embodiment A45, wherein the second strand targeting gRNA comprises a spacer sequence complementary to a human HBB gene having a sickle cell disease mutation, a wild-type sequence, or a Makassar variant.

A49. A method for modifying a target site in the human HBB gene in a cell, the method comprising contacting the cell with the gene modification system of embodiment 38, thereby modifying the target site in the human HBB gene in the cell.

A50. The method of embodiment A49, wherein correction of the mutation occurs in at least 30% of target nucleic acids.

A51. A method for treating a subject having a disease or condition associated with a mutation in the human HBB gene, wherein the disease or condition is sickle cell disease (SCD), the method comprising administering to the subject the gene modification system of embodiment 38, thereby treating the subject having the disease or condition associated with the mutation in the human HBB gene.

A52.一种模板RNA，其从5’至3’包含：A52. A template RNA comprising from 5' to 3':

(i)与人类HBB基因的第一部分互补的gRNA间隔子，其中该gRNA间隔子具有包含表1的gRNA间隔子序列的核心核苷酸的核苷酸序列，并且任选地包含从该gRNA间隔子的侧翼核苷酸的3’端开始的一个或多个连续核苷酸，或相对于该核苷酸序列具有1、2或3个取代的核苷酸序列；(i) a gRNA spacer complementary to the first part of the human HBB gene, wherein the gRNA spacer has a nucleotide sequence comprising the core nucleotides of the gRNA spacer sequence of Table 1, and optionally comprises one or more consecutive nucleotides starting from the 3' end of the flanking nucleotides of the gRNA spacer, or a nucleotide sequence having 1, 2 or 3 substitutions relative to the nucleotide sequence;

附图说明BRIEF DESCRIPTION OF THE DRAWINGS

本专利或申请文件包含至少一个彩色附图。应请求并且支付必要的费用后，具有彩色附图的本专利或专利申请公开的副本将由专利局提供。The patent or application file contains at least one drawing drawn in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.

FIG. 1 depicts a gene modification system as described herein. The left-hand diagram shows a gene modifying polypeptide comprising a Cas nickase domain (e.g., spCas9 N863A) and a reverse transcriptase domain (RT domain) connected by a linker. The right-hand diagram shows a template RNA comprising, from 5′ to 3′, a gRNA spacer, a gRNA scaffold, a heterologous object sequence, and a primer binding site sequence (PBS sequence). The heterologous object sequence may comprise a mutation region comprising one or more sequence differences relative to the target site. The heterologous object sequence may also comprise a pre-edit homology region and a post-edit homology region flanking the mutation region. Without wishing to be bound by theory, it is thought that the gRNA spacer of the template RNA binds to the second strand of the target site in the genome, and that the gRNA scaffold of the template RNA binds to the gene modifying polypeptide, e.g., localizing the gene modifying polypeptide to the target site in the genome. It is thought that the Cas domain of the gene modifying polypeptide nicks the target site (e.g., the first strand of the target site), e.g., allowing the PBS sequence to bind to a sequence adjacent to the site to be altered on the first strand of the target site. It is thought that the RT domain of the gene modifying polypeptide uses the first strand of the target site, bound to its complementary sequence (the PBS sequence of the template RNA), as a primer, and uses the heterologous object sequence of the template RNA as a template, e.g., to polymerize a sequence complementary to the heterologous object sequence. Without wishing to be bound by theory, it is thought that reverse transcription can then proceed through the pre-edit homology region, then through the mutation region, and then through the post-edit homology region, thereby producing a DNA strand comprising the mutation specified by the heterologous object sequence.
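The priming and reverse transcription steps described for FIG. 1 can be sketched in sequence terms. In the sketch below, the nicked first strand is taken to pair with the PBS sequence, so its 3' end is modeled as the reverse complement of the PBS sequence, and the newly synthesized DNA is modeled as the reverse complement of the heterologous object sequence. The function names are hypothetical, the full-length sequences of SEQ ID NOs: 20954 and 19431 are used as an assumption, and the sketch ignores flap resolution and any downstream repair.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement of a DNA-alphabet sequence (A, C, G, T only)."""
    return dna.upper().translate(_COMPLEMENT)[::-1]


def rt_extension_product(heterologous_object: str, pbs: str) -> str:
    """Model the primer plus newly synthesized DNA, written 5' to 3'.

    primer:  the genomic 3' end that anneals to the PBS sequence.
    new_dna: copied from the heterologous object sequence, which the RT
             reads starting at its 3' end (the end adjacent to the PBS).
    """
    primer = reverse_complement(pbs)
    new_dna = reverse_complement(heterologous_object)
    return primer + new_dna


HOS_20954 = "AGTAACGGCAGACTTCTCTTCAG"  # heterologous object sequence (assumed full length)
PBS_19431 = "GAGTCAGGTGCACCATG"        # PBS sequence (assumed full length)

product = rt_extension_product(HOS_20954, PBS_19431)
print(product)
```

Because the PBS sequence sits immediately 3' of the heterologous object sequence in the template RNA, the modeled product is simply the reverse complement of the two sequences taken together.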

FIG. 2 is a pair of graphs showing rewriting levels in 293T cells (left panel) and in CD34+ primary human HSCs after transfection with a gene modification system comprising a gene modifying polypeptide and various template RNAs.

FIG. 3 is a pair of graphs showing rewriting levels in 293T cells (left panel) and in CD34+ primary human HSCs after transfection with a gene modification system comprising a gene modifying polypeptide and various template RNAs.

FIG. 4 is a graph showing percent editing in primary human fibroblasts after electroporation with a gene modification system comprising tgRNA14, with or without a second nick.

FIG. 5 is a graph showing percent editing in wild-type primary human fibroblasts (for installation of the Makassar mutation) and in sickle primary human fibroblasts (for installation of the wild-type sequence) after electroporation with a gene modification system comprising tgRNA14, with or without a second nick.

FIG. 6 is a graph showing the percent rewriting achieved using the RNAV209-013 or RNAV214-040 gene modifying polypeptide with the indicated template RNAs.

FIG. 7 is a graph showing the amount of Fah mRNA, relative to wild type, when the template RNAs are used with the RNAV209-013 or RNAV214-040 gene modifying polypeptide.

FIG. 8 is a graph showing the percentage of Cas9-positive hepatocytes 6 hours after administration of LNPs containing various gene modifying polypeptides and template RNAs.

FIG. 9 is a graph showing rewriting levels in liver samples 6 days after administration of LNPs containing various gene modifying polypeptides and template RNAs.

FIG. 10 is a graph showing restoration of wild-type Fah mRNA in liver samples, relative to heterozygous littermates, after administration of LNPs containing various gene modifying polypeptides and template RNAs.

FIG. 11 is a graph showing the distribution of Fah protein in liver samples after administration of LNPs containing various gene modifying polypeptides and template RNAs.

FIG. 12 is a series of Western blots showing Cas9-RT expression 6 hours after infusion of Cas9-RT mRNA + TTR guide LNP. Each lane represents an individual animal, with 20 μg of tissue homogenate loaded per lane. The positive control is from an in vitro cell experiment expressing Cas9-RT (previously described). GAPDH was used as a loading control for each sample. n = 4 per group, vehicle or treated.

FIG. 13 is a graph showing gene editing of the TTR locus after treatment with Cas9-RT mRNA + TTR guide LNP. Indel levels detected at the TTR locus were measured by TIDE analysis of Sanger sequencing of the protospacer-targeted TTR locus.

FIG. 14 is a graph showing the decrease in serum TTR levels after treatment with Cas9-RT mRNA + TTR guide LNP. Circulating TTR levels were measured 5 days after treating mice with LNPs encapsulating Cas9-RT + TTR guide RNA.

FIG. 15 is a graph showing Cas9-RT expression after infusion of Cas9-RT mRNA + TTR guide LNP. Relative expression was quantified by ProteinSimple Jess capillary electrophoresis Western blot. Numbers in symbols are the number of animals in the group. Vehicle n = 2; Cas9-RT + TTR guide n = 3.

FIG. 16 is a graph showing gene editing of the TTR locus after infusion of Cas9-RT mRNA + TTR guide LNP. Indel levels detected at the TTR locus were measured by amplicon sequencing of the protospacer-targeted TTR locus. Eight distinct biopsies were taken from the liver of each animal, and amplicon sequencing measured the percentage of reads showing indels in each.

FIG. 17 is a graph showing mean perfect rewriting levels in primary human HSCs after transfection with various gene modifying polypeptides and template RNAs.

FIGS. 18A and 18B are graphs showing mean perfect rewriting levels in primary human HSCs after transfection with various gene modifying polypeptides and template RNAs comprising an HBB5 spacer (FIG. 18A) or an HBB8 spacer (FIG. 18B).

FIGS. 19A and 19B are a heat map (FIG. 19A) and a graph (FIG. 19B) showing mean perfect rewriting levels in primary human HSCs after transfection with various gene modifying polypeptides and template RNAs comprising an HBB5 spacer (FIG. 19A) or an HBB8 spacer (FIG. 19B).

FIGS. 20A-20C are graphs showing mean perfect rewriting levels in primary human HSCs after transfection with various gene modifying polypeptides and template RNAs comprising an HBB5 spacer (FIGS. 20A and 20C) or an HBB8 spacer (FIG. 20B).

FIGS. 21A and 21B are a pair of graphs showing perfect rewriting levels in primary human HSCs (FIG. 21A) and the percentage of HSC subpopulations (FIG. 21B) after transfection with various gene modifying polypeptides and template RNAs.

FIGS. 22A and 22B are graphs showing perfect rewriting levels in primary human HSC subpopulations after transfection with various gene modifying polypeptides and template RNAs.

FIGS. 23A-23C are graphs showing total colony number (FIG. 23A), colony number (FIG. 23B), and the percentage of enucleated CD235+ cells (FIG. 23C) after transfection with various gene modifying polypeptides and template RNAs.

DETAILED DESCRIPTION

Definitions

As used herein, the term "expression cassette" refers to a nucleic acid construct comprising nucleic acid elements sufficient for the expression of a nucleic acid molecule of the present invention.

As used herein, a "gRNA spacer" refers to a portion of a nucleic acid that has complementarity to a target nucleic acid and can, together with a gRNA scaffold, target a Cas protein to the target nucleic acid.

As used herein, a "gRNA scaffold" refers to a portion of a nucleic acid that can bind a Cas protein and can, together with a gRNA spacer, target the Cas protein to a target nucleic acid. In some embodiments, the gRNA scaffold comprises a crRNA sequence, a tetraloop, and a tracrRNA sequence.
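The description of FIG. 1 states that the template RNA comprises, from 5′ to 3′, a gRNA spacer, a gRNA scaffold, a heterologous object sequence, and a PBS sequence. A minimal sketch of that ordering follows; the class name, field names, and the placeholder sequences used in any call are hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TemplateRNA:
    """Hypothetical container for the four template RNA components (5' to 3')."""
    spacer: str
    scaffold: str
    heterologous_object: str
    pbs: str

    def sequence(self) -> str:
        # Concatenate the components in the 5'-to-3' order given for FIG. 1.
        return self.spacer + self.scaffold + self.heterologous_object + self.pbs

    def component_spans(self) -> dict:
        # Map each component name to its (start, end) half-open interval
        # within the full-length sequence.
        spans = {}
        start = 0
        for name in ("spacer", "scaffold", "heterologous_object", "pbs"):
            end = start + len(getattr(self, name))
            spans[name] = (start, end)
            start = end
        return spans
```

Keeping the components separate until the final concatenation makes it easy to vary one element (for example, the PBS length) while holding the others fixed.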

As used herein, a "gene modifying polypeptide" refers to a polypeptide comprising a retroviral reverse transcriptase, or a polypeptide comprising an amino acid sequence having at least 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% amino acid sequence identity to a retroviral reverse transcriptase, which is capable of integrating a nucleic acid sequence (e.g., a sequence provided on a template nucleic acid) into a target DNA molecule (e.g., in a mammalian host cell, such as a genomic DNA molecule in the host cell). In some embodiments, the gene modifying polypeptide is capable of integrating the sequence substantially without relying on host machinery. In some embodiments, the gene modifying polypeptide integrates a sequence into a random position in a genome, and in some embodiments, the gene modifying polypeptide integrates a sequence into a specific target site. In some embodiments, a gene modifying polypeptide comprises one or more domains that, collectively, facilitate 1) binding the template nucleic acid, 2) binding the target DNA molecule, and 3) integration of at least a portion of the template nucleic acid into the target DNA. Gene modifying polypeptides include both naturally occurring polypeptides and engineered variants of the foregoing, e.g., variants having one or more amino acid substitutions relative to the naturally occurring sequence. Gene modifying polypeptides also include heterologous constructs, e.g., where one or more of the domains recited above are heterologous to one another, whether through a heterologous fusion (or other conjugate) of otherwise wild-type domains, or through fusions of modified domains, e.g., by way of replacement or fusion of a heterologous subdomain or other substituted domain. Exemplary gene modifying polypeptides that can be used in the methods provided herein, systems comprising them, and methods of using them are described, e.g., in PCT/US2021/020948, which is incorporated herein by reference with respect to gene modifying polypeptides comprising a retroviral reverse transcriptase domain. In some embodiments, the gene modifying polypeptide integrates a sequence into a gene. In some embodiments, the gene modifying polypeptide integrates a sequence into a sequence outside of a gene. As used herein, a "gene modification system" refers to a system comprising a gene modifying polypeptide and a template nucleic acid.
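The definition above is framed in terms of amino acid sequence identity thresholds (75% through 99%). The sketch below shows one way such a threshold could be checked against a pre-computed pairwise alignment. It assumes that identity is counted as identical, non-gap columns divided by the total alignment length, which is only one common convention and is not stated in this passage; the function names are hypothetical.

```python
# Identity thresholds recited in the definition above.
THRESHOLDS = (75, 80, 85, 90, 95, 96, 97, 98, 99)


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over a gapped alignment ('-' marks a gap).

    Assumption: identity = identical non-gap columns / alignment length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment must not be empty")
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


def thresholds_met(aligned_a: str, aligned_b: str) -> list:
    """Return the recited thresholds that the alignment meets or exceeds."""
    pid = percent_identity(aligned_a, aligned_b)
    return [t for t in THRESHOLDS if pid >= t]
```

Other conventions (for example, dividing by the length of the shorter sequence) would give different values for the same alignment, so the denominator should be fixed before comparing against a threshold.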

As used herein, the term "domain" refers to a structure of a biomolecule that contributes to a specified function of the biomolecule. A domain may comprise a contiguous region (e.g., a contiguous sequence) or distinct, non-contiguous regions (e.g., non-contiguous sequences) of a biomolecule. Examples of protein domains include, but are not limited to, an endonuclease domain, a DNA binding domain, and a reverse transcription domain; an example of a domain of a nucleic acid is a regulatory domain, such as a transcription factor binding domain. In some embodiments, a domain (e.g., a Cas domain) can comprise two or more smaller domains (e.g., a DNA binding domain and an endonuclease domain).

As used herein, the term "exogenous," when used with reference to a biomolecule (such as a nucleic acid sequence or polypeptide), means that the biomolecule was introduced into a host genome, cell, or organism by human intervention. For example, a nucleic acid that is added to an existing genome, cell, tissue, or subject using recombinant DNA techniques or other methods is exogenous to the existing nucleic acid sequence, cell, tissue, or subject.

As used herein, "first strand" and "second strand," as used to describe the individual DNA strands of target DNA, distinguish the two DNA strands based on which strand the reverse transcriptase domain initiates polymerization on, e.g., based on where target-primed synthesis initiates. The first strand refers to the strand of the target DNA upon which the reverse transcriptase domain initiates polymerization, e.g., where target-primed synthesis initiates. The second strand refers to the other strand of the target DNA. First and second strand designations do not otherwise describe the target site DNA strands; for example, in some embodiments the first and second strands are both nicked by a polypeptide described herein, but the designations "first" and "second" strand have no bearing on the order in which such nicks occur.

The term "heterologous," when used herein to describe a first element with reference to a second element, means that the first element and the second element do not exist in nature in the arrangement described. For example, a heterologous polypeptide, nucleic acid molecule, construct, or sequence refers to (a) a polypeptide, nucleic acid molecule, or portion of a polypeptide or nucleic acid molecule sequence that is not native to a cell in which it is expressed, (b) a polypeptide or nucleic acid molecule, or portion of a polypeptide or nucleic acid molecule, that has been altered or mutated relative to its native state, or (c) a polypeptide or nucleic acid molecule with altered expression as compared to the native expression level under similar conditions. For example, a heterologous regulatory sequence (e.g., promoter, enhancer) may be used to regulate expression of a gene or a nucleic acid molecule in a way that differs from the way the gene or nucleic acid molecule is normally expressed in nature. In another example, a heterologous domain of a polypeptide or nucleic acid sequence (e.g., a DNA binding domain of a polypeptide, or a nucleic acid encoding a DNA binding domain of a polypeptide) may be positioned differently relative to other domains, or may be a different sequence or from a different source, relative to other domains or portions of the polypeptide or its encoding nucleic acid. In certain embodiments, a heterologous nucleic acid molecule may exist in a native host cell genome, but may have an altered expression level or have a different sequence or both. In other embodiments, a heterologous nucleic acid molecule may not be endogenous to a host cell or host genome but may instead have been introduced into the host cell by transformation (e.g., transfection, electroporation), wherein the added molecule may integrate into the host genome or may exist as extrachromosomal genetic material either transiently (e.g., mRNA) or semi-stably for more than one generation (e.g., episomal viral vector, plasmid, or other self-replicating vector).

As used herein, "insertion" of a sequence into a target site refers to the net addition of DNA sequence at the target site, e.g., where the heterologous object sequence contains new nucleotides that have no cognate position in the unedited target site. In some embodiments, a nucleotide alignment of the PBS sequence and heterologous object sequence to the target nucleic acid sequence would result in an alignment gap in the target nucleic acid sequence.

As used herein, a "deletion" generated by a heterologous object sequence in a target site refers to the net deletion of DNA sequence at the target site, e.g., where the unedited target site contains nucleotides that have no cognate position in the heterologous object sequence. In some embodiments, a nucleotide alignment of the PBS sequence and heterologous object sequence to the target nucleic acid sequence would result in an alignment gap in the molecule comprising the PBS sequence and heterologous object sequence.
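The definitions of "insertion" and "deletion" turn on which row of a nucleotide alignment carries the gap. The following sketch illustrates that distinction; it assumes a pairwise alignment has already been computed and is supplied as two equal-length strings with '-' marking gaps, and the function name is hypothetical.

```python
def classify_edit(aligned_target: str, aligned_edit: str) -> str:
    """Classify an edit from a gapped pairwise alignment.

    aligned_target: unedited target-site sequence, '-' marks alignment gaps.
    aligned_edit:   sequence specified by the template, '-' marks gaps.
    Both strings are assumed to be pre-aligned and of equal length.
    """
    if len(aligned_target) != len(aligned_edit):
        raise ValueError("aligned sequences must have equal length")
    # A gap in the target row marks a new nucleotide with no cognate
    # position in the unedited target site.
    inserted = aligned_target.count("-")
    # A gap in the edit row marks a target nucleotide with no cognate
    # position in the template-specified sequence.
    deleted = aligned_edit.count("-")
    net = inserted - deleted
    if net > 0:
        return "insertion"
    if net < 0:
        return "deletion"
    # No net change in length: a substitution, or no edit at all.
    return "substitution" if aligned_target != aligned_edit else "unchanged"
```

For example, `classify_edit("ACG--TA", "ACGTTTA")` places the gaps in the target row and so reports a net insertion.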

As used herein, the term "inverted terminal repeat" or "ITR" refers to an AAV viral cis-element, so named because of its symmetry. These elements promote efficient multiplication of an AAV genome. It is hypothesized that the minimal elements for ITR function are a Rep binding site (RBS; 5′-GCGCGCTCGCTCGCTC-3′ for AAV2; SEQ ID NO: 4601) and a terminal resolution site (TRS; 5′-AGTTGG-3′ for AAV2; SEQ ID NO: 4602), plus a variable palindromic sequence allowing for hairpin formation. According to the present invention, an ITR comprises at least these three elements (RBS, TRS, and sequences allowing the formation of a hairpin). In addition, in the present invention, the term "ITR" refers to ITRs of known natural AAV serotypes (e.g., the ITR of a serotype 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or 11 AAV), to chimeric ITRs formed by the fusion of ITR elements derived from different serotypes, and to functional variants thereof. "Functional variant" refers to a sequence having at least 80%, 85%, 90%, preferably at least 95% sequence identity to a known ITR, and which allows multiplication of the sequence comprising said ITR in the presence of Rep proteins.
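As an illustration of the minimal ITR elements named above, the sketch below scans a DNA sequence for the AAV2 RBS and TRS motifs quoted in the text. It deliberately does not test for the palindromic, hairpin-forming sequence, and searching both strands is an assumption of this sketch rather than something the definition specifies; the function names are hypothetical.

```python
AAV2_RBS = "GCGCGCTCGCTCGCTC"  # Rep binding site quoted in the text for AAV2
AAV2_TRS = "AGTTGG"            # terminal resolution site quoted for AAV2

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def has_minimal_itr_motifs(seq: str) -> bool:
    """Check for the AAV2 RBS and TRS motifs on either strand.

    This covers only two of the three minimal elements; hairpin
    formation is not evaluated here.
    """
    forward = seq.upper()
    reverse = reverse_complement(forward)

    def present(motif: str) -> bool:
        return motif in forward or motif in reverse

    return present(AAV2_RBS) and present(AAV2_TRS)
```

A candidate that passes this check would still need its secondary structure examined before it could be treated as satisfying the three-element definition.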

As used herein, the term "mutation region" refers to a region in a template RNA having one or more sequence differences relative to the corresponding sequence in a target nucleic acid. The sequence difference may comprise, for example, a substitution, insertion, frameshift, or deletion.

The term "mutated," when applied to nucleic acid sequences, means that nucleotides in a nucleic acid sequence have been inserted, deleted, or changed compared to a reference (e.g., native) nucleic acid sequence. A single alteration may be made at a locus (a point mutation), or multiple nucleotides may be inserted, deleted, or changed at a single locus. In addition, one or more alterations may be made at any number of loci within a nucleic acid sequence. A nucleic acid sequence may be mutated by any method known in the art.

"Nucleic acid molecule" refers to both RNA and DNA molecules, including, without limitation, complementary DNA ("cDNA"), genomic DNA ("gDNA"), and messenger RNA ("mRNA"), and also includes synthetic nucleic acid molecules, such as those that are chemically synthesized or recombinantly produced, e.g., RNA templates as described herein. The nucleic acid molecule can be double-stranded or single-stranded, circular or linear. If single-stranded, the nucleic acid molecule can be the sense strand or the antisense strand. Unless otherwise indicated, and as an example for all sequences described herein under the general format "SEQ ID NO:", "nucleic acid comprising SEQ ID NO: 1" refers to a nucleic acid, at least a portion of which has either (i) the sequence of SEQ ID NO: 1, or (ii) a sequence complementary to SEQ ID NO: 1. The choice between the two is dictated by the context in which SEQ ID NO: 1 is used. For instance, if the nucleic acid is used as a probe, the choice between the two is dictated by the requirement that the probe be complementary to the desired target. As will be readily appreciated by those of skill in the art, nucleic acid sequences of the present disclosure may be modified chemically or biochemically, or may contain non-natural or derivatized nucleotide bases. Such modifications include, for example, labels, methylation, substitution of one or more naturally occurring nucleotides with an analog, internucleotide modifications such as uncharged linkages (e.g., methyl phosphonates, phosphotriesters, phosphoramidates, carbamates, etc.), charged linkages (e.g., phosphorothioates, phosphorodithioates, etc.), pendent moieties (e.g., polypeptides), intercalators (e.g., acridine, psoralen, etc.), chelators, alkylators, and modified linkages (e.g., alpha anomeric nucleic acids, etc.). Also included are chemically modified bases (see, e.g., Table 13), backbones (see, e.g., Table 14), and modified caps (see, e.g., Table 15). Also included are synthetic molecules that mimic polynucleotides in their ability to bind to a designated sequence via hydrogen bonding and other chemical interactions. Such molecules are known in the art and include, for example, those in which peptide linkages substitute for phosphate linkages in the backbone of the molecule, e.g., peptide nucleic acids (PNAs). Other modifications can include, for example, analogs in which the ribose ring contains a bridging moiety or other structure, such as the modifications found in "locked" nucleic acids (LNAs). In various embodiments, the nucleic acids are in operative association with additional genetic elements, such as tissue-specific expression control sequence(s) (e.g., tissue-specific promoters and tissue-specific microRNA recognition sequences), as well as additional elements, such as inverted repeats (e.g., inverted terminal repeats, such as elements from or derived from viruses, e.g., AAV ITRs) and tandem repeats, inverted repeats/direct repeats, homology regions (segments with various degrees of homology to a target DNA), untranslated regions (UTRs) (5′, 3′, or both 5′ and 3′ UTRs), and various combinations of the foregoing. The nucleic acid elements of the systems provided by the invention can be provided in a variety of topologies, including single-stranded, double-stranded, circular, linear, linear with open ends, linear with closed ends, and particular versions of these, such as doggybone DNA (dbDNA) and closed-ended DNA (ceDNA).

As used herein, a "gene expression unit" is a nucleic acid sequence comprising at least one regulatory nucleic acid sequence operably linked to at least one effector sequence. A first nucleic acid sequence is operably linked with a second nucleic acid sequence when the first nucleic acid sequence is placed in a functional relationship with the second nucleic acid sequence. For instance, a promoter or enhancer is operably linked to a coding sequence if the promoter or enhancer affects the transcription or expression of the coding sequence. Operably linked DNA sequences may be contiguous or non-contiguous. Where it is necessary to join two protein-coding regions, operably linked sequences may be in the same reading frame.

As used herein, the terms "host genome" and "host cell" refer to a cell and/or its genome into which protein and/or genetic material has been introduced. It should be understood that such terms are intended to refer not only to the particular subject cell and/or genome, but also to the progeny of such a cell and/or the genome of the progeny of such a cell. Because certain modifications may occur in succeeding generations due to either mutation or environmental influences, such progeny may not, in fact, be identical to the parent cell, but are still included within the scope of the term "host cell" as used herein. A host genome or host cell may be an isolated cell or cell line grown in culture, or genomic material isolated from such a cell or cell line, or may be a host cell or host genome that composes living tissue or an organism. In some instances, a host cell may be an animal cell or a plant cell, e.g., as described herein. In certain instances, a host cell may be a mammalian cell, a human cell, an avian cell, a reptilian cell, a bovine cell, an equine cell, a porcine cell, a caprine cell, an ovine cell, a chicken cell, or a turkey cell. In certain instances, a host cell may be a corn cell, a soybean cell, a wheat cell, or a rice cell.

As used herein, "operative association" describes a functional relationship between two nucleic acid sequences, such as 1) a promoter and 2) a heterologous object sequence, and means, in such an example, that the promoter and heterologous object sequence (e.g., a gene of interest) are oriented such that, under suitable conditions, the promoter drives expression of the heterologous object sequence. For instance, a template nucleic acid carrying a promoter and a heterologous object sequence may be single-stranded, e.g., in either the (+) or (-) orientation. "Operative association" between the promoter and the heterologous object sequence in this template means that, regardless of whether the template nucleic acid will be transcribed in a particular state, when it is in the suitable state (e.g., in the (+) orientation, in the presence of required catalytic factors, NTPs, etc.), it is accurately transcribed. Operative association applies analogously to other pairs of nucleic acids, including other tissue-specific expression control sequences (such as enhancers, repressors, and microRNA recognition sequences), IR/DR, ITRs, UTRs, or homology regions and heterologous object sequences or sequences encoding a retroviral RT domain.

As used herein, the term "primer binding site sequence" or "PBS sequence" refers to a portion of a template RNA capable of binding to a region comprised in a target nucleic acid sequence. In some instances, a PBS sequence is a nucleic acid sequence comprising at least 3, 4, 5, 6, 7, or 8 bases with 100% identity to the region comprised in the target nucleic acid sequence. In some embodiments, the primer region comprises at least 5, 6, 7, or 8 bases with 100% identity to the region comprised in the target nucleic acid sequence. Without wishing to be bound by theory, in some embodiments, when a template RNA comprises a PBS sequence and a heterologous object sequence, the PBS sequence binds to a region comprised in the target nucleic acid sequence, allowing a reverse transcriptase domain to use that region as a primer for reverse transcription and to use the heterologous object sequence as a template for reverse transcription.

As used herein, a "stem-loop sequence" refers to a nucleic acid sequence (e.g., an RNA sequence) with sufficient self-complementarity to form a stem-loop, e.g., having a stem comprising at least two (e.g., 3, 4, 5, 6, 7, 8, 9, or 10) base pairs, and a loop with at least three (e.g., four) base pairs. The stem may comprise mismatches or bulges.
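The base-count criterion in the PBS definition can be illustrated with a longest-common-substring check. This is a sketch under stated assumptions: the target is supplied as the DNA strand with which the PBS shares identity, RNA uracil is treated as equivalent to DNA thymine, and the identical bases are required to be contiguous. The function names and the default threshold of 8 are illustrative choices only.

```python
def longest_shared_run(pbs_rna: str, target_dna: str) -> int:
    """Length of the longest contiguous run of bases shared by both sequences."""
    pbs = pbs_rna.upper().replace("U", "T")  # compare in the DNA alphabet
    target = target_dna.upper()
    best = 0
    # prev[j] holds the length of the identical run ending at
    # pbs[i - 2] and target[j - 1] from the previous row.
    prev = [0] * (len(target) + 1)
    for i in range(1, len(pbs) + 1):
        curr = [0] * (len(target) + 1)
        for j in range(1, len(target) + 1):
            if pbs[i - 1] == target[j - 1]:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def pbs_meets_threshold(pbs_rna: str, target_dna: str, min_bases: int = 8) -> bool:
    """True if the PBS shares at least min_bases contiguous identical bases."""
    return longest_shared_run(pbs_rna, target_dna) >= min_bases
```

The dynamic-programming table keeps only two rows, so memory use grows with the target length rather than with the product of both lengths.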
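The stem-loop definition can be approximated by a simple self-complementarity scan. The sketch below is a simplification: it accepts only Watson-Crick pairs, does not model the mismatches or bulges that the definition permits in the stem, and treats the loop length as a count of unpaired nucleotides enclosed by the innermost pair, which is an assumption about how the loop is measured. The function name and defaults are hypothetical.

```python
# Watson-Crick pairs only; G-U wobble pairs are not considered in this sketch.
_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def can_form_stem_loop(rna: str, min_stem: int = 2, min_loop: int = 3) -> bool:
    """Check whether any hairpin with a perfect stem can form.

    min_stem: minimum number of consecutive base pairs in the stem.
    min_loop: minimum number of unpaired nucleotides enclosed by the
              innermost base pair.
    """
    seq = rna.upper().replace("T", "U")
    n = len(seq)
    for i in range(n):
        for j in range(i + 1, n):
            k = 0
            # Extend the stem inward from the outer pair (i, j) while the
            # bases pair and the enclosed region is still long enough.
            while ((j - k) - (i + k) - 1 >= min_loop
                   and (seq[i + k], seq[j - k]) in _PAIRS):
                k += 1
                if k >= min_stem:
                    return True
    return False
```

Dedicated RNA folding software would be needed to predict whether such a hairpin is actually the favored structure; this scan only shows that one is geometrically possible.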

As used herein, a "tissue-specific expression control sequence" means a nucleic acid element that increases or decreases the level of a transcript comprising the heterologous object sequence in a target tissue in a tissue-specific manner, e.g., preferentially in one or more on-target tissues relative to one or more off-target tissues. In some embodiments, a tissue-specific expression control sequence preferentially drives or represses transcription, activity, or the half-life of a transcript comprising the heterologous object sequence in the target tissue in a tissue-specific manner, e.g., preferentially in one or more on-target tissues relative to one or more off-target tissues. Exemplary tissue-specific expression control sequences include tissue-specific promoters, repressors, enhancers, or combinations thereof, as well as tissue-specific microRNA recognition sequences. Tissue specificity refers to on-target (one or more tissues where expression or activity of the template nucleic acid is desired or tolerable) and off-target (one or more tissues where expression or activity of the template nucleic acid is not desired or is not tolerable). For example, a tissue-specific promoter drives expression preferentially in on-target tissues, relative to off-target tissues. In contrast, a microRNA that binds a tissue-specific microRNA recognition sequence is preferentially expressed in off-target tissues, relative to on-target tissues, thereby reducing expression of a template nucleic acid in off-target tissues. Accordingly, a promoter and a microRNA recognition sequence that are specific for the same tissue (e.g., the target tissue) have contrasting functions with regard to the transcription, activity, or half-life of an associated sequence in that tissue (promoting and repressing, respectively, with concordant expression levels, i.e., high levels of the microRNA in off-target tissues and low levels in on-target tissues, while the promoter drives high expression in on-target tissues and low expression in off-target tissues).

Table of contents

1) Introduction

2) Gene modification system

a) Polypeptide component of the gene modification system

i) Writing domain

ii) Endonuclease domain and DNA binding domain

(1) Gene modifying polypeptides comprising Cas domains

(2) TAL effectors and zinc finger nucleases

iii) Linkers

iv) Localization sequences for gene modification systems

v) Evolved variants of gene modifying polypeptides and systems

vi) Inteins

vii) Additional domains

b) Template nucleic acids

i) gRNA spacer and gRNA scaffold

ii) Heterologous object sequence

iii) PBS sequence

iv) Exemplary template sequences

c) gRNAs with inducible activity

d) Circular RNAs and ribozymes in gene modification systems

e) Target nucleic acid site

f) Second strand nicking

3) Production of compositions and systems

4) Therapeutic applications

5) Administration and delivery

a) Tissue-specific activity/administration

i) Promoters

ii) MicroRNAs

b) Viral vectors and components thereof

c) AAV administration

d) Lipid nanoparticles

6) Kits, articles of manufacture, and pharmaceutical compositions

7) Chemistry, manufacturing, and controls (CMC)

Introduction

The present disclosure relates to methods for treating sickle cell disease (SCD) and to compositions for targeting, editing, modifying, or manipulating a DNA sequence at one or more locations in a DNA sequence in a cell, tissue, or subject, e.g., in vivo or in vitro (e.g., inserting a heterologous object sequence into a target site of a mammalian genome). The heterologous object DNA sequence may include, e.g., a substitution.

More particularly, the present disclosure provides methods for treating SCD that use a reverse transcriptase-based system to alter a genomic DNA sequence of interest, e.g., by inserting one or more nucleotides into, deleting one or more nucleotides from, or substituting one or more nucleotides in the sequence of interest.

The present disclosure provides, in part, methods for treating SCD that use a gene modification system comprising a gene modifying polypeptide component and a template nucleic acid (e.g., template RNA) component. In some embodiments, the gene modification system can be used to introduce an alteration into a target site in a genome. In some embodiments, the gene modifying polypeptide component comprises a writing domain (e.g., a reverse transcriptase domain), a DNA binding domain, and an endonuclease domain (e.g., a nickase domain). In some embodiments, the template nucleic acid (e.g., template RNA) comprises a sequence (e.g., a gRNA spacer) that binds a target site in the genome (e.g., that binds the second strand of the target site), a sequence (e.g., a gRNA scaffold) that binds the gene modifying polypeptide component, a heterologous object sequence, and a PBS sequence. Without wishing to be bound by theory, it is thought that the template nucleic acid (e.g., template RNA) binds the second strand of the target site in the genome and binds the gene modifying polypeptide component (e.g., localizing the polypeptide component to the target site in the genome). It is thought that the endonuclease (e.g., nickase) of the gene modifying polypeptide component cuts the target site (e.g., the first strand of the target site), e.g., allowing the PBS sequence to bind a sequence adjacent to the site to be altered on the first strand of the target site. It is thought that the writing domain (e.g., reverse transcriptase domain) of the polypeptide component uses the first strand of the target site, bound to the complementary PBS sequence of the template nucleic acid, as a primer and uses the heterologous object sequence of the template nucleic acid as a template to, e.g., polymerize a sequence complementary to the heterologous object sequence. Without wishing to be bound by theory, it is thought that selection of an appropriate heterologous object sequence can result in substitution, deletion, and/or insertion of one or more nucleotides at the target site.

Gene modification system

In some embodiments, a gene modification system described herein comprises: (A) a gene modifying polypeptide or a nucleic acid encoding the gene modifying polypeptide, wherein the gene modifying polypeptide comprises (i) a reverse transcriptase domain and either (x) an endonuclease domain that contains DNA binding functionality or (y) an endonuclease domain and a separate DNA binding domain; and (B) a template RNA. In some embodiments, the gene modifying polypeptide acts as a substantially autonomous protein machine capable of integrating a template nucleic acid sequence into a target DNA molecule (e.g., in a mammalian host cell, such as a genomic DNA molecule in the host cell), substantially without relying on host machinery. For example, the gene modifying protein may comprise a DNA binding domain, a reverse transcriptase domain, and an endonuclease domain. In some embodiments, the DNA binding function may involve an RNA component (e.g., a gRNA spacer) that directs the protein to a DNA sequence. In other embodiments, the gene modifying polypeptide may comprise a reverse transcriptase domain and an endonuclease domain. The RNA template element of a gene modification system is typically heterologous to the gene modifying polypeptide element and provides an object sequence to be inserted (reverse transcribed) into the host genome. In some embodiments, the gene modifying polypeptide is capable of target-primed reverse transcription. In some embodiments, the gene modifying polypeptide is capable of second strand synthesis.

In some embodiments, the gene modification system is combined with a second polypeptide. In some embodiments, the second polypeptide may comprise an endonuclease domain. In some embodiments, the second polypeptide may comprise a polymerase domain, e.g., a reverse transcriptase domain. In some embodiments, the second polypeptide may comprise a DNA-dependent DNA polymerase domain. In some embodiments, the second polypeptide aids in completion of the genome edit, e.g., by contributing to second strand synthesis or DNA repair resolution.

A functional gene modifying polypeptide can be made up of unrelated DNA binding, reverse transcription, and endonuclease domains. This modular structure allows functional domains to be combined, e.g., dCas9 (DNA binding), MMLV reverse transcriptase (reverse transcription), and FokI (endonuclease). In some embodiments, multiple functional domains may come from a single protein, e.g., Cas9 or Cas9 nickase (DNA binding, endonuclease).

In some embodiments, the gene modifying polypeptide comprises one or more domains that, collectively, facilitate 1) binding the template nucleic acid, 2) binding the target DNA molecule, and 3) integration of at least a portion of the template nucleic acid into the target DNA. In some embodiments, the gene modifying polypeptide is an engineered polypeptide, e.g., having one or more amino acid substitutions relative to a naturally occurring sequence. In some embodiments, the gene modifying polypeptide comprises two or more domains that are heterologous relative to each other, e.g., through a heterologous fusion (or other conjugate) of otherwise wild-type domains, or through a fusion of modified domains, e.g., by way of replacement or fusion of a heterologous sub-domain or other substituted domain. For instance, in some embodiments, one or more of the following applies: the RT domain is heterologous to the DBD; the DBD is heterologous to the endonuclease domain; or the RT domain is heterologous to the endonuclease domain.

In some embodiments, a template RNA molecule for use in the system comprises, from 5′ to 3′: (1) a gRNA spacer; (2) a gRNA scaffold; (3) a heterologous object sequence; and (4) a primer binding site (PBS) sequence. In some embodiments:

(1) is a gRNA spacer of about 18-22 nt (e.g., 20 nt).

(2) is a gRNA scaffold comprising one or more hairpin loops (e.g., 1, 2, or 3 loops) for associating the template with a Cas domain, e.g., a nickase Cas9 domain. In some embodiments, the gRNA scaffold comprises, from 5′ to 3′, the sequence GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGGACCGAGTCGGTCC (SEQ ID NO: 5008).

(3) In some embodiments, the heterologous object sequence is, e.g., 7-74 nt in length, e.g., 10-20, 20-30, 30-40, 40-50, 50-60, 60-70, 70-80, or 80-90 nt. In some embodiments, the first (most 5′) base of the sequence is not C.

(4) In some embodiments, the PBS sequence, which binds the target priming sequence after nicking occurs, is, e.g., 3-20 nt, e.g., 7-15 nt, e.g., 12-14 nt. In some embodiments, the PBS sequence has 40%-60% GC content.
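The 5′-to-3′ layout and the numeric ranges listed in items (1) to (4) can be sketched as a small design check. This is only an illustrative sketch, not the applicants' tooling: the `TemplateRNA` class, the `gc_fraction` helper, and the wording of the messages are hypothetical names introduced here, and the checks cover only the broadest ranges stated above (spacer of about 18-22 nt, PBS of 3-20 nt with 40%-60% GC content, first base of the heterologous object sequence not C). Sequences are written with DNA letters, as in the scaffold quoted in item (2).

```python
from dataclasses import dataclass
from typing import List

# Scaffold sequence quoted in item (2) (SEQ ID NO: 5008), written with DNA letters.
SCAFFOLD = (
    "GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGGACCGAGTCGGTCC"
)


def gc_fraction(seq: str) -> float:
    """Return the fraction of G/C bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq) if seq else 0.0


@dataclass
class TemplateRNA:
    """Hypothetical container for the four template RNA parts, in 5'-to-3' order."""

    spacer: str
    scaffold: str
    heterologous_object: str
    pbs: str

    def assemble(self) -> str:
        """Concatenate the parts in the 5'-to-3' order given in the text."""
        return self.spacer + self.scaffold + self.heterologous_object + self.pbs

    def check(self) -> List[str]:
        """Return a list of design problems; an empty list means none were found."""
        problems = []
        if not 18 <= len(self.spacer) <= 22:
            problems.append("gRNA spacer is not about 18-22 nt")
        if not 3 <= len(self.pbs) <= 20:
            problems.append("PBS is not 3-20 nt")
        if not 0.40 <= gc_fraction(self.pbs) <= 0.60:
            problems.append("PBS GC content is outside 40%-60%")
        if self.heterologous_object[:1].upper() == "C":
            problems.append("heterologous object sequence starts with C")
        return problems
```

A caller would build a `TemplateRNA` from candidate sequences and inspect the list returned by `check()`. The heterologous object sequence length is deliberately not checked, because the text lists several alternative ranges for it.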

In some embodiments, a second gRNA associated with the system may help drive complete integration. In some embodiments, the second gRNA may target a location that is 0-200 nt away from the first-strand nick, e.g., 0-50, 50-100, or 100-200 nt away from the first-strand nick. In some embodiments, the second gRNA can only bind its target sequence after the edit is made, e.g., the gRNA binds a sequence that is present in the heterologous object sequence but not in the initial target sequence.

In some embodiments, a gene modification system described herein is used for editing in HEK293, K562, U2OS, or HeLa cells. In some embodiments, a gene modification system is used for editing in primary cells (e.g., primary cortical neurons from E18.5 mice).

In some embodiments, a gene modifying polypeptide as described herein comprises a reverse transcriptase or RT domain (e.g., as described herein) that comprises a MoMLV RT sequence or a variant thereof. In embodiments, the MoMLV RT sequence comprises one or more mutations selected from D200N, L603W, T330P, T306K, W313F, D524G, E562Q, D583N, P51L, S67R, E67K, T197A, H204R, E302K, F309N, L435G, N454K, H594Q, D653N, R110S, and K103L. In embodiments, the MoMLV RT sequence comprises a combination of mutations, such as D200N, L603W, and T330P, optionally further including T306K and/or W313F.
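The mutation labels above follow the usual wild-type residue, position, new residue notation (e.g., D200N). A minimal sketch of how such labels could be parsed and sanity-checked is given below. The function names are hypothetical and the code assumes single-letter amino acid codes with 1-based positions. It makes no claim about the actual MoMLV RT sequence; it only reports that, as listed, position 67 appears with two different wild-type residues (S67R and E67K), without deciding which is intended.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# Mutation labels exactly as listed in the paragraph above.
MUTATIONS = [
    "D200N", "L603W", "T330P", "T306K", "W313F", "D524G", "E562Q",
    "D583N", "P51L", "S67R", "E67K", "T197A", "H204R", "E302K",
    "F309N", "L435G", "N454K", "H594Q", "D653N", "R110S", "K103L",
]

_LABEL = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(label: str) -> Tuple[str, int, str]:
    """Split a label such as 'D200N' into (wild-type, position, new residue)."""
    match = _LABEL.match(label)
    if match is None:
        raise ValueError(f"not a point-mutation label: {label!r}")
    wild_type, position, new = match.groups()
    return wild_type, int(position), new


def conflicting_wild_types(labels: List[str]) -> Dict[int, List[str]]:
    """Return positions that are listed with more than one wild-type residue."""
    seen = defaultdict(set)
    for label in labels:
        wild_type, position, _ = parse_mutation(label)
        seen[position].add(wild_type)
    return {pos: sorted(res) for pos, res in seen.items() if len(res) > 1}


def apply_mutations(sequence: str, labels: List[str]) -> str:
    """Apply point mutations to a sequence, checking each wild-type residue."""
    residues = list(sequence)
    for label in labels:
        wild_type, position, new = parse_mutation(label)
        if position > len(residues) or residues[position - 1] != wild_type:
            raise ValueError(f"{label}: wild-type residue does not match sequence")
        residues[position - 1] = new
    return "".join(residues)
```

Checking the wild-type residue before substituting is a deliberate design choice: it catches numbering offsets between a construct and the reference sequence the labels were defined against.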

In some embodiments, the endonuclease domain (e.g., as described herein) comprises nCas9, e.g., comprising an N863A mutation (e.g., in spCas9) or an H840A mutation.

In some embodiments, the heterologous object sequence (e.g., of a system as described herein) is about 1-50, 50-100, 100-200, 200-300, 300-400, 400-500, 500-600, 600-700, 700-800, 800-900, 900-1000, or more nucleotides in length.

In some embodiments, the RT domain and the endonuclease domain are joined by a flexible linker, e.g., one comprising the amino acid sequence SGGSSGGSSGSETPGTSESATPESSGGSSGGSS (SEQ ID NO: 5006).

In some embodiments, the endonuclease domain is N-terminal relative to the RT domain. In some embodiments, the endonuclease domain is C-terminal relative to the RT domain.

In some embodiments, the system incorporates the heterologous object sequence into the target site by target-primed reverse transcription (TPRT), e.g., as described herein.

In some embodiments, the gene modifying polypeptide comprises a DNA binding domain. In some embodiments, the gene modifying polypeptide comprises an RNA binding domain. In some embodiments, the RNA binding domain comprises an RNA binding domain of a B-box protein, an MS2 coat protein, dCas, or an element of a sequence of a table herein. In some embodiments, the RNA binding domain is capable of binding the template RNA with greater affinity than a reference RNA binding domain.

In some embodiments, the gene modification system is capable of producing an insertion of at least 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, or 100 nucleotides (and optionally no more than 500, 400, 300, 200, or 100 nucleotides) in the target site. In some embodiments, the gene modification system is capable of producing an insertion of at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, or 100 nucleotides (and optionally no more than 500, 400, 300, 200, or 100 nucleotides) in the target site. In some embodiments, the gene modification system is capable of producing an insertion of at least 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5, 6, 6.5, 7, 7.5, 8, 8.5, 9, 9.5, or 10 kilobases (and optionally no more than 1, 5, 10, or 20 kilobases) in the target site. In some embodiments, the gene modification system is capable of producing a deletion of at least 81, 85, 90, 95, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, or 200 nucleotides (and optionally no more than 500, 400, 300, or 200 nucleotides). In some embodiments, the gene modification system is capable of producing a deletion of at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, or 200 nucleotides (and optionally no more than 500, 400, 300, or 200 nucleotides). In some embodiments, the gene modification system is capable of producing a deletion of at least 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5, 6, 6.5, 7, 7.5, 8, 8.5, 9, 9.5, or 10 kilobases (and optionally no more than 1, 5, 10, or 20 kilobases). In some embodiments, the gene modification system is capable of producing a substitution of at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, or 100 or more nucleotides in the target site. In some embodiments, the gene modification system is capable of producing a substitution of 1-2, 2-3, 3-4, 4-5, 5-10, 10-15, 15-20, 20-30, 30-40, 40-50, 50-60, 60-70, 70-80, 80-90, or 90-100 nucleotides in the target site.

In some embodiments, the substitution is a transition mutation. In some embodiments, the substitution is a transversion mutation. In some embodiments, the substitution converts an adenine to a thymine, an adenine to a guanine, an adenine to a cytosine, a guanine to a thymine, a guanine to a cytosine, a guanine to an adenine, a thymine to a cytosine, a thymine to an adenine, a thymine to a guanine, a cytosine to an adenine, a cytosine to a guanine, or a cytosine to a thymine.
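The twelve single-base conversions listed above fall into the two classes named in the same paragraph: a transition swaps a purine for a purine or a pyrimidine for a pyrimidine, and a transversion swaps a purine for a pyrimidine or the reverse. The short sketch below illustrates that standard classification; the function name `classify_substitution` is hypothetical and only the four DNA bases are handled.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def classify_substitution(ref: str, alt: str) -> str:
    """Classify a single-base DNA substitution as 'transition' or 'transversion'."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in BASES or alt not in BASES:
        raise ValueError("bases must be one of A, C, G, T")
    if ref == alt:
        raise ValueError("reference and alternate base are identical")
    # Same chemical class on both sides means a transition.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


# Enumerate all twelve conversions named in the text and group them by class.
BY_CLASS = {"transition": [], "transversion": []}
for ref_base in sorted(BASES):
    for alt_base in sorted(BASES):
        if ref_base != alt_base:
            kind = classify_substitution(ref_base, alt_base)
            BY_CLASS[kind].append(ref_base + ">" + alt_base)
```

Of the twelve conversions, four are transitions (A>G, G>A, C>T, T>C) and the remaining eight are transversions.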

In some embodiments, the insertion, deletion, substitution, or combination thereof increases or decreases expression (e.g., transcription or translation) of a gene. In some embodiments, the insertion, deletion, substitution, or combination thereof increases or decreases expression (e.g., transcription or translation) of a gene by altering, adding, or deleting sequences in a promoter or enhancer (e.g., sequences that bind transcription factors). In some embodiments, the insertion, deletion, substitution, or combination thereof alters translation of a gene (e.g., alters an amino acid sequence), inserts or deletes a start or stop codon, or alters or fixes the translation frame of a gene. In some embodiments, the insertion, deletion, substitution, or combination thereof alters splicing of a gene, e.g., by inserting, deleting, or altering a splice acceptor or donor site. In some embodiments, the insertion, deletion, substitution, or combination thereof alters transcript or protein half-life. In some embodiments, the insertion, deletion, substitution, or combination thereof alters protein localization in the cell (e.g., from the cytoplasm to a mitochondrion, or from the cytoplasm into the extracellular space (e.g., by adding a secretion tag)). In some embodiments, the insertion, deletion, substitution, or combination thereof alters (e.g., improves) protein folding (e.g., to prevent accumulation of misfolded proteins). In some embodiments, the insertion, deletion, substitution, or combination thereof alters, increases, or decreases the activity of a gene, e.g., of a protein encoded by the gene.

Exemplary gene modifying polypeptides, systems comprising them, and methods of using them are described, e.g., in PCT/US2021/020948, which is incorporated herein by reference with respect to retroviral RT domains, including the amino acid and nucleic acid sequences therein.

Exemplary gene modifying polypeptides and retroviral RT domain sequences are also described, e.g., in International Application No. PCT/US21/20948, filed March 4, 2021, e.g., in Table 30, Table 31, and Table 44 therein; the entire application is incorporated herein by reference with respect to retroviral RTs, e.g., in said sequences and tables. Accordingly, a gene modifying polypeptide described herein may comprise an amino acid sequence according to any of the tables mentioned in this paragraph, or a domain thereof (e.g., a retroviral RT domain), or a functional fragment or variant of any of the foregoing, or an amino acid sequence having at least 70%, 80%, 85%, 90%, 95%, or 99% identity thereto.

In some embodiments, a polypeptide for use in any of the systems described herein can be a molecular reconstruction or genetic reconstruction based on the aligned polypeptide sequences of multiple homologous proteins. In some embodiments, a reverse transcriptase domain for use in any of the systems described herein can be a molecular reconstruction or genetic reconstruction, or can be modified at particular residues based on alignments of reverse transcriptase domains from the same or different sources. Based on the accession numbers provided herein, a skilled artisan can align polypeptide or nucleic acid sequences, e.g., by using routine sequence analysis tools such as the Basic Local Alignment Search Tool (BLAST) or CD-Search (for conserved domain analysis). Molecular reconstructions can be created based on consensus sequences, e.g., using the approaches described in Ivics et al., Cell 1997, 501-510; Wagstaff et al., Molecular Biology and Evolution 2013, 88-99.

Polypeptide components of the gene modification system

In some embodiments, the gene modifying polypeptide possesses the functions of DNA target site binding, template nucleic acid (e.g., RNA) binding, DNA target site cleavage, and template nucleic acid (e.g., RNA) writing (e.g., reverse transcription). In some embodiments, each function is contained within a distinct domain. In some embodiments, a function may be attributed to two or more domains (e.g., two or more domains together exhibit the function). In some embodiments, two or more domains may have the same or similar function (e.g., two or more domains each independently have DNA binding functionality, e.g., for two different DNA sequences). In other embodiments, one or more domains may be capable of enabling one or more functions, e.g., a Cas9 domain enables both DNA binding and target site cleavage. In some embodiments, the domains are all located within a single polypeptide. In some embodiments, a first domain is in one polypeptide and a second domain is in a second polypeptide. For example, in some embodiments, the sequences may be split between a first polypeptide and a second polypeptide, e.g., wherein the first polypeptide comprises a reverse transcriptase (RT) domain and the second polypeptide comprises a DNA binding domain and an endonuclease domain, e.g., a nickase domain. As a further example, in some embodiments, the first polypeptide and the second polypeptide each comprise a DNA binding domain (e.g., a first DNA binding domain and a second DNA binding domain). In some embodiments, the first and second polypeptides may be joined post-translationally via a split intein to form a single gene modifying polypeptide.

In some aspects, a gene modifying polypeptide described herein comprises (e.g., a system described herein comprises a gene modifying polypeptide that comprises): 1) a Cas domain (e.g., a Cas nickase domain, e.g., a Cas9 nickase domain); 2) a reverse transcriptase (RT) domain of Table D, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, or 99% identity thereto, wherein the RT domain is C-terminal of the Cas domain; and a linker disposed between the RT domain and the Cas domain, wherein the linker has a sequence from the same row of Table D as the RT domain, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, or 99% identity thereto.

In some embodiments, the RT domain has a sequence with 100% identity to the RT domain of Table D, and the linker has a sequence with 100% identity to the linker sequence from the same row of Table D as the RT domain. In some embodiments, the Cas domain comprises a sequence of Table 8, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 98%, or 99% identity thereto. In some embodiments, the gene modifying polypeptide comprises an amino acid sequence according to any of SEQ ID NOs: 1-3332 in the sequence listing, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, or 99% identity thereto.
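The identity thresholds recited above (at least 70%, 75%, and so on up to 99% or 100%) can be illustrated with a simple calculation over two sequences that have already been aligned. The sketch below is illustrative only and rests on assumptions not taken from this document: the inputs are equal-length aligned strings that use "-" for gaps, and identity is counted as matching columns divided by aligned columns. Other conventions exist (for example, dividing by the length of the shorter sequence), and any definition of percent identity given elsewhere in the disclosure governs. The function names are hypothetical.

```python
from typing import List

# Thresholds recited in the surrounding paragraphs.
THRESHOLDS = (70, 75, 80, 85, 90, 95, 97, 98, 99)


def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of alignment columns in which both sequences carry the same residue.

    Assumes the two strings come from the same pairwise alignment, so they
    have equal length. Columns that are gaps in both sequences are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (a, b) for a, b in zip(aligned_a, aligned_b) if not (a == gap and b == gap)
    ]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


def thresholds_met(identity: float) -> List[int]:
    """Return the recited thresholds that a given percent identity satisfies."""
    return [t for t in THRESHOLDS if identity >= t]
```

For example, two ten-residue aligned sequences that differ at one position give 90% identity under this convention, which satisfies the 70% through 90% thresholds but not 95% or higher.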

In some embodiments, the gene modifying polypeptide comprises a GG amino acid sequence between the Cas domain and the linker, an AG amino acid sequence between the RT domain and the second NLS, and/or a GG amino acid sequence between the linker and the RT domain. In some embodiments, the gene modifying polypeptide comprises the sequence of SEQ ID NO: 4000, which comprises the first NLS and the Cas domain, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 98%, or 99% identity thereto. In some embodiments, the gene modifying polypeptide comprises the sequence of SEQ ID NO: 4001, which comprises the second NLS, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 98%, or 99% identity thereto.

Exemplary N-terminal NLS-Cas9 domain

Exemplary C-terminal sequence comprising an NLS

Writing domain (RT domain)

In certain aspects of the present invention, the writing domain of the gene modification system possesses reverse transcriptase activity and is also referred to as a reverse transcriptase domain (RT domain). In some embodiments, the RT domain comprises an RT catalytic portion and an RNA-binding region (e.g., a region that binds the template RNA).

In some embodiments, a nucleic acid encoding the reverse transcriptase is altered from its natural sequence to have altered codon usage, e.g., improved for human cells. In some embodiments, the reverse transcriptase domain is a heterologous reverse transcriptase from a retrovirus. In some embodiments, the RT domain of the gene modifying polypeptide has been mutated from its original amino acid sequence, e.g., has at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, 80, 90, or 100 substitutions. In some embodiments, the RT domain is derived from the RT of a retrovirus, e.g., HIV-1 RT, Moloney murine leukemia virus (MMLV) RT, avian myeloblastosis virus (AMV) RT, or Rous sarcoma virus (RSV) RT.

在一些实施例中，逆转录病毒逆转录酶(RT)结构域表现出靶引发的逆转录(TPRT)起始的增强的严格性，例如，相对于内源RT结构域。在一些实施例中，当靶位点中紧邻第一链切口上游的3nt，例如引发RNA模板的基因组DNA，与RNA模板中的同源3nt具有至少66％或100％的互补性时，RT结构域启动TPRT。在一些实施例中，当模板RNA同源性和靶DNA引发逆转录之间存在少于5nt错配(例如少于1、2、3、4或5nt错配)时，RT结构域启动TPRT。在一些实施例中，修饰RT结构域使得TPRT反应引发中的错配的严格性增加，例如，其中相对于野生型(例如，未修饰的)RT结构域，RT结构域不容许任何错配或容许引发区域中更少的错配。在一些实施例中，RT结构域包含HIV-1RT结构域。在实施例中，HIV-1RT结构域启动较低水平的合成，即使相对于替代RT结构域具有三个核苷酸错配(例如，如Jamburuthugoda和Eickbush JMol Biol[分子生物学杂志]407(5)：661-672(2011)所述；其通过援引以其全文并入本文)。在一些实施例中，RT结构域形成二聚体(例如，异二聚体或同二聚体)。在一些实施例中，RT结构域是单体的。在一些实施例中，RT结构域天然地作为单体或二聚体(例如，异二聚体或同二聚体)起作用。在一些实施例中，RT结构域天然地作为单体起作用，例如，源自病毒，其中它作为单体起作用。在实施例中，RT结构域选自来自以下的RT结构域：鼠白血病病毒(MLV；有时称为MoMLV)(例如，P03355)、猪内源逆转录病毒(PERV)(例如，UniProt Q4VFZ2)、小鼠乳腺肿瘤病毒(MMTV)(例如，UniProt P03365)、禽网状内皮组织增生病病毒(AVIRE)(UniProtKB登录号：P03360)、猫白血病病毒(FLV或FeLV)(例如，例如UniProtKB登录号：P10273)、梅森-菲舍猴病毒(MPMV)(例如，UniProt P07572)、牛白血病病毒(BLV)(例如，UniProt P03361)、人T细胞白血病病毒-1(HTLV-1)(例如，UniProt P03362)、人泡沫病毒(HFV)(例如，UniProt P14350)、猿泡沫病毒(SFV)(例如，SFV3L)(例如UniProt P23074或P27401)或牛泡沫/合胞病毒(BFV/BSV)(例如UniProt O41894)，或其功能片段或变体(例如，与其具有至少70％、80％、90％、95％或99％同一性的氨基酸序列)。在一些实施例中，RT结构域在其天然功能上是二聚体。在一些实施例中，RT结构域源自病毒，其中它作为二聚体起作用。在实施例中，RT结构域选自来自以下的RT结构域：禽肉瘤/白血病病毒(ASLV)(例如，UniProtA0A142BKH1)、劳斯肉瘤病毒(RSV)(例如，UniProt P03354)、禽成髓细胞瘤病毒(AMV)(例如，UniProt Q83133)、人免疫缺陷病毒I型(HIV-1)(例如，UniProt P03369)、人免疫缺陷病毒II型(HIV-2)(例如，UniProt P15833)、猿免疫缺陷病毒(SIV)(例如，UniProtP05896)、牛免疫缺陷病毒(BIV)(例如，UniProtP19560)、马传染性贫血病毒(EIAV)(例如，UniProt P03371)或猫免疫缺陷病毒(FIV)(例如，UniProt P16088)(Herschhorn和HiziCell Mol Life Sci[细胞和分子生命科学]67(16)：2717-2747(2010))，或其功能片段或变体(例如，与其具有至少70％、80％、90％、95％或99％同一性的氨基酸序列)。在一些实施例中，天然异二聚体RT结构域也可以作为同二聚体起作用。在一些实施例中，二聚体RT结构域被表达为融合蛋白，例如，同二聚体融合蛋白或异二聚体融合蛋白。在一些实施例中，系统的RT功能由多个RT结构域实现(例如，如本文所述)。在另外的实施例中，多个RT结构域是融合的或分开的，例如，可以在相同的多肽上或在不同的多肽上。In some embodiments, the retroviral reverse transcriptase (RT) domain shows the enhanced stringency of the reverse transcription (TPRT) initiated by the target, for example, relative to the endogenous RT domain. 
In some embodiments, the RT domain initiates TPRT when the 3 nt in the target site immediately upstream of the first-strand nick (e.g., the genomic DNA that primes the RNA template) have at least 66% or 100% complementarity to the 3 nt of homology in the RNA template. In some embodiments, the RT domain initiates TPRT when there are fewer than 5 nt of mismatch (e.g., fewer than 1, 2, 3, 4, or 5 nt of mismatch) between the template RNA homology and the target DNA priming reverse transcription. In some embodiments, the RT domain is modified such that the stringency for mismatches in priming the TPRT reaction is increased, e.g., wherein the RT domain tolerates no mismatches, or fewer mismatches in the priming region, relative to a wild-type (e.g., unmodified) RT domain. In some embodiments, the RT domain comprises an HIV-1 RT domain. In embodiments, the HIV-1 RT domain initiates lower levels of synthesis even with three nucleotide mismatches, relative to an alternative RT domain (e.g., as described in Jamburuthugoda and Eickbush J Mol Biol [Journal of Molecular Biology] 407(5): 661-672 (2011); which is incorporated herein by reference in its entirety). In some embodiments, the RT domain forms a dimer (e.g., a heterodimer or homodimer). In some embodiments, the RT domain is monomeric. In some embodiments, the RT domain naturally functions as a monomer or as a dimer (e.g., a heterodimer or homodimer). In some embodiments, the RT domain naturally functions as a monomer, e.g., is derived from a virus in which it functions as a monomer.
In embodiments, the RT domain is selected from an RT domain from murine leukemia virus (MLV; sometimes referred to as MoMLV) (e.g., P03355), porcine endogenous retrovirus (PERV) (e.g., UniProt Q4VFZ2), mouse mammary tumor virus (MMTV) (e.g., UniProt P03365), avian reticuloendotheliosis virus (AVIRE) (e.g., UniProtKB Accession No.: P03360), feline leukemia virus (FLV or FeLV) (e.g., UniProtKB Accession No.: P10273), Mason-Pfizer monkey virus (MPMV) (e.g., UniProt P07572), bovine leukemia virus (BLV) (e.g., UniProt P03361), human T-cell leukemia virus-1 (HTLV-1) (e.g., UniProt P03362), human foamy virus (HFV) (e.g., UniProt P14350), simian foamy virus (SFV) (e.g., SFV3L) (e.g., UniProt P23074 or P27401), or bovine foamy/syncytial virus (BFV/BSV) (e.g., UniProt O41894), or a functional fragment or variant thereof (e.g., an amino acid sequence having at least 70%, 80%, 90%, 95%, or 99% identity thereto). In some embodiments, the RT domain is dimeric in its natural function. In some embodiments, the RT domain is derived from a virus in which it functions as a dimer. In embodiments, the RT domain is selected from an RT domain from avian sarcoma/leukemia virus (ASLV) (e.g., UniProt A0A142BKH1), Rous sarcoma virus (RSV) (e.g., UniProt P03354), avian myeloblastosis virus (AMV) (e.g., UniProt Q83133), human immunodeficiency virus type I (HIV-1) (e.g., UniProt P03369), human immunodeficiency virus type II (HIV-2) (e.g., UniProt P15833), simian immunodeficiency virus (SIV) (e.g., UniProt P05896), bovine immunodeficiency virus (BIV) (e.g., UniProt P19560), equine infectious anemia virus (EIAV) (e.g., UniProt P03371), or feline immunodeficiency virus (FIV) (e.g., UniProt P16088) (Herschhorn and Hizi Cell Mol Life Sci [Cellular and Molecular Life Sciences] 67(16): 2717-2747 (2010)), or a functional fragment or variant thereof (e.g., an amino acid sequence having at least 70%, 80%, 90%, 95%, or 99% identity thereto).
In some embodiments, the native heterodimeric RT domain can also function as a homodimer. In some embodiments, the dimeric RT domain is expressed as a fusion protein, e.g., a homodimeric fusion protein or a heterodimeric fusion protein. In some embodiments, the RT function of the system is achieved by multiple RT domains (e.g., as described herein). In other embodiments, the multiple RT domains are fused or separated, e.g., on the same polypeptide or on different polypeptides.
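By way of non-limiting illustration, the 3 nt complementarity criterion for TPRT initiation described above can be sketched in Python. The function names, the base-pairing table, and the convention that the RNA string is written 3′ to 5′ (so that position i of the DNA pairs with position i of the RNA) are assumptions made for this sketch and are not part of the disclosure. With 3 nt, "at least 66%" corresponds to at least 2 of 3 positions pairing, and 100% to all 3.

```python
# Illustrative Watson-Crick pairing of a DNA base with an RNA base (assumption of this sketch).
DNA_TO_RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}


def priming_complementarity(dna_nt: str, rna_nt: str) -> float:
    """Fraction of positions at which the genomic DNA priming nucleotides pair
    with the RNA template homology.

    dna_nt is written 5'->3' and rna_nt is written 3'->5', so index i of one
    string is compared against index i of the other.
    """
    if not dna_nt or len(dna_nt) != len(rna_nt):
        raise ValueError("inputs must be non-empty and of equal length")
    paired = sum(
        DNA_TO_RNA_PAIR.get(d) == r
        for d, r in zip(dna_nt.upper(), rna_nt.upper())
    )
    return paired / len(dna_nt)


def meets_priming_threshold(dna_nt: str, rna_nt: str, min_fraction: float = 0.66) -> bool:
    """True when complementarity is at least min_fraction (0.66 or 1.0 in the text)."""
    return priming_complementarity(dna_nt, rna_nt) >= min_fraction


if __name__ == "__main__":
    # Hypothetical 3 nt stretches, used only to exercise the functions.
    print(priming_complementarity("ACG", "UGC"))
    print(meets_priming_threshold("ACG", "UGA", min_fraction=1.0))
```

Setting `min_fraction=1.0` models the more stringent embodiment in which no mismatch in the priming region is tolerated.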

In some embodiments, a gene modifying system described herein comprises an integrase domain, e.g., wherein the integrase domain may be part of the RT domain. In some embodiments, an RT domain (e.g., as described herein) comprises an integrase domain. In some embodiments, an RT domain (e.g., as described herein) lacks an integrase domain, or comprises an integrase domain that has been inactivated by mutation or deletion. In some embodiments, a gene modifying system described herein comprises an RNase H domain, e.g., wherein the RNase H domain may be part of the RT domain. In some embodiments, the RNase H domain is not part of the RT domain and is covalently linked via a flexible linker. In some embodiments, an RT domain (e.g., as described herein) comprises an RNase H domain, e.g., an endogenous RNase H domain or a heterologous RNase H domain. In some embodiments, an RT domain (e.g., as described herein) lacks an RNase H domain. In some embodiments, an RT domain (e.g., as described herein) comprises an RNase H domain that has been added, deleted, mutated, or swapped for a heterologous RNase H domain. In some embodiments, the polypeptide comprises an inactivated endogenous RNase H domain.
In some embodiments, an endogenous RNase H domain is genetically removed from one of the other domains of the polypeptide such that it is not included in the polypeptide, e.g., the endogenous RNase H domain is partially or completely truncated from the domain that comprises it. In some embodiments, mutation of an RNase H domain yields a polypeptide exhibiting lower RNase activity, e.g., as determined by the methods described in Kotewicz et al. Nucleic Acids Res [Nucleic Acids Research] 16(1): 265-277 (1988) (which is incorporated herein by reference in its entirety), e.g., lower by at least 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, or 90% compared to an otherwise similar domain without the mutation. In some embodiments, RNase H activity is abolished.

In some embodiments, an RT domain is mutated to increase fidelity compared to an otherwise similar domain without the mutation. For example, in some embodiments, a YADD or YMDD motif in an RT domain (e.g., in a reverse transcriptase) is replaced with YVDD. In embodiments, replacement of the YADD or YMDD or YVDD results in higher fidelity of retroviral reverse transcriptase activity (e.g., as described in Jamburuthugoda and Eickbush J Mol Biol [Journal of Molecular Biology] 2011; which is incorporated herein by reference in its entirety).

在一些实施例中，本文所述的基因修饰多肽包含具有根据表6的氨基酸序列，或与其具有至少70％、80％、85％、90％、95％、97％、98％或99％同一性的序列的RT结构域。在一些实施例中，本文所述的核酸编码具有根据表6的氨基酸序列，或与其具有至少70％、80％、85％、90％、95％、97％、98％或99％同一性的序列的RT结构域。In some embodiments, the genetically modified polypeptides described herein comprise an RT domain having an amino acid sequence according to Table 6, or a sequence having at least 70%, 80%, 85%, 90%, 95%, 97%, 98%, or 99% identity thereto. In some embodiments, the nucleic acids described herein encode an RT domain having an amino acid sequence according to Table 6, or a sequence having at least 70%, 80%, 85%, 90%, 95%, 97%, 98%, or 99% identity thereto.
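As a non-limiting illustration of how a percent identity threshold such as those recited above might be checked, the following Python sketch scores two sequences that have already been aligned. The function names and the choice of denominator (the full alignment length, with gapped columns counted as non-identical) are assumptions of this sketch; other conventions for computing identity exist, and the alignment itself would be produced by a separate tool.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over a pre-computed pairwise alignment.

    Both strings must have the same length; '-' marks a gap. A column counts
    as identical only if both residues are equal and neither is a gap.
    """
    if not aligned_a or len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must be non-empty and of equal length")
    identical = sum(a == b and a != "-" for a, b in zip(aligned_a, aligned_b))
    return 100.0 * identical / len(aligned_a)


def meets_identity_threshold(aligned_a: str, aligned_b: str,
                             threshold_percent: float = 70.0) -> bool:
    """True when identity is at least threshold_percent (e.g., 70, 80, ... 99)."""
    return percent_identity(aligned_a, aligned_b) >= threshold_percent


if __name__ == "__main__":
    # Toy aligned peptides, not sequences from Table 6.
    print(percent_identity("MKTLLAGGSW", "MKTALA-GSW"))
```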

Table 6: Exemplary reverse transcriptase domains from retroviruses

In some embodiments, the reverse transcriptase domain is modified, e.g., by site-specific mutation. In some embodiments, the reverse transcriptase domain is engineered to have improved properties, e.g., SuperScript IV (SSIV) reverse transcriptase derived from MMLV RT. In some embodiments, the reverse transcriptase domain can be engineered to have a lower error rate, e.g., as described in WO 2001068895 (incorporated herein by reference). In some embodiments, the reverse transcriptase domain can be engineered to be more thermostable. In some embodiments, the reverse transcriptase domain can be engineered to be more processive. In some embodiments, the reverse transcriptase domain can be engineered to be tolerant of inhibitors. In some embodiments, the reverse transcriptase domain can be engineered to be faster. In some embodiments, the reverse transcriptase domain can be engineered to better tolerate modified nucleotides in the RNA template. In some embodiments, the reverse transcriptase domain can be engineered to incorporate modified DNA nucleotides. In some embodiments, the reverse transcriptase domain is engineered to bind a template RNA.
In some embodiments, the one or more mutations are selected from D200N, L603W, T330P, D524G, E562Q, D583N, P51L, S67R, E67K, T197A, H204R, E302K, F309N, W313F, L435G, N454K, H594Q, L671P, E69K, H8Y, T306K or D653N in the RT domain of murine leukemia virus reverse transcriptase, or corresponding mutations in corresponding positions of another RT domain.
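The substitution notation used above (e.g., D200N: aspartate at position 200 replaced by asparagine) can be illustrated with a short Python sketch that applies such labels to a sequence and verifies the expected wild-type residue first. The function name, the 1-based indexing into the supplied string, and the toy sequence are assumptions of this sketch; in practice the position numbering must be that of the reference RT sequence, which may be offset from the index within a given fusion construct.

```python
import re

# One-letter wild-type residue, 1-based position, one-letter replacement, e.g. "D200N".
_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitutions(sequence: str, mutations: list[str]) -> str:
    """Return a copy of sequence with each substitution applied.

    Raises ValueError if a label is malformed, the position is out of range,
    or the residue found at that position is not the stated wild-type residue.
    """
    residues = list(sequence)
    for label in mutations:
        match = _SUBSTITUTION.match(label)
        if match is None:
            raise ValueError(f"malformed substitution label: {label!r}")
        wild_type, position, replacement = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= position <= len(residues):
            raise ValueError(f"{label}: position outside sequence")
        if residues[position - 1] != wild_type:
            raise ValueError(
                f"{label}: expected {wild_type} at {position}, found {residues[position - 1]}"
            )
        residues[position - 1] = replacement
    return "".join(residues)


if __name__ == "__main__":
    # Toy peptide only; not the M-MLV RT sequence.
    print(apply_substitutions("MKDLN", ["D3N", "L4W"]))
```

Checking the wild-type residue before substituting guards against applying a label to a sequence whose numbering does not match the reference.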

在一些实施例中，基因修饰多肽包含来自逆转录病毒逆转录酶的RT结构域，例如野生型M-MLV RT，例如包含以下序列：In some embodiments, the genetically modified polypeptide comprises an RT domain from a retroviral reverse transcriptase, such as a wild-type M-MLV RT, for example comprising the sequence:

M-MLV (WT):

在一些实施例中，基因修饰多肽包含来自逆转录病毒逆转录酶的RT结构域，例如M-MLV RT，例如包含以下序列：In some embodiments, the gene modifying polypeptide comprises an RT domain from a retroviral reverse transcriptase, such as M-MLV RT, for example comprising the sequence:

在一些实施例中，基因修饰多肽包含来自逆转录病毒逆转录酶的RT结构域，其包含NP_057933的氨基酸659-1329的序列。在实施例中，基因修饰多肽在NP_057933的氨基酸659-1329的序列的N末端进一步包含一个另外的氨基酸，例如如下所示：In some embodiments, the genetically modified polypeptide comprises an RT domain from a retroviral reverse transcriptase comprising a sequence of amino acids 659-1329 of NP_057933. In embodiments, the genetically modified polypeptide further comprises an additional amino acid at the N-terminus of the sequence of amino acids 659-1329 of NP_057933, for example as shown below:

核心RT(粗体)，按上述注释Core RT (bold), as noted above

RNA酶H(下划线)，按上述注释 RNase H (underlined), as noted above

In embodiments, the gene modifying polypeptide further comprises one additional amino acid at the C-terminus of the sequence of amino acids 659-1329 of NP_057933. In embodiments, the gene modifying polypeptide comprises an RNase H1 domain (e.g., amino acids 1178-1318 of NP_057933).

在一些实施例中，逆转录病毒逆转录酶结构域，例如M-MLV RT，可以包含野生型序列的一个或多个突变，其可以改善RT的特征，例如热稳定性、持续合成能力和/或模板结合。在一些实施例中，M-MLV RT结构域相对于上述M-MLV(WT)序列包含一个或多个突变，例如选自D200N、L603W、T330P、T306K、W313F、D524G、E562Q、D583N、P51L、S67R、E67K、T197A、H204R、E302K、F309N、L435G、N454K、H594Q、D653N、R110S、K103L，例如突变的组合，例如D200N、L603W，和T330P，任选地进一步包括T306K和W313F。在一些实施例中，本文使用的M-MLV RT包含突变D200N、L603W、T330P、T306K和W313F。在实施例中，突变M-MLV RT包含以下氨基酸序列：In some embodiments, a retroviral reverse transcriptase domain, such as M-MLV RT, can comprise one or more mutations of a wild-type sequence that can improve characteristics of RT, such as thermostability, processivity, and/or template binding. In some embodiments, the M-MLV RT domain comprises one or more mutations relative to the above-mentioned M-MLV (WT) sequence, such as selected from D200N, L603W, T330P, T306K, W313F, D524G, E562Q, D583N, P51L, S67R, E67K, T197A, H204R, E302K, F309N, L435G, N454K, H594Q, D653N, R110S, K103L, such as a combination of mutations, such as D200N, L603W, and T330P, optionally further comprising T306K and W313F. In some embodiments, the M-MLV RT used herein comprises mutations D200N, L603W, T330P, T306K and W313F. In an embodiment, the mutant M-MLV RT comprises the following amino acid sequence:

M-MLV (PE2):

在一些实施例中，书写结构域(例如，RT结构域)包含RNA结合结构域，例如，其特异性结合RNA序列。在一些实施例中，模板RNA包含由书写结构域的RNA结合结构域特异性结合的RNA序列。In some embodiments, the writing domain (e.g., RT domain) comprises an RNA binding domain, e.g., which specifically binds to an RNA sequence. In some embodiments, the template RNA comprises an RNA sequence that is specifically bound by the RNA binding domain of the writing domain.

In some embodiments, the reverse transcription domain recognizes and reverse transcribes only a specific template, e.g., the template RNA of the system. In some embodiments, the template comprises a sequence or structure that enables recognition and reverse transcription by the reverse transcription domain. In some embodiments, the template comprises a sequence or structure capable of associating with an RNA binding domain of a polypeptide component of a genome engineering system described herein. In some embodiments, the genome engineering system preferentially reverse transcribes a template comprising the association sequence over a template lacking the association sequence.

书写结构域还可包含DNA依赖性DNA聚合酶活性，例如，包含能够将DNA从模板DNA序列书写入基因组的酶活性。在一些实施例中，采用DNA依赖性DNA聚合来完成靶位点编辑的第二链合成。在一些实施例中，DNA依赖性DNA聚合酶活性由多肽中的DNA聚合酶结构域提供。在一些实施例中，DNA依赖性DNA聚合酶活性由逆转录酶结构域提供，该逆转录酶结构域也能够进行DNA依赖性DNA聚合，例如第二链合成。在一些实施例中，DNA依赖性DNA聚合酶活性由系统中的第二多肽提供。在一些实施例中，DNA依赖性DNA聚合酶活性由内源宿主细胞聚合酶提供，该聚合酶任选地由基因组工程系统的组分募集到靶位点。The writing domain may also include DNA-dependent DNA polymerase activity, for example, including an enzyme activity capable of writing DNA from a template DNA sequence into a genome. In some embodiments, DNA-dependent DNA polymerization is used to complete the second strand synthesis of target site editing. In some embodiments, the DNA-dependent DNA polymerase activity is provided by a DNA polymerase domain in a polypeptide. In some embodiments, the DNA-dependent DNA polymerase activity is provided by a reverse transcriptase domain, which is also capable of DNA-dependent DNA polymerization, such as second strand synthesis. In some embodiments, the DNA-dependent DNA polymerase activity is provided by a second polypeptide in the system. In some embodiments, the DNA-dependent DNA polymerase activity is provided by an endogenous host cell polymerase, which is optionally recruited to the target site by components of a genome engineering system.

In some embodiments, the reverse transcriptase domain has a lower probability of premature termination (P_off) in vitro relative to a reference reverse transcriptase domain. In some embodiments, the reference reverse transcriptase domain is a viral reverse transcriptase domain, e.g., the RT domain from M-MLV.

In some embodiments, the reverse transcriptase domain has a probability of premature termination (P_off) in vitro of less than about 5x10^-3/nt, 5x10^-4/nt, or 5x10^-6/nt, e.g., as measured on a 1094 nt RNA. In embodiments, the in vitro premature termination rate is determined as described in Bibillo and Eickbush (2002) J Biol Chem [Journal of Biological Chemistry] 277(38): 34836-34845 (which is incorporated herein by reference in its entirety).
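To give a sense of scale for the per-nucleotide P_off values above, the following Python sketch estimates the fraction of reverse transcription events that would reach the end of a template of a given length. It assumes, purely for illustration, that termination is independent at each nucleotide with the same probability, so that the full-length fraction is (1 - P_off)^L; this simplified model and the function name are assumptions of the sketch and are not asserted to be the method of Bibillo and Eickbush.

```python
def full_length_fraction(p_off_per_nt: float, template_length_nt: int) -> float:
    """Expected fraction of products that reach full length, assuming independent,
    identical termination probability at each template nucleotide."""
    if not 0.0 <= p_off_per_nt <= 1.0:
        raise ValueError("p_off_per_nt must be a probability between 0 and 1")
    if template_length_nt < 0:
        raise ValueError("template_length_nt must be non-negative")
    return (1.0 - p_off_per_nt) ** template_length_nt


if __name__ == "__main__":
    # The three thresholds recited in the text, evaluated on a 1094 nt template.
    for p_off in (5e-3, 5e-4, 5e-6):
        print(f"P_off={p_off:g}/nt -> full-length fraction {full_length_fraction(p_off, 1094):.4f}")
```

Under this simplified model, a tenfold reduction in P_off changes the full-length fraction on a roughly 1 kb template far more than tenfold, which is why small per-nucleotide differences matter for long templates.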

在一些实施例中，逆转录酶结构域能够在细胞中完成至少约30％或50％的整合。完全整合的百分比可以通过将基本上全长整合事件(例如，包含至少98％的预期整合序列的基因组位点)的数量除以细胞群体中总(包括基本上全长和部分)整合事件的数量来测量。在实施例中，使用长读段扩增子测序确定细胞中的整合(例如，跨整合位点)，例如，如Karst等人(2020)bioRxiv doi.org/10.1101/645903(其通过援引以其全文并入本文)中所述。In some embodiments, the reverse transcriptase domain is capable of completing at least about 30% or 50% integration in the cell. The percentage of complete integration can be measured by dividing the number of substantially full-length integration events (e.g., genomic sites containing at least 98% of the expected integration sequence) by the number of total (including substantially full-length and partial) integration events in the cell population. In an embodiment, integration in a cell (e.g., across integration sites) is determined using long read amplicon sequencing, for example, as described in Karst et al. (2020) bioRxiv doi.org/10.1101/645903 (which is incorporated herein by reference in its entirety).

在实施例中，定量细胞中的整合包括计数包含至少约75％、80％、85％、90％、95％、96％、97％、98％、99％或100％的对应于模板RNA(例如长度至少为0.05、0.1、0.5、0.6、0.7、0.8、0.9、1、1.5、2、3、4或5kb的模板RNA，例如长度在0.5-0.6、0.6-0.7、0.7-0.8、0.8-0.9、1.0-1.2、1.2-1.4、1.4-1.6、1.6-1.8、1.8-2.0、2-3、3-4或4-5kb)的DNA序列的整合部分。In embodiments, quantifying integration in a cell comprises counting the integrated portion comprising at least about 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100% of a DNA sequence corresponding to a template RNA (e.g., a template RNA of at least 0.05, 0.1, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 1.5, 2, 3, 4, or 5 kb in length, such as 0.5-0.6, 0.6-0.7, 0.7-0.8, 0.8-0.9, 1.0-1.2, 1.2-1.4, 1.4-1.6, 1.6-1.8, 1.8-2.0, 2-3, 3-4, or 4-5 kb in length).
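The percentage of complete integrations described above is a simple ratio, sketched here in Python as a non-limiting example. Each integration event is represented by the fraction of the expected integrated sequence that it contains; how that fraction is obtained from long-read amplicon sequencing data is outside the scope of the sketch, and the function name and input representation are assumptions made for illustration.

```python
from typing import Iterable


def percent_complete_integration(covered_fractions: Iterable[float],
                                 full_length_cutoff: float = 0.98) -> float:
    """Percentage of integration events that are substantially full length.

    covered_fractions holds one value per integration event (full-length and
    partial alike): the fraction of the expected integrated sequence present.
    An event counts as substantially full length at or above full_length_cutoff.
    """
    events = list(covered_fractions)
    if not events:
        raise ValueError("at least one integration event is required")
    full_length = sum(fraction >= full_length_cutoff for fraction in events)
    return 100.0 * full_length / len(events)


if __name__ == "__main__":
    # Hypothetical per-event coverage values, used only to exercise the function.
    print(percent_complete_integration([1.0, 0.99, 0.5, 0.2]))
```

The cutoff can be changed to any of the alternative values recited above (e.g., 0.75 to 1.0) through the `full_length_cutoff` parameter.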

在一些实施例中，逆转录酶结构域能够在体外聚合dNTP。在实施例中，逆转录酶结构域能够以0.1-50nt/sec(例如0.1-1、1-10或10-50nt/sec)的速率在体外聚合dNTP。在实施例中，通过单分子测定法测量逆转录酶结构域对dNTP的聚合，例如，如Schwartz和Quake(2009)PNAS[美国国家科学院院刊]106(48)：20294-20299(其通过援引以其全文并入)中所述。In some embodiments, the reverse transcriptase domain is capable of polymerizing dNTPs in vitro. In embodiments, the reverse transcriptase domain is capable of polymerizing dNTPs in vitro at a rate of 0.1-50 nt/sec (e.g., 0.1-1, 1-10, or 10-50 nt/sec). In embodiments, polymerization of dNTPs by the reverse transcriptase domain is measured by a single molecule assay, e.g., as described in Schwartz and Quake (2009) PNAS [Proceedings of the National Academy of Sciences of the United States of America] 106(48): 20294-20299 (which is incorporated by reference in its entirety).

In some embodiments, the reverse transcriptase domain has an in vitro error rate (e.g., misincorporation of nucleotides) of 1x10^-3 to 1x10^-4 or 1x10^-4 to 1x10^-5 substitutions/nt, e.g., as described in Yasukawa et al. (2017) Biochem Biophys Res Commun [Biochemical and Biophysical Research Communications] 492(2): 147-153 (which is incorporated herein by reference in its entirety). In some embodiments, the reverse transcriptase domain has an error rate (e.g., misincorporation of nucleotides) in cells (e.g., HEK293T cells) of 1x10^-3 to 1x10^-4 or 1x10^-4 to 1x10^-5 substitutions/nt, e.g., as determined by long-read amplicon sequencing, e.g., as described in Karst et al. (2020) bioRxiv doi.org/10.1101/645903 (which is incorporated herein by reference in its entirety).

在一些实施例中，逆转录酶结构域能够在体外进行靶RNA的逆转录。在一些实施例中，逆转录酶需要至少3个核苷酸的引物来启动模板的逆转录。在一些实施例中，通过检测来自靶RNA的cDNA来确定靶RNA的逆转录(例如，当提供有ssDNA引物时，例如，其与靶在3′端退火至少3、4、5、6、7、8、9或10nt)，例如，如Bibillo和Eickbush(2002)JBiol Chem[生物化学杂志]277(38)：34836-34845(其通过援引以其全文并入本文)中所述。In some embodiments, the reverse transcriptase domain is capable of reverse transcription of a target RNA in vitro. In some embodiments, the reverse transcriptase requires a primer of at least 3 nucleotides to initiate reverse transcription of a template. In some embodiments, reverse transcription of a target RNA is determined by detecting cDNA from the target RNA (e.g., when a ssDNA primer is provided, for example, it anneals to the target at the 3′ end by at least 3, 4, 5, 6, 7, 8, 9 or 10 nt), for example, as described in Bibillo and Eickbush (2002) J Biol Chem [Journal of Biological Chemistry] 277 (38): 34836-34845 (which is incorporated herein by reference in its entirety).

In some embodiments, the reverse transcriptase domain performs reverse transcription with at least 5-fold or 10-fold greater efficiency (e.g., as measured by cDNA production), e.g., when converting its RNA template to cDNA, as compared to an RNA template lacking the protein binding motif (e.g., a 3′ UTR). In embodiments, reverse transcription efficiency is measured as described in Yasukawa et al. (2017) Biochem Biophys Res Commun [Biochemical and Biophysical Research Communications] 492(2): 147-153 (which is incorporated herein by reference in its entirety).

In some embodiments, the reverse transcriptase domain specifically binds a particular RNA template at a higher frequency (e.g., about 5-fold or 10-fold higher) than any endogenous cellular RNA, e.g., when expressed in a cell (e.g., a HEK293T cell). In embodiments, the frequency of specific binding between the reverse transcriptase domain and the template RNA is measured by CLIP-seq, e.g., as described in Lin and Miles (2019) Nucleic Acids Res [Nucleic Acids Research] 47(11): 5490-5501 (which is incorporated herein by reference in its entirety).

在一些实施例中，RT结构域(例如，如表6所列)包含如下表2A所列的一个或多个突变。在一些实施例中，如表6所列的RT结构域包含下表2A中对应行中所列的一个、两个、三个、四个、五个或六个突变。In some embodiments, the RT domain (e.g., as listed in Table 6) comprises one or more mutations as listed in Table 2A below. In some embodiments, the RT domain as listed in Table 6 comprises one, two, three, four, five or six mutations listed in the corresponding row of Table 2A below.

Table 2A. Exemplary RT domain mutations (relative to the corresponding wild-type sequence listed in the corresponding row of Table 6)

Template nucleic acid binding domain

A gene modifying polypeptide typically comprises a region capable of associating with the template nucleic acid (e.g., template RNA). In some embodiments, the template nucleic acid binding domain is an RNA binding domain. In some embodiments, the RNA binding domain is a modular domain that can associate with RNA molecules containing a specific feature (e.g., a structural motif). In other embodiments, the template nucleic acid binding domain (e.g., RNA binding domain) is contained within the reverse transcription domain, e.g., the reverse transcriptase-derived component has a known signature of RNA preference.

In other embodiments, the template nucleic acid binding domain (e.g., RNA binding domain) is contained within the target DNA binding domain. For example, in some embodiments, the DNA binding domain is a CRISPR-associated protein that recognizes the structure of a template nucleic acid (e.g., template RNA) comprising a gRNA. In some embodiments, the gene modifying polypeptide comprises a DNA binding domain comprising a CRISPR-associated protein that associates with a gRNA scaffold, which allows the DNA binding domain to bind a target genomic DNA sequence. In some embodiments, the gRNA scaffold and gRNA spacer are comprised within the template nucleic acid (e.g., template RNA), such that the DNA binding domain is also the template nucleic acid binding domain. In some embodiments, the polypeptide has RNA binding function in multiple domains, e.g., can bind a gRNA structure in a CRISPR-associated DNA binding domain and an additional sequence or structure in a reverse transcriptase domain.

In some embodiments, the RNA binding domain is capable of binding a template RNA with greater affinity than a reference RNA binding domain. In some embodiments, the reference RNA binding domain is the RNA binding domain of Cas9 from Streptococcus pyogenes (S. pyogenes). In some embodiments, the RNA binding domain is capable of binding a template RNA with an affinity of 100 pM-10 nM (e.g., 100 pM-1 nM or 1 nM-10 nM). In some embodiments, the affinity of the RNA binding domain for its template RNA is measured in vitro, e.g., by thermophoresis, e.g., as described in Asmari et al. Methods 146: 107-119 (2018) (which is incorporated herein by reference in its entirety). In some embodiments, the affinity of the RNA binding domain for its template RNA is measured in cells (e.g., by FRET or CLIP-Seq).

在一些实施例中，RNA结合结构域与模板RNA在体外以比乱序RNA高至少约5倍或10倍的频率相关联。在一些实施例中，RNA结合结构域与模板RNA或乱序RNA之间的结合频率通过CLIP-seq测量，例如，如Lin和Miles(2019)Nucleic Acids Res[核酸研究]47(11)：5490-5501(其通过援引以其全文并入本文)中所述。在一些实施例中，RNA结合结构域与模板RNA在细胞(例如，HEK293T细胞)中以比乱序RNA高至少约5倍或10倍的频率相关联。在一些实施例中，RNA结合结构域与模板RNA或乱序RNA之间的关联频率通过CLIP-seq测量，例如，如Lin和Miles(2019)同上中所述。In some embodiments, the RNA binding domain is associated with the template RNA in vitro at a frequency at least about 5 times or 10 times higher than the scrambled RNA. In some embodiments, the binding frequency between the RNA binding domain and the template RNA or scrambled RNA is measured by CLIP-seq, for example, as described in Lin and Miles (2019) Nucleic Acids Res [Nucleic Acids Research] 47 (11): 5490-5501 (which is incorporated herein by reference in its entirety). In some embodiments, the RNA binding domain is associated with the template RNA in a cell (e.g., HEK293T cell) at a frequency at least about 5 times or 10 times higher than the scrambled RNA. In some embodiments, the association frequency between the RNA binding domain and the template RNA or scrambled RNA is measured by CLIP-seq, for example, as described in Lin and Miles (2019) supra.

Endonuclease domain and DNA binding domain

In some embodiments, the gene modifying polypeptide has a function of cleaving a DNA target site via an endonuclease domain. In some embodiments, the gene modifying polypeptide comprises a DNA binding domain, e.g., for binding a target nucleic acid. In some embodiments, a domain (e.g., a Cas domain) of the gene modifying polypeptide comprises two or more smaller domains (e.g., a DNA binding domain and an endonuclease domain). It is understood that when a DNA binding domain (e.g., a Cas domain) is described as binding a target nucleic acid sequence, in some embodiments, the binding is mediated by a gRNA.

In some embodiments, a domain has two functions. For example, in some embodiments, the endonuclease domain is also a DNA binding domain. In some embodiments, the endonuclease domain is also a template nucleic acid (e.g., template RNA) binding domain. For example, in some embodiments, the polypeptide comprises a CRISPR-associated endonuclease domain that binds a template RNA comprising a gRNA, binds a target DNA sequence (e.g., one complementary to a portion of the gRNA), and cleaves the target DNA sequence. In some embodiments, an endonuclease domain or endonuclease/DNA binding domain from a heterologous source can be used in, or modified for use in (e.g., by insertion, deletion, or substitution of one or more residues), a gene modifying system described herein.

In some embodiments, the nucleic acid encoding the endonuclease domain or endonuclease/DNA binding domain is altered from its native sequence to have altered codon usage, e.g., improved for human cells. In some embodiments, the endonuclease element is a heterologous endonuclease element, e.g., a Cas endonuclease (e.g., Cas9), a type II restriction endonuclease (e.g., FokI), a meganuclease (e.g., I-SceI), or another endonuclease domain.

In certain aspects, the DNA binding domain of a gene modifying polypeptide described herein is selected, designed, or constructed to bind a desired host DNA target sequence. In certain embodiments, the DNA binding domain of the polypeptide is a heterologous DNA binding element. In some embodiments, the heterologous DNA binding element is a zinc finger element or a TAL effector element, e.g., a zinc finger or TAL polypeptide or a functional fragment thereof. In some embodiments, the heterologous DNA binding element is a sequence-guided DNA binding element, e.g., Cas9, Cpf1, or another CRISPR-associated protein that has been altered to have no endonuclease activity. In some embodiments, the heterologous DNA binding element retains endonuclease activity. In some embodiments, the heterologous DNA binding element retains partial endonuclease activity so as to cleave ssDNA, e.g., has nickase activity. In specific embodiments, the heterologous DNA binding domain can be any one or more of Cas9, a TAL domain, a ZF domain, a Myb domain, combinations thereof, or multiples thereof.

在一些实施例中，例如通过位点特异性突变、增加或减少DNA结合元件(例如锌指的数量和/或特异性)等来修饰DNA结合结构域，以改变DNA结合特异性和亲和力。在一些实施例中，编码DNA结合结构域的核酸序列从其天然序列改变为具有改变的密码子使用，例如，针对人细胞进行改善。在实施例中，该DNA结合结构域相对于野生型DNA结合结构域包含一个或多个修饰、例如经由定向进化(例如，噬菌体辅助的连续进化(PACE))的修饰。In some embodiments, the DNA binding domain is modified, for example, by site-specific mutations, increasing or decreasing the number and/or specificity of DNA binding elements (e.g., zinc fingers), etc., to change DNA binding specificity and affinity. In some embodiments, the nucleic acid sequence encoding the DNA binding domain is changed from its native sequence to have altered codon usage, for example, to improve for human cells. In an embodiment, the DNA binding domain comprises one or more modifications relative to the wild-type DNA binding domain, for example, modifications via directed evolution (e.g., phage-assisted continuous evolution (PACE)).

In some embodiments, the DNA binding domain comprises a meganuclease domain (e.g., as described herein, e.g., in the endonuclease domain section), or a functional fragment thereof. In some embodiments, the meganuclease domain has endonuclease activity, e.g., double-strand cleavage and/or nickase activity. In other embodiments, the meganuclease domain has reduced activity, e.g., lacks endonuclease activity, e.g., the meganuclease is catalytically inactive. In some embodiments, a catalytically inactive meganuclease is used as a DNA binding domain, e.g., as described in Fonfara et al. Nucleic Acids Res 40(2):847-860 (2012), which is incorporated herein by reference in its entirety.

In some embodiments, a gene modifying polypeptide comprises a modification to a DNA binding domain, e.g., relative to the wild-type polypeptide. In some embodiments, the DNA binding domain comprises an addition, deletion, replacement, or modification to the amino acid sequence of the original DNA binding domain. In some embodiments, the DNA binding domain is modified to include a heterologous functional domain that specifically binds a target nucleic acid (e.g., DNA) sequence of interest. In some embodiments, the functional domain replaces at least a portion (e.g., the entirety) of the prior DNA binding domain of the polypeptide. In some embodiments, the functional domain comprises a zinc finger (e.g., a zinc finger that specifically binds a target nucleic acid (e.g., DNA) sequence of interest). In some embodiments, the functional domain comprises a Cas domain (e.g., a Cas domain that specifically binds a target nucleic acid (e.g., DNA) sequence of interest). In some embodiments, the Cas domain comprises Cas9 or a mutant or variant thereof (e.g., as described herein). In embodiments, the Cas domain is associated with a guide RNA (gRNA), e.g., as described herein. In embodiments, the Cas domain is directed to a target nucleic acid (e.g., DNA) sequence of interest by the gRNA. In embodiments, the Cas domain is encoded in the same nucleic acid (e.g., RNA) molecule as the gRNA. In embodiments, the Cas domain is encoded in a different nucleic acid (e.g., RNA) molecule from the gRNA.

In some embodiments, the DNA binding domain is capable of binding a target sequence (e.g., a dsDNA target sequence) with greater affinity than a reference DNA binding domain. In some embodiments, the reference DNA binding domain is the DNA binding domain of Cas9 from Streptococcus pyogenes. In some embodiments, the DNA binding domain is capable of binding a target sequence (e.g., a dsDNA target sequence) with an affinity between 100 pM and 10 nM (e.g., between 100 pM and 1 nM or between 1 nM and 10 nM).

In some embodiments, the affinity of a DNA binding domain for its target sequence (e.g., a dsDNA target sequence) is measured in vitro, e.g., by thermophoresis, e.g., as described in Asmari et al. Methods 146:107-119 (2018), which is incorporated herein by reference in its entirety.

In embodiments, the DNA binding domain is capable of binding its target sequence (e.g., a dsDNA target sequence), e.g., with an affinity between 100 pM and 10 nM (e.g., between 100 pM and 1 nM or between 1 nM and 10 nM), in the presence of a molar excess (e.g., about a 100-fold molar excess) of scrambled-sequence competitor dsDNA.

In some embodiments, the DNA binding domain is found associated with its target sequence (e.g., a dsDNA target sequence) more frequently than with any other sequence in the genome of a target cell (e.g., a human target cell), e.g., as measured by ChIP-seq (e.g., in HEK293T cells), e.g., as described in He and Pu (2010) Curr. Protoc Mol Biol Chapter 21, which is incorporated herein by reference in its entirety. In some embodiments, the DNA binding domain is found associated with its target sequence (e.g., a dsDNA target sequence) at least about 5-fold or 10-fold more frequently than with any other sequence in the genome of the target cell, e.g., as measured by ChIP-seq (e.g., in HEK293T cells), e.g., as described in He and Pu (2010), supra.

In some embodiments, the endonuclease domain has nickase activity and cleaves one strand of the target DNA. In some embodiments, the nickase activity reduces the formation of double-strand breaks at the target site. In some embodiments, the endonuclease domain creates a staggered nick structure in the first and second strands of the target DNA. In some embodiments, the staggered nick structure generates a free 3' overhang at the target site. In some embodiments, the free 3' overhang at the target site improves editing efficiency, e.g., by enhancing access and annealing of a 3' homology region of the template nucleic acid. In some embodiments, the staggered nick structure reduces the formation of double-strand breaks at the target site.

In some embodiments, the endonuclease domain cleaves both strands of the target DNA, e.g., resulting in blunt-end cleavage of the target with no ssDNA overhangs on either side of the cut site. The amino acid sequence of an endonuclease domain of a gene modifying system described herein may be at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identical to the amino acid sequence of an endonuclease domain described herein (e.g., an endonuclease domain of Table 8).

In certain embodiments, the heterologous endonuclease is FokI or a functional fragment thereof. In certain embodiments, the heterologous endonuclease is a Holliday junction resolvase or a homolog thereof, such as Ssol Hje, the Holliday junction resolvase from Sulfolobus solfataricus (Govindaraju et al., Nucleic Acids Research 44:7, 2016). In certain embodiments, the heterologous endonuclease is the endonuclease of the large fragment of a spliceosomal protein, such as Prp8 (Mahbub et al., Mobile DNA 8:16, 2017). In certain embodiments, the heterologous endonuclease is derived from a CRISPR-associated protein, e.g., Cas9. In certain embodiments, the heterologous endonuclease is engineered to have only ssDNA cleavage activity, e.g., only nickase activity, e.g., it is a Cas9 nickase, e.g., SpCas9 with a D10A, H840A, or N863A mutation. Table 8 provides exemplary Cas proteins and mutations associated with nickase activity. In still other embodiments, homologous endonuclease domains are modified, e.g., by site-specific mutation, to alter DNA endonuclease activity. In still other embodiments, endonuclease domains are modified to reduce DNA sequence specificity, e.g., by truncation to remove domains that confer DNA sequence specificity or by mutation to inactivate regions that confer DNA sequence specificity.

In some embodiments, the endonuclease domain has nickase activity and does not form double-strand breaks. In some embodiments, the endonuclease domain forms single-strand breaks at a higher frequency than double-strand breaks, e.g., at least 90%, 95%, 96%, 97%, 98%, or 99% of the breaks are single-strand breaks, or fewer than 10%, 5%, 4%, 3%, 2%, or 1% of the breaks are double-strand breaks. In some embodiments, the endonuclease forms substantially no double-strand breaks. In some embodiments, the endonuclease does not form detectable levels of double-strand breaks.
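
The single-strand versus double-strand break percentages above can be checked with simple arithmetic. The following Python sketch is illustrative only: the function names, the count-based inputs, and the default 90% cutoff are assumptions chosen for the example, and the disclosure does not prescribe how break counts are obtained.

```python
def break_fractions(single_strand_breaks: int, double_strand_breaks: int):
    """Return (single-strand fraction, double-strand fraction) of all observed breaks."""
    total = single_strand_breaks + double_strand_breaks
    if total == 0:
        raise ValueError("no breaks observed")
    ssb_fraction = single_strand_breaks / total
    return ssb_fraction, 1.0 - ssb_fraction


def meets_nickase_threshold(single_strand_breaks: int,
                            double_strand_breaks: int,
                            min_ssb_fraction: float = 0.90) -> bool:
    """True if at least min_ssb_fraction of breaks are single-strand breaks.

    min_ssb_fraction is a hypothetical parameter; the text lists 90%, 95%,
    96%, 97%, 98%, and 99% as example thresholds.
    """
    ssb_fraction, _ = break_fractions(single_strand_breaks, double_strand_breaks)
    return ssb_fraction >= min_ssb_fraction
```

For instance, 95 single-strand breaks and 5 double-strand breaks would satisfy the 90% and 95% thresholds but not the 96% threshold.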

In some embodiments, the endonuclease domain has nickase activity that nicks the first strand of the target site DNA; e.g., in some embodiments, the endonuclease cleaves the genomic DNA of the target site near the site of alteration on the strand that will be extended by the writing domain. In some embodiments, the endonuclease domain has nickase activity that nicks the first strand of the target site DNA and does not nick the second strand of the target site DNA. For example, when a polypeptide comprises a CRISPR-associated endonuclease domain having nickase activity, in some embodiments, said CRISPR-associated endonuclease domain nicks the target site DNA strand containing the PAM site (e.g., and does not nick the target site DNA strand that does not contain the PAM site). As a further example, when a polypeptide comprises a CRISPR-associated endonuclease domain having nickase activity, in some embodiments, said CRISPR-associated endonuclease domain nicks the target site DNA strand that does not contain the PAM site (e.g., and does not nick the target site DNA strand that contains the PAM site).

In some other embodiments, the endonuclease domain has nickase activity that nicks both the first strand and the second strand of the target site DNA. Without wishing to be bound by theory, after the writing domain (e.g., RT domain) of a polypeptide described herein polymerizes (e.g., reverse transcribes) from the heterologous object sequence of a template nucleic acid (e.g., template RNA), the cellular DNA repair machinery must repair the nick on the first DNA strand. The target site DNA now contains two different sequences for the first DNA strand: one corresponding to the original genomic DNA (e.g., having a free 5' end) and a second corresponding to the sequence polymerized from the heterologous object sequence (e.g., having a free 3' end). It is thought that the two different sequences equilibrate with one another, first one hybridizing to the second strand and then the other, and that which sequence the cellular DNA repair apparatus incorporates into the repaired target site may be a stochastic process. Without wishing to be bound by theory, it is thought that introducing an additional nick into the second strand may bias the cellular DNA repair machinery toward adopting the sequence based on the heterologous object sequence more frequently than the original genomic sequence (Anzalone et al. Nature 576:149-157 (2019)). In some embodiments, the additional nick is positioned at least 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, or 150 nucleotides 5' or 3' of the target site modification (e.g., the insertion, deletion, or substitution) or of the nick on the first strand.

Alternatively or additionally, without wishing to be bound by theory, it is thought that an additional nick in the second strand may promote second-strand synthesis. In some embodiments, when the gene modifying system has inserted or substituted a portion of the first strand, a new sequence corresponding to the insertion or substitution must be synthesized in the second strand.

In some embodiments, the polypeptide comprises a single domain having endonuclease activity (e.g., a single endonuclease domain), and said domain nicks both the first strand and the second strand. For example, in such an embodiment the endonuclease domain may be a CRISPR-associated endonuclease domain, and the template nucleic acid (e.g., template RNA) comprises a gRNA spacer that directs nicking of the first strand and an additional gRNA spacer that directs nicking of the second strand. In some embodiments, the polypeptide comprises a plurality of domains having endonuclease activity, wherein a first endonuclease domain nicks the first strand and a second endonuclease domain nicks the second strand (optionally, the first endonuclease domain does not (e.g., cannot) nick the second strand, and the second endonuclease domain does not (e.g., cannot) nick the first strand).

In some embodiments, the endonuclease domain is capable of nicking both a first strand and a second strand. In some embodiments, the first-strand and second-strand nicks occur at the same position in the target site but on opposite strands. In some embodiments, the second-strand nick occurs at a position staggered from the first nick, e.g., upstream or downstream of it. In some embodiments, the endonuclease domain generates a target site deletion if the second-strand nick is upstream of the first-strand nick. In some embodiments, the endonuclease domain generates a target site duplication if the second-strand nick is downstream of the first-strand nick. In some embodiments, the endonuclease domain generates no duplication and/or deletion if the first-strand and second-strand nicks occur at the same position in the target site. In some embodiments, the endonuclease domain has activity that changes depending on protein conformation or RNA-binding status, e.g., which promotes nicking of the first or second strand (e.g., as described in Christensen et al. PNAS 2006, which is incorporated herein by reference in its entirety).
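
The relationship between nick positions and the resulting target site outcome described above can be expressed as a small decision rule. The Python sketch below is a minimal illustration under stated assumptions: both nick positions are given as integer coordinates in one shared coordinate system in which a smaller value means "upstream", and the function name and return labels are hypothetical.

```python
def classify_double_nick(first_strand_nick: int, second_strand_nick: int) -> str:
    """Classify the expected target site outcome from two nick coordinates.

    Assumes both positions use the same coordinate system, with smaller
    values lying upstream of larger values.
    """
    if second_strand_nick < first_strand_nick:
        # Second-strand nick upstream of the first-strand nick.
        return "deletion"
    if second_strand_nick > first_strand_nick:
        # Second-strand nick downstream of the first-strand nick.
        return "duplication"
    # Nicks at the same position on opposite strands.
    return "no duplication or deletion"
```

Under these assumptions, a first-strand nick at position 100 with a second-strand nick at position 90 would be classified as a deletion, and a second-strand nick at position 110 as a duplication.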

In some embodiments, the endonuclease domain comprises a meganuclease or a functional fragment thereof. In some embodiments, the endonuclease domain comprises a homing endonuclease or a functional fragment thereof. In some embodiments, the endonuclease domain comprises a meganuclease from the LAGLIDADG, GIY-YIG, HNH, His-Cys box, or PD-(D/E)XK family, or a functional fragment or variant thereof, e.g., one that possesses a conserved amino acid motif, e.g., as indicated by the family name. In some embodiments, the endonuclease domain comprises a meganuclease, or a fragment thereof, chosen from, e.g., I-SmaMI (Uniprot F7WD42), I-SceI (Uniprot P03882), I-AniI (Uniprot P03880), I-DmoI (Uniprot P21505), I-CreI (Uniprot P05725), I-TevI (Uniprot P13299), I-OnuI (Uniprot Q4VWW5), or I-BmoI (Uniprot Q9ANR6). In some embodiments, the meganuclease is naturally monomeric in its functional form, e.g., I-SceI or I-TevI, or dimeric, e.g., I-CreI. For example, LAGLIDADG meganucleases with a single copy of the LAGLIDADG motif generally form homodimers, whereas members with two copies of the LAGLIDADG motif are generally found as monomers. In some embodiments, a meganuclease that normally forms a dimer is expressed as a fusion, e.g., the two subunits are expressed as a single ORF and, optionally, connected by a linker, e.g., an I-CreI dimer fusion (Rodriguez-Fornes et al. Gene Therapy 2020, which is incorporated herein by reference in its entirety). In some embodiments, a meganuclease, or a functional fragment thereof, is altered to favor nickase activity on one strand of a double-stranded DNA molecule, e.g., I-SceI (K122I and/or K223I) (Niu et al. J Mol Biol 2008), I-AniI (K227M) (McConnell Smith et al. PNAS 2009), or I-DmoI (Q42A and/or K120M) (Molina et al. J Biol Chem 2015). In some embodiments, a meganuclease or functional fragment thereof with this preference for single-strand cleavage is used as the endonuclease domain, e.g., with nickase activity. In some embodiments, the endonuclease domain comprises a meganuclease, or a functional fragment thereof, that naturally targets or is engineered to target a safe harbor site, e.g., an I-CreI targeting the SH6 site (Rodriguez-Fornes et al., supra). In some embodiments, the endonuclease domain comprises a meganuclease, or a functional fragment thereof, with a sequence-tolerant catalytic domain, e.g., I-TevI, which recognizes the minimal motif CNNNG (Kleinstiver et al. PNAS 2012). In some embodiments, a target sequence-tolerant catalytic domain is fused to a DNA binding domain, e.g., to direct its activity, e.g., by fusing I-TevI to: (i) zinc fingers to create Tev-ZFEs (Kleinstiver et al. PNAS 2012), (ii) other meganucleases to create MegaTevs (Wolfs et al. Nucleic Acids Res 2014), and/or (iii) Cas9 to create TevCas9 (Wolfs et al. PNAS 2016).

In some embodiments, the endonuclease domain comprises a restriction enzyme, e.g., a type IIS or type IIP restriction enzyme. In some embodiments, the endonuclease domain comprises a type IIS restriction enzyme, e.g., FokI, or a fragment or variant thereof. In some embodiments, the endonuclease domain comprises a type IIP restriction enzyme, e.g., PvuII, or a fragment or variant thereof. In some embodiments, a dimeric restriction enzyme is expressed as a fusion such that it functions as a single chain, e.g., a FokI dimer fusion (Minczuk et al. Nucleic Acids Res 36(12):3926-3938 (2008)).

The use of additional endonuclease domains is described, for example, in Guha and Edgell Int J Mol Sci 18(22):2565 (2017), which is incorporated herein by reference in its entirety.

In some embodiments, a gene modifying polypeptide comprises a modification to an endonuclease domain, e.g., relative to a wild-type Cas protein. In some embodiments, the endonuclease domain comprises an addition, deletion, replacement, or modification to the amino acid sequence of the wild-type Cas protein. In some embodiments, the endonuclease domain is modified to include a heterologous functional domain that specifically binds and/or induces endonuclease cleavage of a target nucleic acid (e.g., DNA) sequence of interest. In some embodiments, the endonuclease domain comprises a zinc finger. In embodiments, an endonuclease domain comprising a Cas domain is associated with a guide RNA (gRNA), e.g., as described herein. In some embodiments, the endonuclease domain is modified to include a functional domain that does not target a specific target nucleic acid (e.g., DNA) sequence. In embodiments, the endonuclease domain comprises a FokI domain.

In some embodiments, the endonuclease domain associates with a target dsDNA in vitro at a frequency at least about 5-fold or 10-fold higher than with a scrambled dsDNA. In some embodiments, the endonuclease domain associates with a target dsDNA at a frequency at least about 5-fold or 10-fold higher than with a scrambled dsDNA, e.g., in a cell (e.g., a HEK293T cell). In some embodiments, the frequency of association between the endonuclease domain and the target DNA or scrambled DNA is measured by ChIP-seq, e.g., as described in He and Pu (2010) Curr. Protoc Mol Biol Chapter 21, which is incorporated herein by reference in its entirety.

In some embodiments, the endonuclease domain can catalyze the formation of a nick at a target sequence, e.g., at a level at least about 5-fold or 10-fold higher than at a non-target sequence (e.g., relative to any other genomic sequence in the genome of the target cell). In some embodiments, the level of nick formation is determined using NickSeq, e.g., as described in Elacqua et al. (2019) bioRxiv doi.org/10.1101/867937, which is incorporated herein by reference in its entirety.

In some embodiments, the endonuclease domain is capable of nicking DNA in vitro. In embodiments, the nick results in an exposed base. In embodiments, the exposed base can be detected using a nuclease sensitivity assay, e.g., as described in Chaudhry and Weinfeld (1995) Nucleic Acids Res 23(19):3805-3809, which is incorporated herein by reference in its entirety. In embodiments, the level of exposed bases (e.g., as detected by the nuclease sensitivity assay) is increased by at least 10%, 50%, or more relative to a reference endonuclease domain. In some embodiments, the reference endonuclease domain is the endonuclease domain of Cas9 from Streptococcus pyogenes.

In some embodiments, the endonuclease domain is capable of nicking DNA in a cell. In embodiments, the endonuclease domain is capable of nicking DNA in a HEK293T cell. In embodiments, an unrepaired nick that undergoes replication in the absence of Rad51 results in an increased NHEJ rate at the nick site, which can be detected, e.g., using a Rad51 inhibition assay, e.g., as described in Bothmer et al. (2017) Nat Commun 8:13905, which is incorporated herein by reference in its entirety. In embodiments, the NHEJ rate is increased above 0-5%. In embodiments, the NHEJ rate is increased to 20%-70% (e.g., 30%-60% or 40%-50%), e.g., upon Rad51 inhibition.

In some embodiments, the endonuclease domain releases the target after cleavage. In some embodiments, release of the target is indicated indirectly by assessing multiple turnovers of the enzyme, e.g., as described in Yourik et al. RNA 25(1):35-44 (2019), which is incorporated herein by reference in its entirety, and as shown in Figure 2. In some embodiments, the k_exp of the endonuclease domain is 1x10^-3 to 1x10^-5 min^-1, as measured by such methods.

In some embodiments, the endonuclease domain has a catalytic efficiency (k_cat/K_m) greater than about 1x10^8 s^-1 M^-1 in vitro. In embodiments, the endonuclease domain has a catalytic efficiency greater than about 1x10^5, 1x10^6, 1x10^7, or 1x10^8 s^-1 M^-1 in vitro. In embodiments, catalytic efficiency is determined as described in Chen et al. (2018) Science 360(6387):436-439, which is incorporated herein by reference in its entirety. In some embodiments, the endonuclease domain has a catalytic efficiency (k_cat/K_m) greater than about 1x10^8 s^-1 M^-1 in cells. In embodiments, the endonuclease domain has a catalytic efficiency greater than about 1x10^5, 1x10^6, 1x10^7, or 1x10^8 s^-1 M^-1 in cells.
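
As an illustration of how the catalytic efficiency thresholds above could be applied, the following Python sketch computes k_cat/K_m from a turnover number (in s^-1) and a Michaelis constant (in M) and reports the highest listed threshold that is exceeded. The function and variable names are hypothetical, and the input values would come from kinetic measurements that are outside the scope of this sketch.

```python
# Thresholds named in the text, in units of s^-1 M^-1.
THRESHOLDS = (1e5, 1e6, 1e7, 1e8)


def catalytic_efficiency(k_cat_per_s: float, k_m_molar: float) -> float:
    """Return k_cat / K_m in s^-1 M^-1."""
    if k_m_molar <= 0:
        raise ValueError("K_m must be positive")
    return k_cat_per_s / k_m_molar


def highest_threshold_exceeded(efficiency: float):
    """Return the largest listed threshold that efficiency exceeds, or None."""
    exceeded = [t for t in THRESHOLDS if efficiency > t]
    return max(exceeded) if exceeded else None
```

For example, a hypothetical enzyme with k_cat = 2 s^-1 and K_m = 1 µM has an efficiency of about 2x10^6 s^-1 M^-1, which exceeds the 1x10^5 and 1x10^6 thresholds but not 1x10^7.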

Gene modifying polypeptides comprising Cas domains

In some embodiments, a gene modifying polypeptide described herein comprises a Cas domain. In some embodiments, the Cas domain can direct the gene modifying polypeptide to a target site specified by a gRNA spacer, thereby modifying a target nucleic acid sequence in "cis". In some embodiments, a gene modifying polypeptide is fused to a Cas domain. In some embodiments, a gene modifying polypeptide comprises a CRISPR/Cas domain (also referred to herein as a CRISPR-associated protein). In some embodiments, a CRISPR/Cas domain comprises a protein involved in the clustered regularly interspaced short palindromic repeat (CRISPR) system, e.g., a Cas protein, and optionally binds a guide RNA, e.g., a single guide RNA (sgRNA).

The CRISPR system is an adaptive defense system originally discovered in bacteria and archaea. CRISPR systems use RNA-guided nucleases termed CRISPR-associated or "Cas" endonucleases (e.g., Cas9 or Cpf1) to cleave foreign DNA. For example, in a typical CRISPR-Cas system, an endonuclease is directed to a target nucleotide sequence (e.g., a site in the genome that is to be sequence-edited) by sequence-specific, non-coding "guide RNAs" that target single- or double-stranded DNA sequences. Three classes (I-III) of CRISPR systems have been identified. The class II CRISPR systems use a single Cas endonuclease (rather than multiple Cas proteins). One class II CRISPR system includes a type II Cas endonuclease such as Cas9, a CRISPR RNA ("crRNA"), and a trans-activating crRNA ("tracrRNA"). The crRNA contains a "spacer" sequence, typically an RNA sequence of about 20 nucleotides that corresponds to a target DNA sequence (the "protospacer"). In the wild-type system, and in some engineered systems, the crRNA also contains a region that binds to the tracrRNA to form a partially double-stranded structure that is cleaved by RNase III, resulting in a crRNA/tracrRNA hybrid molecule. The crRNA/tracrRNA hybrid then directs the Cas endonuclease to recognize and cleave the target DNA sequence. The target DNA sequence is generally adjacent to a "protospacer adjacent motif" ("PAM") that is specific for a given Cas endonuclease and is required for cleavage activity at a target site matching the spacer of the crRNA. CRISPR endonucleases identified from various prokaryotic species have unique PAM sequence requirements, e.g., as listed for exemplary Cas enzymes in Table 7; examples of PAM sequences include 5′-NGG (Streptococcus pyogenes; SEQ ID NO: 11,019), 5′-NNAGAA (Streptococcus thermophilus CRISPR1; SEQ ID NO: 11,020), 5′-NGGNG (Streptococcus thermophilus CRISPR3; SEQ ID NO: 11,021), and 5′-NNNGATT (Neisseria meningitidis; SEQ ID NO: 11,022). Some endonucleases, e.g., Cas9 endonucleases, are associated with G-rich PAM sites, e.g., 5′-NGG (SEQ ID NO: 11,023), and perform blunt-end cleavage of the target DNA at a location 3 nucleotides upstream from (5′ of) the PAM site. Another class II CRISPR system includes the type V endonuclease Cpf1, which is smaller than Cas9; examples include AsCpf1 (from Acidaminococcus sp.) and LbCpf1 (from Lachnospiraceae sp.). Cpf1-associated CRISPR arrays are processed into mature crRNAs without the requirement of a tracrRNA; in other words, in some embodiments a Cpf1 system comprises only the Cpf1 nuclease and a crRNA to cleave the target DNA sequence. Cpf1 endonucleases are typically associated with T-rich PAM sites, e.g., 5′-TTN. Cpf1 can also recognize a 5′-CTA PAM motif. Cpf1 typically cleaves the target DNA by introducing an offset or staggered double-strand break with a 4- or 5-nucleotide 5′ overhang, for example, cleaving a target DNA with a 5-nucleotide offset or staggered cut located 18 nucleotides downstream from (3′ of) the PAM site on the coding strand and 23 nucleotides downstream from the PAM site on the complementary strand; the 5-nucleotide overhang that results from such offset cleavage allows more precise genome editing by DNA insertion by homologous recombination than by DNA insertion at blunt-end cleaved DNA. See, e.g., Zetsche et al. (2015) Cell, 163:759-771.
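The cut-site geometry described above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions only: the function names and the 0-based indexing convention (a cut index i means the strand is cut between bases i-1 and i, expressed in coordinates of the PAM-containing strand) are hypothetical and are not part of the disclosure. The offsets themselves (3 nucleotides upstream of the PAM for Cas9; 18 and 23 nucleotides downstream of the PAM for Cpf1) are the ones stated in the preceding paragraph.

```python
def cas9_blunt_cut(pam_start: int) -> int:
    """Return the cut index for a Cas9-style blunt cut.

    pam_start is the 0-based index of the first PAM base. Both strands
    are cut at the same place, 3 nucleotides upstream (5') of the PAM.
    """
    return pam_start - 3


def cpf1_staggered_cut(pam_end: int, coding_offset: int = 18,
                       complementary_offset: int = 23):
    """Return (coding_cut, complementary_cut, overhang_length) for a
    Cpf1-style staggered cut.

    pam_end is the 0-based index of the first base after the PAM.
    The default offsets are the example values given in the text.
    """
    coding_cut = pam_end + coding_offset
    complementary_cut = pam_end + complementary_offset
    # The distance between the two nicks is the single-stranded overhang.
    return coding_cut, complementary_cut, complementary_cut - coding_cut


# Example: a PAM starting at index 30 (Cas9) or ending before index 4 (Cpf1).
print(cas9_blunt_cut(30))
print(cpf1_staggered_cut(4))
```

With the default offsets the two Cpf1 nicks are 5 nucleotides apart, which corresponds to the 5-nucleotide overhang mentioned above.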

A variety of CRISPR-associated (Cas) genes or proteins can be used in the technologies provided by the present disclosure, and the choice of Cas protein will depend upon the particular conditions of the method. Specific examples of Cas proteins include class II systems including Cas1, Cas2, Cas3, Cas4, Cas5, Cas6, Cas7, Cas8, Cas9, Cas10, Cpf1, C2C1, or C2C3. In some embodiments, a Cas protein, e.g., a Cas9 protein, may be from any of a variety of prokaryotic species. In some embodiments, a particular Cas protein, e.g., a particular Cas9 protein, is selected to recognize a particular protospacer adjacent motif (PAM) sequence. In some embodiments, a DNA binding domain or endonuclease domain includes a sequence of a targeting polypeptide, e.g., a Cas protein, e.g., Cas9. In certain embodiments, a Cas protein, e.g., a Cas9 protein, may be obtained from a bacterium or an archaeon or synthesized using known methods. In certain embodiments, a Cas protein may be from a Gram-positive bacterium or a Gram-negative bacterium. In certain embodiments, a Cas protein may be from Streptococcus (e.g., Streptococcus pyogenes or Streptococcus thermophilus), Francisella (e.g., Francisella novicida), Staphylococcus (e.g., Staphylococcus aureus), Acidaminococcus (e.g., Acidaminococcus sp. BV3L6), Neisseria (e.g., Neisseria meningitidis), Cryptococcus, Corynebacterium, Haemophilus, Eubacterium, Pasteurella, Prevotella, Veillonella, or Oceanobacter.

In some embodiments, a gene modifying polypeptide may comprise the amino acid sequence of SEQ ID NO: 4000 below, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto. In embodiments, the amino acid sequence of SEQ ID NO: 4000 below, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto, is positioned at the N-terminal end of the gene modifying polypeptide. In embodiments, the amino acid sequence of SEQ ID NO: 4000 below, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto, is positioned within 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, or 30 amino acids of the N-terminal end of the gene modifying polypeptide.

Exemplary N-terminal NLS-Cas9 domain

In some embodiments, a gene modifying polypeptide may comprise the amino acid sequence of SEQ ID NO: 4001 below, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto. In embodiments, the amino acid sequence of SEQ ID NO: 4001 below, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto, is positioned at the C-terminal end of the gene modifying polypeptide. In embodiments, the amino acid sequence of SEQ ID NO: 4001 below, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto, is positioned within 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, or 30 amino acids of the C-terminal end of the gene modifying polypeptide.

Exemplary C-terminal sequence comprising an NLS

Exemplary benchmark sequence

In some embodiments, a gene modifying polypeptide may comprise a Cas domain as listed in Table 7 or 8, or a functional fragment thereof, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto.

Table 7. CRISPR/Cas proteins, species, and mutations

Table 8. Amino acid sequences of CRISPR/Cas proteins, species, and mutations

In some embodiments, the Cas protein requires a protospacer adjacent motif (PAM) to be present in or adjacent to a target DNA sequence for the Cas protein to bind and/or function. In some embodiments, the PAM is or comprises, from 5′ to 3′, NGG (SEQ ID NO: 11,024), YG (SEQ ID NO: 11,025), NNGRRT (SEQ ID NO: 11,026), NNNRRT (SEQ ID NO: 11,027), NGA (SEQ ID NO: 11,029), TYCV (SEQ ID NO: 11,030), TATV (SEQ ID NO: 11,031), NTTN (SEQ ID NO: 11,032), or NNNGATT (SEQ ID NO: 11,033), where N stands for any nucleotide, Y stands for C or T, R stands for A or G, and V stands for A or C or G. In some embodiments, the Cas protein is a protein listed in Table 7 or 8. In some embodiments, the Cas protein comprises one or more mutations altering its PAM. In some embodiments, the Cas protein comprises E1369R, E1449H, and R1556A mutations or analogous substitutions to the amino acids corresponding to said positions. In some embodiments, the Cas protein comprises E782K, N968K, and R1015H mutations or analogous substitutions to the amino acids corresponding to said positions. In some embodiments, the Cas protein comprises D1135V, R1335Q, and T1337R mutations or analogous substitutions to the amino acids corresponding to said positions. In some embodiments, the Cas protein comprises S542R and K607R mutations or analogous substitutions to the amino acids corresponding to said positions. In some embodiments, the Cas protein comprises S542R, K548V, and N552R mutations or analogous substitutions to the amino acids corresponding to said positions. Exemplary advances in the engineering of Cas enzymes to recognize altered PAM sequences are reviewed in Collias et al., Nature Communications 12:555 (2021), incorporated herein by reference in its entirety.
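PAM motifs written with the degenerate symbols defined above (N, Y, R, V) can be located in a sequence by expanding each symbol into a regular-expression character class. The following Python sketch shows one way to do this; the helper names `pam_regex` and `find_pams` are hypothetical, positions are 0-based, and only the strand that is passed in is scanned (a real design tool would presumably also scan the reverse complement).

```python
import re

# Degenerate nucleotide symbols used in the text: N = any nucleotide,
# Y = C or T, R = A or G, V = A or C or G.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "[ACGT]", "Y": "[CT]", "R": "[AG]", "V": "[ACG]",
}


def pam_regex(pam: str) -> "re.Pattern[str]":
    """Compile a PAM motif into a lookahead regex so that overlapping
    occurrences are all reported."""
    body = "".join(IUPAC[symbol] for symbol in pam.upper())
    return re.compile("(?=(" + body + "))")


def find_pams(sequence: str, pam: str) -> list:
    """Return the 0-based start index of every PAM occurrence on the
    given strand."""
    return [m.start() for m in pam_regex(pam).finditer(sequence.upper())]


print(find_pams("ACGTGGTTAGGA", "NGG"))
print(find_pams("AAGAGT", "NNGRRT"))
```

The lookahead form is a design choice: a plain pattern would skip a PAM that overlaps the previous match, which matters for short motifs such as NGG.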

In some embodiments, the Cas protein is catalytically active and cuts one or both strands of the target DNA site. In some embodiments, cutting the target DNA site is followed by formation of an alteration, e.g., an insertion or deletion, e.g., by the cellular repair machinery.

In some embodiments, the Cas protein is modified to deactivate or partially deactivate the nuclease, e.g., a nuclease-deficient Cas9. Whereas wild-type Cas9 generates double-strand breaks (DSBs) at specific DNA sequences targeted by a gRNA, a number of CRISPR endonucleases having modified functionalities are available, for example: a "nickase" version of Cas9 that has been partially deactivated generates only a single-strand break; a catalytically inactive Cas9 ("dCas9") does not cut target DNA. In some embodiments, dCas9 binding to a DNA sequence may interfere with transcription at that site by steric hindrance. In some embodiments, dCas9 binding to an anchor sequence may interfere with (e.g., decrease or prevent) genomic complex (e.g., ASMC) formation and/or maintenance. In some embodiments, a DNA binding domain comprises a catalytically inactive Cas9, e.g., dCas9. Many catalytically inactive Cas9 proteins are known in the art. In some embodiments, dCas9 comprises mutations in each endonuclease domain of the Cas protein, e.g., D10A and H840A or N863A mutations. In some embodiments, a catalytically inactive or partially inactive CRISPR/Cas domain comprises a Cas protein comprising one or more mutations, e.g., one or more of the mutations listed in Table 7. In some embodiments, a Cas protein described on a given row of Table 7 comprises one, two, three, or all of the mutations listed in the same row of Table 7. In some embodiments, a Cas protein, e.g., one not described in Table 7, comprises one, two, three, or all of the mutations listed in a row of Table 7, or a corresponding mutation at a corresponding site in that Cas protein.
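Substitutions such as D10A or H840A follow the usual notation of wild-type residue, 1-based position, and replacement residue. A minimal Python sketch of applying such substitutions to a protein sequence is shown below. The function name `apply_substitutions` and the short toy peptide are assumptions made for illustration; the sketch checks that the stated wild-type residue is actually present, which is one way of catching a position that does not correspond between different Cas proteins.

```python
import re


def apply_substitutions(protein: str, substitutions: list) -> str:
    """Apply point substitutions written like 'D10A' (wild-type residue,
    1-based position, new residue) to a protein sequence."""
    residues = list(protein)
    for sub in substitutions:
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", sub)
        if match is None:
            raise ValueError("unrecognized substitution: " + sub)
        wild_type, position, new = match.group(1), int(match.group(2)), match.group(3)
        if position < 1 or position > len(residues):
            raise ValueError("position out of range: " + sub)
        if residues[position - 1] != wild_type:
            # The sequence does not carry the expected residue here, so the
            # numbering probably belongs to a different protein.
            raise ValueError("wild-type residue mismatch: " + sub)
        residues[position - 1] = new
    return "".join(residues)


# Toy 12-residue peptide (hypothetical input) with an aspartate at position 10.
print(apply_substitutions("MDKKYSIGLDIG", ["D10A"]))
```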

In some embodiments, a catalytically inactive Cas9 protein, e.g., dCas9, or a partially deactivated Cas9 protein comprises a D11 mutation (e.g., a D11A mutation) or an analogous substitution to the amino acid corresponding to said position. In some embodiments, a catalytically inactive Cas9 protein, e.g., dCas9, or a partially deactivated Cas9 protein comprises an H969 mutation (e.g., an H969A mutation) or an analogous substitution to the amino acid corresponding to said position. In some embodiments, a catalytically inactive Cas9 protein, e.g., dCas9, or a partially deactivated Cas9 protein comprises an N995 mutation (e.g., an N995A mutation) or an analogous substitution to the amino acid corresponding to said position. In some embodiments, a catalytically inactive Cas9 protein, e.g., dCas9, comprises mutations at one, two, or three of positions D11, H969, and N995 (e.g., D11A, H969A, and N995A mutations) or analogous substitutions to the amino acids corresponding to said positions.

In some embodiments, a catalytically inactive Cas9 protein, e.g., dCas9, or a partially deactivated Cas9 protein comprises a D10 mutation (e.g., a D10A mutation) or an analogous substitution to the amino acid corresponding to said position. In some embodiments, a catalytically inactive Cas9 protein, e.g., dCas9, or a partially deactivated Cas9 protein comprises an H557 mutation (e.g., an H557A mutation) or an analogous substitution to the amino acid corresponding to said position. In some embodiments, a catalytically inactive Cas9 protein, e.g., dCas9, comprises a D10 mutation (e.g., a D10A mutation) and an H557 mutation (e.g., an H557A mutation) or analogous substitutions to the amino acids corresponding to said positions.

In some embodiments, a catalytically inactive Cas9 protein, e.g., dCas9, or a partially deactivated Cas9 protein comprises a D839 mutation (e.g., a D839A mutation) or an analogous substitution to the amino acid corresponding to said position. In some embodiments, a catalytically inactive Cas9 protein, e.g., dCas9, or a partially deactivated Cas9 protein comprises an H840 mutation (e.g., an H840A mutation) or an analogous substitution to the amino acid corresponding to said position. In some embodiments, a catalytically inactive Cas9 protein, e.g., dCas9, or a partially deactivated Cas9 protein comprises an N863 mutation (e.g., an N863A mutation) or an analogous substitution to the amino acid corresponding to said position. In some embodiments, a catalytically inactive Cas9 protein, e.g., dCas9, comprises a D10 mutation (e.g., D10A), a D839 mutation (e.g., D839A), an H840 mutation (e.g., H840A), and an N863 mutation (e.g., N863A) or analogous substitutions to the amino acids corresponding to said positions.

In some embodiments, a catalytically inactive Cas9 protein, e.g., dCas9, or a partially deactivated Cas9 protein comprises an E993 mutation (e.g., an E993A mutation) or an analogous substitution to the amino acid corresponding to said position.

In some embodiments, a catalytically inactive Cas9 protein, e.g., dCas9, or a partially deactivated Cas9 protein comprises a D917 mutation (e.g., a D917A mutation) or an analogous substitution to the amino acid corresponding to said position. In some embodiments, a catalytically inactive Cas9 protein, e.g., dCas9, or a partially deactivated Cas9 protein comprises an E1006 mutation (e.g., an E1006A mutation) or an analogous substitution to the amino acid corresponding to said position. In some embodiments, a catalytically inactive Cas9 protein, e.g., dCas9, or a partially deactivated Cas9 protein comprises a D1255 mutation (e.g., a D1255A mutation) or an analogous substitution to the amino acid corresponding to said position. In some embodiments, a catalytically inactive Cas9 protein, e.g., dCas9, comprises a D917 mutation (e.g., D917A), an E1006 mutation (e.g., E1006A), and a D1255 mutation (e.g., D1255A) or analogous substitutions to the amino acids corresponding to said positions.

In some embodiments, a catalytically inactive Cas9 protein, e.g., dCas9, or a partially deactivated Cas9 protein comprises a D16 mutation (e.g., a D16A mutation) or an analogous substitution to the amino acid corresponding to said position. In some embodiments, a catalytically inactive Cas9 protein, e.g., dCas9, or a partially deactivated Cas9 protein comprises a D587 mutation (e.g., a D587A mutation) or an analogous substitution to the amino acid corresponding to said position. In some embodiments, a partially deactivated Cas domain has nickase activity. In some embodiments, a partially deactivated Cas9 domain is a Cas9 nickase domain. In some embodiments, the catalytically inactive Cas domain or dead Cas domain produces no detectable double-strand break formation. In some embodiments, a catalytically inactive Cas9 protein, e.g., dCas9, or a partially deactivated Cas9 protein comprises an H588 mutation (e.g., an H588A mutation) or an analogous substitution to the amino acid corresponding to said position. In some embodiments, a catalytically inactive Cas9 protein, e.g., dCas9, or a partially deactivated Cas9 protein comprises an N611 mutation (e.g., an N611A mutation) or an analogous substitution to the amino acid corresponding to said position. In some embodiments, a catalytically inactive Cas9 protein, e.g., dCas9, comprises a D16 mutation (e.g., D16A), a D587 mutation (e.g., D587A), an H588 mutation (e.g., H588A), and an N611 mutation (e.g., N611A) or analogous substitutions to the amino acids corresponding to said positions.

In some embodiments, a DNA binding domain or endonuclease domain may comprise a Cas molecule comprising or linked (e.g., covalently) to a gRNA (e.g., a template nucleic acid, e.g., a template RNA, comprising a gRNA).

In some embodiments, an endonuclease domain or DNA binding domain comprises a Streptococcus pyogenes Cas9 (SpCas9) or a functional fragment or variant thereof. In some embodiments, the endonuclease domain or DNA binding domain comprises a modified SpCas9. In embodiments, the modified SpCas9 comprises a modification that alters protospacer adjacent motif (PAM) specificity. In embodiments, the PAM has specificity for the nucleic acid sequence 5′-NGT-3′. In embodiments, the modified SpCas9 comprises one or more amino acid substitutions, e.g., at one or more of positions L1111, D1135, G1218, E1219, A1322, or R1335, e.g., selected from L1111R, D1135V, G1218R, E1219F, A1322R, and R1335V. In embodiments, the modified SpCas9 comprises the amino acid substitution T1337R and one or more additional amino acid substitutions, e.g., selected from L1111, D1135L, S1136R, G1218S, E1219V, D1332A, D1332S, D1332T, D1332V, D1332L, D1332K, D1332R, R1335Q, T1337, T1337L, T1337Q, T1337I, T1337V, T1337F, T1337S, T1337N, T1337K, T1337H, T1337Q, and T1337M, or corresponding amino acid substitutions thereto. In embodiments, the modified SpCas9 comprises: (i) one or more amino acid substitutions selected from D1135L, S1136R, G1218S, E1219V, A1322R, R1335Q, and T1337; and (ii) one or more amino acid substitutions selected from L1111R, G1218R, E1219F, D1332A, D1332S, D1332T, D1332V, D1332L, D1332K, D1332R, T1337L, T1337I, T1337V, T1337F, T1337S, T1337N, T1337K, T1337R, T1337H, T1337Q, and T1337M, or corresponding amino acid substitutions thereto.

In some embodiments, the endonuclease domain or DNA binding domain comprises a Cas domain, e.g., a Cas9 domain. In embodiments, the endonuclease domain or DNA binding domain comprises a nuclease-active Cas domain, a Cas nickase (nCas) domain, or a nuclease-inactive Cas (dCas) domain. In embodiments, the endonuclease domain or DNA binding domain comprises a nuclease-active Cas9 domain, a Cas9 nickase (nCas9) domain, or a nuclease-inactive Cas9 (dCas9) domain. In some embodiments, the endonuclease domain or DNA binding domain comprises a domain of Cas9 (e.g., dCas9 and nCas9), Cas12a/Cpf1, Cas12b/C2c1, Cas12c/C2c3, Cas12d/CasY, Cas12e/CasX, Cas12g, Cas12h, or Cas12i. In some embodiments, the endonuclease domain or DNA binding domain comprises a Cas9 (e.g., dCas9 and nCas9), Cas12a/Cpf1, Cas12b/C2c1, Cas12c/C2c3, Cas12d/CasY, Cas12e/CasX, Cas12g, Cas12h, or Cas12i. In some embodiments, the endonuclease domain or DNA binding domain comprises an S. pyogenes or an S. thermophilus Cas9, or a functional fragment thereof. In some embodiments, the endonuclease domain or DNA binding domain comprises a Cas9 sequence, e.g., as described in Chylinski, Rhun, and Charpentier (2013) RNA Biology 10:5, 726-737; incorporated herein by reference. In some embodiments, the endonuclease domain or DNA binding domain comprises the HNH nuclease subdomain and/or the RuvC1 subdomain of a Cas, e.g., Cas9, e.g., as described herein, or a variant thereof. In some embodiments, the endonuclease domain or DNA binding domain comprises Cas12a/Cpf1, Cas12b/C2c1, Cas12c/C2c3, Cas12d/CasY, Cas12e/CasX, Cas12g, Cas12h, or Cas12i. In some embodiments, the endonuclease domain or DNA binding domain comprises a Cas polypeptide (e.g., enzyme), or a functional fragment thereof. In embodiments, the Cas polypeptide (e.g., enzyme) is selected from Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas5d, Cas5t, Cas5h, Cas5a, Cas6, Cas7, Cas8, Cas8a, Cas8b, Cas8c, Cas9 (e.g., Csn1 or Csx12), Cas10, Cas10d, Cas12a/Cpf1, Cas12b/C2c1, Cas12c/C2c3, Cas12d/CasY, Cas12e/CasX, Cas12g, Cas12h, Cas12i, Csy1, Csy2, Csy3, Csy4, Cse1, Cse2, Cse3, Cse4, Cse5e, Csc1, Csc2, Csa5, Csn1, Csn2, Csm1, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX, Csx3, Csx1, Csx1S, Csx11, Csf1, Csf2, CsO, Csf4, Csd1, Csd2, Cst1, Cst2, Csh1, Csh2, Csa1, Csa2, Csa3, Csa4, Csa5, a type II Cas effector protein, a type V Cas effector protein, a type VI Cas effector protein, CARF, DinG, Cpf1, Cas12b/C2c1, Cas12c/C2c3, SpCas9(K855A), eSpCas9(1.1), SpCas9-HF1, hyper-accurate Cas9 variant (HypaCas9), homologues thereof, modified or engineered versions thereof, and/or functional fragments thereof. In embodiments, the Cas9 comprises one or more substitutions, e.g., selected from H840A, D10A, P475A, W476A, N477A, D1125A, W1126A, and D1127A. In embodiments, the Cas9 comprises one or more mutations at positions selected from D10, G12, G17, E762, H840, N854, N863, H982, H983, A984, D986, and/or A987, e.g., one or more substitutions selected from D10A, G12A, G17A, E762A, H840A, N854A, N863A, H982A, H983A, A984A, and/or D986A. In some embodiments, the endonuclease domain or DNA binding domain comprises a Cas (e.g., Cas9) sequence, or a fragment or variant thereof, from Corynebacterium ulcerans, Corynebacterium diphtheriae, Spiroplasma syrphidicola, Prevotella intermedia, Spiroplasma taiwanense, Streptococcus iniae, Belliella baltica, Psychroflexus torquis, Streptococcus thermophilus, Listeria innocua, Campylobacter jejuni, Neisseria meningitidis, Streptococcus pyogenes, or Staphylococcus aureus.

In some embodiments, the endonuclease domain or DNA binding domain comprises a Cpf1 domain, e.g., comprising one or more substitutions, e.g., at position D917, E1006, or D1255, or any combination thereof, e.g., selected from D917A, E1006A, D1255A, D917A/E1006A, D917A/D1255A, E1006A/D1255A, and D917A/E1006A/D1255A.

In some embodiments, the endonuclease domain or DNA binding domain comprises spCas9, spCas9-VRQR (SEQ ID NO: 5019), spCas9-VRER (SEQ ID NO: 5020), xCas9(sp), saCas9, saCas9-KKH, spCas9-MQKSER (SEQ ID NO: 5021), spCas9-LRKIQK (SEQ ID NO: 5022), or spCas9-LRVSQL (SEQ ID NO: 5023).

In some embodiments, a gene modifying polypeptide has an endonuclease domain comprising a Cas9 nickase, e.g., Cas9 H840A. In embodiments, the Cas9 H840A has the following amino acid sequence: Cas9 nickase (H840A):

In some embodiments, a gene modifying polypeptide comprises a dCas9 sequence comprising a D10A and/or H840A mutation, e.g., the following sequence:

TAL effectors and zinc finger nucleases

In some embodiments, an endonuclease domain or DNA binding domain comprises a TAL effector molecule. A TAL effector molecule, e.g., a TAL effector molecule that specifically binds a DNA sequence, typically comprises a plurality of TAL effector domains or fragments thereof, and optionally one or more additional portions of naturally occurring TAL effectors (e.g., N- and/or C-terminal of the plurality of TAL effector domains). Many TAL effectors are known to those of skill in the art and are commercially available, e.g., from Thermo Fisher Scientific.

Naturally occurring TALEs are natural effector proteins secreted by numerous species of bacterial pathogens, including the plant pathogen Xanthomonas, which modulate gene expression in host plants and facilitate bacterial colonization and survival. The specific binding of TAL effectors is based on a central repeat domain of tandemly arranged, nearly identical repeats of typically 33 or 34 amino acids (the repeat-variable di-residue, RVD domain).

Members of the TAL effector family differ mainly in the number and order of their repeats. The number of repeats typically ranges from 1.5 to 33.5 repeats, and the C-terminal repeat is usually shorter in length (e.g., about 20 amino acids) and is generally referred to as a "half-repeat". Each repeat of a TAL effector typically features a one-repeat-to-one-base-pair correlation, with different repeat types exhibiting different base-pair specificities (one repeat recognizes one base pair on the target gene sequence). Generally, the smaller the number of repeats, the weaker the protein-DNA interactions. A number of 6.5 repeats has been shown to be sufficient to activate transcription of a reporter gene (Scholze et al., 2010).

Repeat-to-repeat variations occur predominantly at amino acid positions 12 and 13, which have therefore been termed "hypervariable" and which are responsible for the specificity of the interaction with the target DNA promoter sequence, as shown in Table 9, which lists exemplary repeat variable diresidues (RVDs) and their correspondence to nucleic acid base targets.

Table 9. RVDs and nucleic acid base specificity

Accordingly, it is possible to modify the repeats of a TAL effector to target specific DNA sequences. Further studies have shown that the RVD NK can target G. Target sites of TAL effectors also tend to include a T flanking the 5′ base targeted by the first repeat, but the exact mechanism of this recognition is not known. More than 113 TAL effector sequences are known to date. Non-limiting examples of TAL effectors from Xanthomonas include Hax2, Hax3, Hax4, AvrXa7, AvrXa10, and AvrBs3.
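The one-repeat-to-one-base logic can be sketched in Python as a lookup from target bases to RVDs. This is a hedged illustration, not the disclosure's design procedure: the contents of Table 9 are not reproduced in this text, so the mapping below uses the commonly cited RVD code (NI for A, HD for C, NG for T, NN for G) as an assumption, and the function name `design_rvds` is hypothetical. Only the 5′ T preference and the NK-to-G pairing are stated in the paragraph above.

```python
# Assumed base-to-RVD code (commonly cited; Table 9 is not shown here).
BASE_TO_RVD = {"A": "NI", "C": "HD", "G": "NN", "T": "NG"}


def design_rvds(target: str, require_5prime_t: bool = True,
                use_nk_for_g: bool = False) -> list:
    """Return one RVD per target base, in 5'-to-3' order.

    If require_5prime_t is True, the target must start with the flanking
    T, which is checked and then skipped because no repeat pairs with it.
    If use_nk_for_g is True, G is targeted with NK instead of NN.
    """
    target = target.upper()
    if require_5prime_t:
        if not target.startswith("T"):
            raise ValueError("target site should begin with a 5' T")
        target = target[1:]
    code = dict(BASE_TO_RVD)
    if use_nk_for_g:
        code["G"] = "NK"
    return [code[base] for base in target]


print(design_rvds("TACGT"))
print(design_rvds("TACGT", use_nk_for_g=True))
```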

Accordingly, the TAL effector domain of the TAL effector molecule described herein may be derived from a TAL effector from any bacterial species (e.g., Xanthomonas species such as the African strain of Xanthomonas oryzae pv. oryzae (Yu et al. 2011), Xanthomonas campestris pv. raphani strain 756C, and Xanthomonas oryzae pv. oryzicola strain BLS256 (Bogdanove et al. 2011)). In some embodiments, the TAL effector domain comprises an RVD domain as well as one or more flanking sequences (sequences on the N-terminal and/or C-terminal side of the RVD domain) also from the naturally occurring TAL effector. It may comprise more or fewer repeats than the RVD of the naturally occurring TAL effector. The TAL effector molecule can be designed to target a given DNA sequence based on the above code and other codes known in the art. The number of TAL effector domains (e.g., repeats (monomers or modules)) and their specific sequence can be selected based on the desired DNA target sequence. For example, TAL effector domains, e.g., repeats, may be removed or added in order to suit a specific target sequence. In an embodiment, the TAL effector molecule of the present invention comprises between 6.5 and 33.5 TAL effector domains, e.g., repeats. In an embodiment, the TAL effector molecule of the present invention comprises between 8 and 33.5 TAL effector domains, e.g., repeats, e.g., between 10 and 25 TAL effector domains, e.g., repeats, e.g., between 10 and 14 TAL effector domains, e.g., repeats.

In some embodiments, the TAL effector molecule comprises TAL effector domains that correspond to a perfect match to the DNA target sequence. In some embodiments, a mismatch between a repeat and a target base pair on the DNA target sequence is permitted as long as it still allows for the function of the polypeptide comprising the TAL effector molecule. In general, TALE binding is inversely correlated with the number of mismatches. In some embodiments, the TAL effector molecule of a polypeptide of the present invention comprises no more than 7 mismatches, 6 mismatches, 5 mismatches, 4 mismatches, 3 mismatches, 2 mismatches, or 1 mismatch, and optionally no mismatch, with the target DNA sequence. Without wishing to be bound by theory, in general the smaller the number of TAL effector domains in the TAL effector molecule, the smaller the number of mismatches that will be tolerated while still allowing for the function of the polypeptide comprising the TAL effector molecule. The binding affinity is thought to depend on the sum of matching repeat-DNA combinations. For example, TAL effector molecules having 25 TAL effector domains or more may be able to tolerate up to 7 mismatches.
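The mismatch criterion above can be illustrated with a minimal sketch. The RVD-to-base table below is an assumption: it lists commonly cited RVD preferences (only NK targeting G is stated in this section), and the function names and the default ceiling of 7 mismatches are hypothetical choices for illustration, not part of the disclosure.

```python
# Assumed, commonly cited RVD-to-base preferences (illustrative subset only).
RVD_CODE = {
    "NI": {"A"},
    "HD": {"C"},
    "NG": {"T"},
    "NN": {"G", "A"},
    "NK": {"G"},
}


def count_mismatches(rvds, target):
    """Count repeat/base positions where the RVD does not prefer the target base."""
    if len(rvds) != len(target):
        raise ValueError("one RVD is required per target base")
    return sum(
        1
        for rvd, base in zip(rvds, target.upper())
        if base not in RVD_CODE[rvd]
    )


def within_tolerance(rvds, target, max_mismatches=7):
    """Return True if the repeat array has no more than max_mismatches mismatches."""
    return count_mismatches(rvds, target) <= max_mismatches
```

The ceiling is left as a parameter because, as the paragraph above notes, shorter repeat arrays are expected to tolerate fewer mismatches than arrays of 25 or more repeats.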

In addition to the TAL effector domains, the TAL effector molecule of the present invention may comprise additional sequences derived from a naturally occurring TAL effector. The length of the one or more C-terminal and/or N-terminal sequences included on each side of the TAL effector domain portion of the TAL effector molecule can vary and be selected by one skilled in the art, for example based on the studies of Zhang et al. (2011). Zhang et al. characterized a number of C-terminal and N-terminal truncation mutants in Hax3-derived TAL effector-based proteins and identified key elements that contribute to optimal binding to the target sequence and thus activation of transcription. Generally, transcriptional activity was found to be inversely correlated with the length of the N-terminus. Regarding the C-terminus, an important element for DNA binding residues was identified within the first 68 amino acids of the Hax3 sequence. Accordingly, in some embodiments, the first 68 amino acids on the C-terminal side of the TAL effector domain of the naturally occurring TAL effector are included in the TAL effector molecule. Thus, in an embodiment, the TAL effector molecule comprises 1) one or more TAL effector domains derived from a naturally occurring TAL effector; 2) at least 70, 80, 90, 100, 110, 120, 130, 140, 150, 170, 180, 190, 200, 220, 230, 240, 250, 260, 270, 280 or more amino acids from the naturally occurring TAL effector on the N-terminal side of the TAL effector domain; and/or 3) at least 68, 80, 90, 100, 110, 120, 130, 140, 150, 170, 180, 190, 200, 220, 230, 240, 250, 260 or more amino acids from the naturally occurring TAL effector on the C-terminal side of the TAL effector domain.

在一些实施例中，核酸内切酶结构域或DNA结合结构域是或包含Zn指分子。Zn指分子包含Zn指蛋白，例如天然存在的Zn指蛋白或工程化的Zn指蛋白、或其片段。许多Zn指蛋白是本领域技术人员已知的并且是可商购的，例如从西格玛奥德里奇公司(Sigma-Aldrich)商购。In some embodiments, the endonuclease domain or the DNA binding domain is or comprises a Zn finger molecule. The Zn finger molecule comprises a Zn finger protein, such as a naturally occurring Zn finger protein or an engineered Zn finger protein, or a fragment thereof. Many Zn finger proteins are known to those skilled in the art and are commercially available, for example, from Sigma-Aldrich.

In some embodiments, the Zn finger molecule comprises a non-naturally occurring Zn finger protein that is engineered to bind to a selected target DNA sequence. See, for example, Beerli et al. (2002) Nature Biotechnol. 20:135-141; Pabo et al. (2001) Ann. Rev. Biochem. 70:313-340; Isalan et al. (2001) Nature Biotechnol. 19:656-660; Segal et al. (2001) Curr. Opin. Biotechnol. 12:632-637; Choo et al. (2000) Curr. Opin. Struct. Biol. 10:411-416; U.S. Patent Nos. 6,453,242, 6,534,261, 6,599,692, 6,503,717, 6,689,558, 7,030,215, 6,794,136, 7,067,317, 7,262,054, 7,070,934, 7,361,635, and 7,253,273; and U.S. Patent Publication Nos. 2005/0064474, 2007/0218528, and 2005/0267061, all of which are incorporated herein by reference in their entirety.

An engineered Zn finger protein may have a novel binding specificity compared to a naturally occurring Zn finger protein. Engineering methods include, but are not limited to, rational design and various types of selection. Rational design includes, for example, using databases comprising triplet (or quadruplet) nucleotide sequences and individual Zn finger amino acid sequences, in which each triplet or quadruplet nucleotide sequence is associated with one or more amino acid sequences of zinc fingers that bind the particular triplet or quadruplet sequence. See, for example, U.S. Patent Nos. 6,453,242 and 6,534,261, which are incorporated herein by reference in their entirety.

Exemplary selection methods, including phage display and two-hybrid systems, are disclosed in U.S. Patent Nos. 5,789,538, 5,925,523, 6,007,988, 6,013,453, 6,410,248, 6,140,466, 6,200,759, and 6,242,568; as well as International Patent Publication Nos. WO 98/37186, WO 98/53057, WO 00/27878, and WO 01/88197, and GB 2,338,237. In addition, enhancement of the binding specificity of zinc finger proteins has been described, for example, in International Patent Publication No. WO 02/077227.

另外，如这些和其他参考文献中所披露的，锌指结构域和/或多指锌指蛋白可以使用任何合适的接头序列(包括例如，长度为5个或更多个氨基酸的接头)连接在一起。另参见美国专利号6,479,626、6,903,185、和7,153,949的示例性接头序列长度为6个或更多个氨基酸。本文所述的蛋白质可以包括蛋白质的单个锌指之间的合适接头的任何组合。另外，增强锌指结合结构域的结合特异性已经例如，在共同拥有的国际专利公开号WO 02/077227中描述。In addition, as disclosed in these and other references, zinc finger domains and/or multi-finger zinc finger proteins can be linked together using any suitable linker sequence (including, for example, a linker of 5 or more amino acids in length). See also U.S. Patent Nos. 6,479,626, 6,903,185, and 7,153,949 for exemplary linker sequences of 6 or more amino acids in length. The proteins described herein may include any combination of suitable linkers between the individual zinc fingers of the protein. In addition, enhancing the binding specificity of zinc finger binding domains has been described, for example, in co-owned International Patent Publication No. WO 02/077227.

Zn finger proteins and methods for the design and construction of fusion proteins (and polynucleotides encoding them) are known to those of skill in the art and are described in detail in U.S. Patent Nos. 6,140,081, 5,789,538, 6,453,242, 6,534,261, 5,925,523, 6,007,988, 6,013,453, and 6,200,759; and International Patent Publication Nos. WO 95/19431, WO 96/06166, WO 98/53057, WO 98/54311, WO 00/27878, WO 01/60970, WO 01/88197, WO 02/099084, WO 98/53058, WO 98/53059, WO 98/53060, WO 02/016536, and WO 03/016496.

另外，如这些和其他参考文献中所披露的，Zn指蛋白和/或多指Zn指蛋白可以使用任何合适的接头序列(包括例如，长度为5个或更多个氨基酸的接头)连接在一起，例如作为融合蛋白。另参见美国专利号6,479,626；6,903,185、和7,153,949的示例性接头序列长度为6个或更多个氨基酸。本文所述的Zn指分子可以包括Zn指分子的单个锌指蛋白和/或多指Zn指蛋白之间的合适接头的任何组合。In addition, as disclosed in these and other references, Zn finger proteins and/or multi-finger Zn finger proteins can be linked together, for example as fusion proteins, using any suitable linker sequence (including, for example, a linker of 5 or more amino acids in length). See also U.S. Pat. Nos. 6,479,626; 6,903,185, and 7,153,949 for exemplary linker sequences of 6 or more amino acids in length. The Zn finger molecules described herein may include any combination of suitable linkers between the individual zinc finger proteins and/or multi-finger Zn finger proteins of the Zn finger molecule.

在某些实施例中，DNA结合结构域或核酸内切酶结构域包含Zn指分子，该Zn指分子包含与靶DNA序列结合(以序列特异性方式)的工程化锌指蛋白。在一些实施例中，Zn指分子包含一种Zn指蛋白或其片段。在其他实施例中，Zn指分子包含多种Zn指蛋白(或其片段)，例如2、3、4、5、6或更多种Zn指蛋白(并且任选地，不超过12、11、10、9、8、7、6、5、4、3或2种Zn指蛋白)。在一些实施例中，Zn指分子包含至少三种Zn指蛋白。在一些实施例中，Zn指分子包含四个、五个或六个指。在一些实施例中，Zn指分子包含8、9、10、11或12个指。在一些实施例中，包含三种Zn指蛋白的Zn指分子识别包含9或10个核苷酸的靶DNA序列。在一些实施例中，包含四种Zn指蛋白的Zn指分子识别包含12至14个核苷酸的靶DNA序列。在一些实施例中，包含六种Zn指蛋白的Zn指分子识别包含18至21个核苷酸的靶DNA序列。In certain embodiments, the DNA binding domain or the endonuclease domain comprises a Zn finger molecule comprising an engineered zinc finger protein that binds to a target DNA sequence (in a sequence-specific manner). In some embodiments, the Zn finger molecule comprises a Zn finger protein or a fragment thereof. In other embodiments, the Zn finger molecule comprises a plurality of Zn finger proteins (or fragments thereof), such as 2, 3, 4, 5, 6 or more Zn finger proteins (and optionally, no more than 12, 11, 10, 9, 8, 7, 6, 5, 4, 3 or 2 Zn finger proteins). In some embodiments, the Zn finger molecule comprises at least three Zn finger proteins. In some embodiments, the Zn finger molecule comprises four, five or six fingers. In some embodiments, the Zn finger molecule comprises 8, 9, 10, 11 or 12 fingers. In some embodiments, a Zn finger molecule comprising three Zn finger proteins recognizes a target DNA sequence comprising 9 or 10 nucleotides. In some embodiments, a Zn finger molecule comprising four Zn finger proteins recognizes a target DNA sequence comprising 12 to 14 nucleotides. In some embodiments, a Zn finger molecule comprising six Zn finger proteins recognizes a target DNA sequence comprising 18 to 21 nucleotides.
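The finger-count and target-length pairings stated above can be captured as a small lookup. This is only a sketch: the table holds just the three pairings given in the text, and the function names are hypothetical.

```python
# Target-site length ranges (in nucleotides) stated in the text for
# Zn finger molecules with 3, 4, or 6 Zn finger proteins.
TARGET_LENGTH_NT = {
    3: (9, 10),
    4: (12, 14),
    6: (18, 21),
}


def target_length_range(n_fingers):
    """Return the (min, max) target length for a finger count listed in the text."""
    if n_fingers not in TARGET_LENGTH_NT:
        raise KeyError(f"no target length stated for {n_fingers} fingers")
    return TARGET_LENGTH_NT[n_fingers]


def finger_counts_for_site(site_length_nt):
    """List the stated finger counts whose range covers a given site length."""
    return [
        n
        for n, (low, high) in sorted(TARGET_LENGTH_NT.items())
        if low <= site_length_nt <= high
    ]
```

Finger counts not listed in the text are deliberately rejected rather than extrapolated.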

In some embodiments, the Zn finger molecule comprises a two-handed Zn finger protein. Two-handed zinc finger proteins are those proteins in which two clusters of zinc finger proteins are separated by intervening amino acids so that the two zinc finger domains bind to two discontinuous target DNA sequences. An example of a two-handed type of zinc finger binding protein is SIP1, in which a cluster of four zinc finger proteins is located at the amino terminus of the protein and a cluster of three Zn finger proteins is located at the carboxyl terminus (see Remacle et al. (1999) EMBO Journal 18(18):5073-5084). Each cluster of zinc fingers in these proteins is able to bind to a unique target sequence, and the spacing between the two target sequences can comprise many nucleotides.

Linkers

In some embodiments, a gene modifying polypeptide may comprise a linker, e.g., a peptide linker, e.g., a linker as described in Table 10. In some embodiments, a gene modifying polypeptide comprises, in an N-terminal to C-terminal direction, a Cas domain (e.g., a Cas domain of Table 8), a linker of Table 10 (or a sequence having at least 70%, 80%, 85%, 90%, 95%, or 99% identity thereto), and an RT domain (e.g., an RT domain of Table 6). In some embodiments, a gene modifying polypeptide comprises a flexible linker between the endonuclease and the RT domain, e.g., a linker comprising the amino acid sequence SGGSSGGSSGSETPGTSESATPESSGGSSGGSS (SEQ ID NO: 11,002). In some embodiments, the RT domain of a gene modifying polypeptide may be located C-terminal to the endonuclease domain. In some embodiments, the RT domain of a gene modifying polypeptide may be located N-terminal to the endonuclease domain.

表10示例性接头序列Table 10 Exemplary linker sequences

In some embodiments, a linker of a gene modifying polypeptide comprises a motif chosen from: (SGGS)_n (SEQ ID NO: 5025), (GGGS)_n (SEQ ID NO: 5026), (GGGGS)_n (SEQ ID NO: 5027), (G)_n, (EAAAK)_n (SEQ ID NO: 5028), (GGS)_n, or (XP)_n.
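As a hedged illustration of the repeat-motif notation, the sketch below expands a motif such as (GGGGS)_n into its amino acid string. The function name `build_linker` is hypothetical, and (XP)_n is left out because X stands for a variable residue that this simple expansion does not model.

```python
# Fixed-sequence linker motifs listed in the text; (XP)_n is omitted because
# X denotes a variable amino acid.
LINKER_MOTIFS = ("SGGS", "GGGS", "GGGGS", "G", "EAAAK", "GGS")


def build_linker(motif, n):
    """Expand a repeat motif, e.g. (GGGGS)_3, into its one-letter amino acid sequence."""
    if motif not in LINKER_MOTIFS:
        raise ValueError(f"unsupported motif: {motif}")
    if n < 1:
        raise ValueError("n must be a positive integer")
    return motif * n
```

For example, `build_linker("EAAAK", 2)` yields the ten-residue string `EAAAKEAAAK`.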

Selection of gene modifying polypeptides by pooled screening

可以筛选候选基因修饰多肽来评价候选物的基因编辑能力。例如，可以使用为靶向编辑人基因组中的编码序列而设计的RNA基因修饰系统。在某些实施例中，这样的基因修饰系统可以与池化筛选方法结合使用。Candidate gene modification polypeptides can be screened to evaluate the gene editing ability of the candidate. For example, an RNA gene modification system designed for targeted editing of coding sequences in the human genome can be used. In certain embodiments, such a gene modification system can be used in combination with a pooled screening method.

例如，可以将基因修饰多肽候选物文库和模板指导RNA(tgRNA)引入哺乳动物细胞，以通过池化筛选方法测试候选物的基因编辑能力。在特别的实施例中，将基因修饰多肽候选物文库引入哺乳动物细胞，然后将tgRNA引入细胞。For example, a gene-modified polypeptide candidate library and a template guide RNA (tgRNA) can be introduced into mammalian cells to test the gene editing ability of the candidate by a pooled screening method. In a particular embodiment, a gene-modified polypeptide candidate library is introduced into mammalian cells, and then tgRNA is introduced into the cells.

可用于筛选的哺乳动物细胞的代表性非限制性实例包括HEK293T细胞、U2OS细胞、HeLa细胞、HepG2细胞、Huh7细胞、K562细胞、或iPS细胞。Representative, non-limiting examples of mammalian cells that can be used for screening include HEK293T cells, U2OS cells, HeLa cells, HepG2 cells, Huh7 cells, K562 cells, or iPS cells.

基因修饰多肽候选物可以包含1)Cas核酸酶，例如野生型Cas核酸酶(例如野生型Cas9核酸酶)、突变Cas核酸酶(例如Cas切口酶，例如Cas9切口酶，例如Cas9 N863A切口酶)、或选自表7或表8的Cas核酸酶，2)肽接头，例如来自表D或表10的序列，其可能表现出不同程度的长度、柔性、疏水性和/或二级结构；和3)逆转录酶(RT)，例如来自表D或表6的RT结构域。基因修饰多肽候选文库包含：多个不同的基因修饰多肽候选物，其在Cas核酸酶、肽接头或RT结构域组分中的一个、两个或全部三个方面彼此不同；或多个编码此类基因修饰多肽候选物的核酸表达载体。The gene-modified polypeptide candidate may comprise 1) a Cas nuclease, such as a wild-type Cas nuclease (e.g., a wild-type Cas9 nuclease), a mutant Cas nuclease (e.g., a Cas nickase, such as a Cas9 nickase, such as a Cas9 N863A nickase), or a Cas nuclease selected from Table 7 or Table 8, 2) a peptide linker, such as a sequence from Table D or Table 10, which may exhibit varying degrees of length, flexibility, hydrophobicity, and/or secondary structure; and 3) a reverse transcriptase (RT), such as an RT domain from Table D or Table 6. The gene-modified polypeptide candidate library comprises: a plurality of different gene-modified polypeptide candidates, which differ from each other in one, two, or all three aspects of the Cas nuclease, peptide linker, or RT domain components; or a plurality of nucleic acid expression vectors encoding such gene-modified polypeptide candidates.

For screening of gene modifying polypeptide candidates, a two-component system comprising a gene modifying polypeptide component and a tgRNA component can be used. The gene modifying component may comprise, for example, an expression vector, e.g., an expression plasmid or a lentiviral vector, that encodes a gene modifying polypeptide candidate, e.g., that comprises a human codon-optimized nucleic acid encoding a gene modifying polypeptide candidate, e.g., a Cas-linker-RT fusion as described above. In a particular embodiment, a lentiviral cassette is utilized that comprises: (i) a promoter for expression in mammalian cells, e.g., a CMV promoter; (ii) a gene modifying library candidate, e.g., a Cas-linker-RT fusion comprising a Cas nuclease of Table 7 or Table 8, a peptide linker of Table 10, and an RT of Table 6, e.g., a Cas-linker-RT fusion as in Table D; (iii) a self-cleaving polypeptide, e.g., a T2A peptide; (iv) a marker enabling selection in mammalian cells, e.g., a puromycin resistance gene; and (v) a termination signal, e.g., a poly A tail.

The tgRNA component may comprise a tgRNA or an expression vector, e.g., an expression plasmid, that produces the tgRNA, e.g., using a U6 promoter to drive expression of the tgRNA. The tgRNA is a non-coding RNA sequence that is recognized by Cas and localizes it to the genomic locus of interest, and that also serves as the template from which the RT domain reverse transcribes the desired edit into the genome.

To prepare a pool of cells expressing gene modifying polypeptide library candidates, mammalian cells (e.g., HEK293T or U2OS cells) may be transduced with a pooled preparation of gene modifying polypeptide candidate expression vectors (e.g., a lentiviral preparation) of the gene modifying candidate polypeptide library. In a particular embodiment, lentiviral plasmids are utilized, and HEK293 Lenti-X cells are seeded in 15 cm plates (about 12×10⁶ cells) prior to lentiviral plasmid transfection. In such an embodiment, lentiviral plasmid transfection may be performed using a lentiviral packaging mix (Biosettia), with the plasmid DNA of the gene modifying candidate library transfected the following day using Lipofectamine 2000 and Opti-MEM medium according to the manufacturer's protocol. In such an embodiment, extracellular DNA may be removed by a full medium change the next day, and virus-containing medium may be harvested 48 hours later. Lentiviral medium may be concentrated using Lenti-X Concentrator (TaKaRa Biosciences), after which 5 mL lentiviral aliquots may be prepared and stored at -80°C. Lentiviral titering is performed by counting colony forming units after selection (e.g., after puromycin selection).

To monitor gene editing of a target DNA, mammalian cells, e.g., HEK293T or U2OS cells, carrying the target DNA may be utilized. In other embodiments for monitoring gene editing of a target DNA, mammalian cells, e.g., HEK293T or U2OS cells, carrying a target DNA genomic landing pad may be utilized. In particular embodiments, the target DNA genomic landing pad may comprise a gene to be edited for treatment of a disease or disorder of interest. In other particular embodiments, the target DNA is a gene sequence that expresses a protein exhibiting detectable characteristics that may be monitored to determine whether gene editing has occurred. For example, in certain embodiments, a genomic landing pad expressing blue fluorescent protein (BFP) or green fluorescent protein (GFP) is utilized. In certain embodiments, mammalian cells (e.g., HEK293T or U2OS cells) comprising a target DNA (e.g., a target DNA genomic landing pad) are seeded in culture plates at 500x-3000x cells per gene modifying library candidate and transduced at a multiplicity of infection (MOI) of 0.2-0.3 to minimize multiple infections per cell. Puromycin (2.5 µg/mL) may be added 48 hours post infection to select for infected cells. In such an embodiment, cells may be kept under puromycin selection for at least 7 days and then scaled up for introduction of the tgRNA, e.g., by tgRNA electroporation.
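The rationale for a low MOI and a high per-candidate coverage can be sketched numerically. The block below assumes that integrations per cell follow a Poisson distribution with mean equal to the MOI, which is a common modeling assumption and is not stated in the text; the function names and the library size used in the example are hypothetical.

```python
import math


def cells_to_seed(library_size, coverage):
    """Cells needed to seed a library at a given per-candidate coverage (e.g. 500-3000)."""
    return library_size * coverage


def fraction_infected(moi):
    """Fraction of cells with at least one integration (Poisson assumption)."""
    return 1.0 - math.exp(-moi)


def fraction_multiply_infected(moi):
    """Among infected cells, the fraction carrying two or more integrations."""
    p_zero = math.exp(-moi)
    p_one = moi * math.exp(-moi)
    return (1.0 - p_zero - p_one) / (1.0 - p_zero)


# Hypothetical library of 1,000 candidates seeded at 500x coverage.
print(cells_to_seed(1000, 500))
for moi in (0.2, 0.3):
    print(moi, round(fraction_infected(moi), 3), round(fraction_multiply_infected(moi), 3))
```

Under this assumption, lowering the MOI reduces the share of selected cells that carry more than one library member, at the cost of infecting fewer cells overall.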

为了确定是否发生基因编辑，可以用基因修饰多肽文库候选物感染含有要编辑的靶DNA的哺乳动物细胞，然后用设计用于编辑靶DNA的tgRNA转染。随后，可以分析细胞以确定靶基因座的编辑是否根据设计的结果发生，或者是否无编辑发生或发生不完美编辑，例如通过使用细胞分选和序列分析。To determine whether gene editing occurs, mammalian cells containing the target DNA to be edited can be infected with a gene modifying polypeptide library candidate and then transfected with a tgRNA designed to edit the target DNA. Subsequently, the cells can be analyzed to determine whether editing of the target locus occurs according to the designed results, or whether no editing occurs or imperfect editing occurs, for example, by using cell sorting and sequence analysis.

In a particular embodiment, to ascertain whether genome editing occurs, BFP- or GFP-expressing mammalian cells, e.g., HEK293T or U2OS cells, may be infected with gene modifying library candidates and then transfected or electroporated with tgRNA plasmid or RNA, e.g., by electroporation of 250,000 cells/well with 200 ng of a tgRNA plasmid designed to convert BFP to GFP or GFP to BFP, at a cell count sufficient to ensure >250x-1000x coverage per library candidate. In such an embodiment, the genome editing capacity of the various constructs in this assay may be assessed by sorting the cells by fluorescence-activated cell sorting (FACS) for expression of the color-converted fluorescent protein (FP) at 4-10 days post electroporation. Cells are sorted and harvested as distinct populations of unedited cells (exhibiting the original FP signal), edited cells (exhibiting the converted FP signal), and imperfectly edited cells (exhibiting no FP signal). A sample of unsorted cells may also be harvested as the input population to determine candidate enrichment during analysis.

To determine which gene modifying library candidates exhibit genome editing capacity in the assay, genomic DNA (gDNA) is harvested from the sorted cell populations and analyzed by sequencing the gene modifying library candidates in each population. Briefly, gene modifying candidates may be amplified from the genome using primers specific to the gene modifying polypeptide expression vector (e.g., the lentiviral cassette), amplified in a second round of PCR to dilute genomic DNA, and then sequenced, e.g., on a next-generation sequencing platform. After quality control of the sequencing reads, reads of at least about 1500 nucleotides and generally no more than about 3200 nucleotides are mapped to the gene modifying polypeptide library sequences, and those reads that match a library sequence at a level of at least about 80% are considered successfully aligned to a given candidate for purposes of this pooled screen. To identify candidates capable of performing gene editing in the assay (e.g., BFP-to-GFP or GFP-to-BFP editing), the read count of each library candidate in the edited population is compared to its read count in the initial, unsorted population.
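The read-filtering step described above can be sketched as follows. The input format is an assumption for illustration: each alignment is a hypothetical `(candidate_id, read_length, identity)` tuple that would come from whatever aligner is used, and the thresholds simply mirror the approximate values in the text.

```python
from collections import Counter


def passes_filter(read_length, identity, min_len=1500, max_len=3200, min_identity=0.80):
    """Keep reads of about 1500-3200 nt that match a library sequence at >= about 80%."""
    return min_len <= read_length <= max_len and identity >= min_identity


def count_aligned_reads(alignments):
    """Tally reads per candidate from (candidate_id, read_length, identity) tuples."""
    counts = Counter()
    for candidate_id, read_length, identity in alignments:
        if passes_filter(read_length, identity):
            counts[candidate_id] += 1
    return counts
```

The per-candidate counts from the edited and unsorted populations are then the inputs to the enrichment comparison.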

For purposes of the pooled screen, gene modifying candidates with genome editing capacity are identified based on enrichment in the edited (converted FP) population relative to unsorted (input) cells. In some embodiments, an enrichment of at least 1.0, 1.5, 2.0, 2.5, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 90, or at least 100-fold over input indicates potentially useful gene editing activity, e.g., at least 2-fold enrichment. In some embodiments, the enrichment is converted to a log value by taking the log base 2 of the enrichment ratio. In some embodiments, a log2 enrichment score of at least 0, 1, 2, 3, 4, 5, 5.5, 6.0, 6.2, 6.3, 6.4, 6.5, or at least 6.6 indicates potentially useful gene editing activity, e.g., a log2 enrichment score of at least 1.0. In particular embodiments, the enrichment values observed for gene modifying candidates may be compared to enrichment values observed under similar conditions for a reference (e.g., Element ID No: 17380).
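A minimal sketch of the enrichment calculation follows. Normalizing each read count by the total reads in its population is an assumption made here so that populations sequenced to different depths can be compared; the text only states that edited and input read counts are compared. The function names and the default threshold of 1.0 (taken from the example in the text) are illustrative.

```python
import math


def log2_enrichment(edited_count, edited_total, input_count, input_total):
    """log2 of a candidate's read fraction in the edited population over its input fraction."""
    if edited_count <= 0 or input_count <= 0:
        raise ValueError("counts must be positive to take a log ratio")
    edited_fraction = edited_count / edited_total
    input_fraction = input_count / input_total
    return math.log2(edited_fraction / input_fraction)


def is_hit(score, threshold=1.0):
    """Flag a candidate whose log2 enrichment meets the chosen threshold."""
    return score >= threshold


# Hypothetical counts: 200 of 10,000 edited reads versus 50 of 10,000 input reads.
score = log2_enrichment(200, 10_000, 50, 10_000)
print(round(score, 3), is_hit(score))
```

A candidate with zero reads in either population is rejected rather than scored; adding a pseudocount would be another reasonable design choice.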

In some embodiments, multiple tgRNAs may be used to screen the gene modifying candidate library. In particular embodiments, multiple tgRNAs may be utilized to optimize template/Cas-linker-RT fusion pairs, e.g., for gene editing of a particular target gene, e.g., a gene target for treatment of a disease. In particular embodiments, the pooled approach to screening gene modifying candidates may be performed using a multiplicity of different tgRNAs in an arrayed format.

在一些实施例中，可以使用多种类型的编辑，例如不同长度的插入、取代和/或缺失，来筛选基因修饰候选文库。In some embodiments, multiple types of edits, such as insertions, substitutions, and/or deletions of varying lengths, can be used to screen a library of gene modification candidates.

In some embodiments, multiple target sequences, e.g., different fluorescent proteins, may be used to screen the gene modifying candidate library. In some embodiments, multiple cell types, e.g., HEK293T or U2OS, may be used to screen the gene modifying candidate library. Those of ordinary skill in the art will appreciate that a given candidate may exhibit altered editing capacity, or even the gain or loss of any observable or useful activity, across different conditions, including tgRNA sequence (e.g., nucleotide modifications, PBS length, RT template length), target sequence, target location, type of edit, location of the mutation relative to the first-strand nick of the gene modifying polypeptide, or cell type. Thus, in some embodiments, gene modifying library candidates are screened across multiple parameters, e.g., with at least two different tgRNAs in at least two cell types, and gene editing activity is identified by enrichment in any single condition. In other embodiments, a candidate with more robust activity across different tgRNAs and cell types is identified by enrichment in at least two conditions, e.g., in all conditions screened. For clarity, a candidate that exhibits little or no enrichment under any given condition is not assumed to be inactive across all conditions, and may be screened with different parameters or reconfigured at the polypeptide level, e.g., by swapping, shuffling, or evolving domains (e.g., the RT domain), linkers, or other signals (e.g., an NLS).

示例性Cas9-接头-RT融合体序列Exemplary Cas9-Linker-RT Fusion Sequences

In some embodiments, a gene modifying polypeptide comprises a linker sequence and an RT sequence. In some embodiments, a gene modifying polypeptide comprises a linker sequence as listed in Table D, or an amino acid sequence having at least 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto. In some embodiments, a gene modifying polypeptide comprises the amino acid sequence of an RT domain as listed in Table D, or an amino acid sequence having at least 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto. In some embodiments, a gene modifying polypeptide comprises a linker sequence as listed in Table D, or an amino acid sequence having at least 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto; and the amino acid sequence of an RT domain as listed in Table D, or an amino acid sequence having at least 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto. In some embodiments, a gene modifying polypeptide comprises: (i) a linker sequence as listed in a row of Table D, or an amino acid sequence having at least 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto; and (ii) the amino acid sequence of an RT domain as listed in the same row of Table D, or an amino acid sequence having at least 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto.

Exemplary Gene Modifying Polypeptides

In some embodiments, a gene modifying polypeptide (e.g., a gene modifying polypeptide that is part of a system described herein) comprises an amino acid sequence of any one of SEQ ID NOs: 1-7743, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In some embodiments, the gene modifying polypeptide comprises an amino acid sequence of any one of SEQ ID NOs: 1-7743, or an amino acid sequence having at least 80% identity thereto. In some embodiments, the gene modifying polypeptide comprises an amino acid sequence of any one of SEQ ID NOs: 1-7743, or an amino acid sequence having at least 90% identity thereto. In some embodiments, the gene modifying polypeptide comprises an amino acid sequence of any one of SEQ ID NOs: 1-7743, or an amino acid sequence having at least 95% identity thereto. In some embodiments, the gene modifying polypeptide comprises an amino acid sequence of any one of SEQ ID NOs: 1-7743, or an amino acid sequence having at least 99% identity thereto. In some embodiments, the gene modifying polypeptide comprises an amino acid sequence of any one of SEQ ID NOs: 1-7743. In some embodiments, the gene modifying polypeptide comprises an amino acid sequence of any one of SEQ ID NOs: 6001-7743, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In some embodiments, the gene modifying polypeptide comprises an amino acid sequence of any one of SEQ ID NOs: 4501-4541, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto.

In some embodiments, the gene modifying polypeptide comprises an amino acid sequence as listed in Table A1, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto.

In some embodiments, the gene modifying polypeptide comprises an amino acid sequence as listed in Table T1, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In some embodiments, the gene modifying polypeptide comprises a linker comprising a linker sequence as listed in Table T1, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In some embodiments, the gene modifying polypeptide comprises an RT domain comprising an RT domain sequence as listed in Table T1, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In some embodiments, the gene modifying polypeptide comprises: (i) a linker comprising a linker sequence as listed in a row of Table T1, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto; and (ii) an RT domain comprising an RT domain sequence as listed in the same row of Table T1, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto.

Table T1. Selection of Exemplary Gene Modifying Polypeptides

In some embodiments, the gene modifying polypeptide comprises an amino acid sequence as listed in Table T2, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In some embodiments, the gene modifying polypeptide comprises a linker comprising a linker sequence as listed in Table T2, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In some embodiments, the gene modifying polypeptide comprises an RT domain comprising an RT domain sequence as listed in Table T2, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In some embodiments, the gene modifying polypeptide comprises: (i) a linker comprising a linker sequence as listed in a row of Table T2, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto; and (ii) an RT domain comprising an RT domain sequence as listed in the same row of Table T2, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto.

Table T2. Selection of Exemplary Gene Modifying Polypeptides

Subsequences of Exemplary Gene Modifying Polypeptides

In some embodiments, the gene modifying polypeptide comprises, in N-terminal to C-terminal order, one or more (e.g., 1, 2, 3, 4, 5, or all 6) of an N-terminal methionine residue, a first nuclear localization signal (NLS), a DNA binding domain, a linker, an RT domain, and/or a second NLS. In some embodiments, the gene modifying polypeptide comprises, in N-terminal to C-terminal order, an NLS (e.g., a first NLS), a DNA binding domain, a linker, and an RT domain, wherein the linker and RT domain are the linker and RT domain of a gene modifying polypeptide of any one of SEQ ID NOs: 1-7743, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity to the linker and RT domain. In some embodiments, the gene modifying polypeptide comprises, in N-terminal to C-terminal order, a DNA binding domain, a linker, an RT domain, and an NLS (e.g., a second NLS), wherein the linker and RT domain are the linker and RT domain of a gene modifying polypeptide of any one of SEQ ID NOs: 1-7743, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity to the linker and RT domain. In some embodiments, the gene modifying polypeptide comprises, in N-terminal to C-terminal order, a first NLS, a DNA binding domain, a linker, an RT domain, and a second NLS, wherein the linker and RT domain are the linker and RT domain of a gene modifying polypeptide of any one of SEQ ID NOs: 1-7743, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity to the linker and RT domain. In some embodiments, the gene modifying polypeptide further comprises an N-terminal methionine residue.

In some embodiments, the gene modifying polypeptide comprises, in N-terminal to C-terminal order, one or more (e.g., 1, 2, 3, 4, 5, or all 6) of the following: an N-terminal methionine residue; a first nuclear localization signal (NLS) (e.g., of a gene modifying polypeptide of any one of SEQ ID NOs: 1-7743 and/or as listed in any one of Tables A1, T1, or T2, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto); a DNA binding domain (e.g., a Cas domain, e.g., a SpyCas9 domain, e.g., as listed in Table 8, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto; or a DNA binding domain of a gene modifying polypeptide of any one of SEQ ID NOs: 1-7743 and/or as listed in any one of Tables A1, T1, or T2, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto); a linker (e.g., of a gene modifying polypeptide of any one of SEQ ID NOs: 1-7743 and/or as listed in any one of Tables A1, T1, or T2, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto); an RT domain (e.g., of a gene modifying polypeptide of any one of SEQ ID NOs: 1-7743 and/or as listed in any one of Tables A1, T1, or T2, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto); and a second NLS (e.g., of a gene modifying polypeptide of any one of SEQ ID NOs: 1-7743 and/or as listed in any one of Tables A1, T1, or T2, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto). In some embodiments, the gene modifying polypeptide further comprises (e.g., C-terminal to the second NLS) a T2A sequence and/or a puromycin sequence (e.g., of a gene modifying polypeptide of any one of SEQ ID NOs: 1-7743 and/or as listed in any one of Tables A1, T1, or T2, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto). In some embodiments, a nucleic acid encoding the gene modifying polypeptide (e.g., as described herein) encodes a T2A sequence, e.g., wherein the T2A sequence is situated between a region encoding the gene modifying polypeptide and a second region, wherein the second region optionally encodes a selectable marker, e.g., puromycin.

In certain embodiments, the first NLS comprises a first NLS sequence of a gene modifying polypeptide having an amino acid sequence of any one of SEQ ID NOs: 1-7743, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In certain embodiments, the first NLS comprises a first NLS sequence of a gene modifying polypeptide as listed in any one of Tables A1, T1, or T2, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In certain embodiments, the first NLS sequence comprises a C-myc NLS. In certain embodiments, the first NLS comprises the amino acid sequence PAAKRVKLD (SEQ ID NO: 11,095), or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto.

In certain embodiments, the gene modifying polypeptide further comprises a spacer sequence between the first NLS and the DNA binding domain. In certain embodiments, the spacer sequence between the first NLS and the DNA binding domain comprises 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 amino acids. In certain embodiments, the spacer sequence between the first NLS and the DNA binding domain comprises the amino acid sequence GG.

In certain embodiments, the DNA binding domain comprises a DNA binding domain of a gene modifying polypeptide of any one of SEQ ID NOs: 1-7743, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In certain embodiments, the DNA binding domain comprises a DNA binding domain of a gene modifying polypeptide as listed in any one of Tables A1, T1, or T2, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In certain embodiments, the DNA binding domain comprises a Cas domain (e.g., as listed in Table 8). In certain embodiments, the DNA binding domain comprises the amino acid sequence of a SpyCas9 polypeptide (e.g., as listed in Table 8, e.g., a Cas9 N863A polypeptide), or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In certain embodiments, the DNA binding domain comprises the amino acid sequence:

or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto.

In certain embodiments, the gene modifying polypeptide further comprises a spacer sequence between the DNA binding domain and the linker. In certain embodiments, the spacer sequence between the DNA binding domain and the linker comprises 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 amino acids. In certain embodiments, the spacer sequence between the DNA binding domain and the linker comprises the amino acid sequence GG.

In certain embodiments, the linker comprises a linker sequence of a gene modifying polypeptide of any one of SEQ ID NOs: 1-7743, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In certain embodiments, the linker comprises a linker sequence of a gene modifying polypeptide as listed in any one of Tables A1, T1, or T2, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In certain embodiments, the linker comprises an amino acid sequence as listed in Table D or Table 10, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto.

In certain embodiments, the gene modifying polypeptide further comprises a spacer sequence between the linker and the RT domain. In certain embodiments, the spacer sequence between the linker and the RT domain comprises 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 amino acids. In certain embodiments, the spacer sequence between the linker and the RT domain comprises the amino acid sequence GG.

In certain embodiments, the RT domain comprises an RT domain sequence of a gene modifying polypeptide of any one of SEQ ID NOs: 1-7743, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In certain embodiments, the RT domain comprises an RT domain sequence of a gene modifying polypeptide as listed in any one of Tables A1, T1, or T2, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In certain embodiments, the RT domain comprises an amino acid sequence as listed in Table D or Table 6, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In some embodiments, the RT domain is about 400-500, 500-600, 600-700, 700-800, 800-900, or 900-1000 amino acids in length.

In certain embodiments, the gene modifying polypeptide further comprises a spacer sequence between the RT domain and the second NLS. In certain embodiments, the spacer sequence between the RT domain and the second NLS comprises 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 amino acids. In certain embodiments, the spacer sequence between the RT domain and the second NLS comprises the amino acid sequence AG.

In certain embodiments, the second NLS comprises a second NLS sequence of a gene modifying polypeptide of any one of SEQ ID NOs: 1-7743. In certain embodiments, the second NLS comprises a second NLS sequence of a gene modifying polypeptide as listed in any one of Tables A1, T1, or T2. In certain embodiments, the second NLS sequence comprises a plurality of partial NLS sequences. In embodiments, the NLS sequence (e.g., the second NLS sequence) comprises a first partial NLS sequence, e.g., comprising the amino acid sequence KRTADGSEFE (SEQ ID NO: 11,097), or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In embodiments, the NLS sequence (e.g., the second NLS sequence) comprises a second partial NLS sequence. In embodiments, the NLS sequence (e.g., the second NLS sequence) comprises an SV40A5 NLS, e.g., a bipartite SV40A5 NLS, e.g., comprising the amino acid sequence KRTADGSEFESPKKKAKVE (SEQ ID NO: 11,098), or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In certain embodiments, the NLS sequence (e.g., the second NLS sequence) comprises the amino acid sequence KRTADGSEFEKRTADGSEFESPKKKAKVE (SEQ ID NO: 11,099), or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto.
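The relationship among the three NLS sequences recited above can be checked with plain string operations. The short Python sketch below is offered only as an illustration: the variable names are hypothetical, and the "remainder" segment is obtained here by subtracting the first partial sequence from the 19-residue sequence; the text itself does not recite the second partial NLS sequence, so that segment should be read as an inference and not as a disclosed sequence.

```python
# Sequences as recited in the text.
FIRST_PARTIAL = "KRTADGSEFE"                    # SEQ ID NO: 11,097
SV40A5_NLS = "KRTADGSEFESPKKKAKVE"              # SEQ ID NO: 11,098
EXTENDED_NLS = "KRTADGSEFEKRTADGSEFESPKKKAKVE"  # SEQ ID NO: 11,099

# The 19-residue sequence begins with the first partial NLS sequence.
starts_with_first_partial = SV40A5_NLS.startswith(FIRST_PARTIAL)

# Inferred by subtraction only; not recited in the text as a partial NLS.
remainder = SV40A5_NLS[len(FIRST_PARTIAL):]

# The 29-residue sequence equals the first partial sequence followed by
# the full 19-residue sequence.
extended_is_concatenation = EXTENDED_NLS == FIRST_PARTIAL + SV40A5_NLS

if __name__ == "__main__":
    print(starts_with_first_partial, remainder, extended_is_concatenation)
```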

In certain embodiments, the gene modifying polypeptide further comprises a spacer sequence between the second NLS and the T2A sequence and/or the puromycin sequence. In certain embodiments, the spacer sequence between the second NLS and the T2A sequence and/or the puromycin sequence comprises 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 amino acids. In certain embodiments, the spacer sequence between the second NLS and the T2A sequence and/or the puromycin sequence comprises the amino acid sequence GSG.
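The N-terminal to C-terminal arrangement described in this section can be sketched as a simple concatenation. The Python example below assumes one particular combination of the recited options: an N-terminal methionine, the PAAKRVKLD first NLS, GG spacers, an AG spacer before the second NLS, and the KRTADGSEFESPKKKAKVE second NLS. The function name `assemble` and the toy DNA binding domain, linker, and RT domain strings are hypothetical placeholders, not sequences from the sequence listing, and the optional T2A and puromycin elements are omitted.

```python
FIRST_NLS = "PAAKRVKLD"              # SEQ ID NO: 11,095 in the text
SECOND_NLS = "KRTADGSEFESPKKKAKVE"   # SEQ ID NO: 11,098 in the text


def assemble(dbd, linker, rt, with_met=True):
    """Concatenate the elements in N-terminal to C-terminal order.

    The spacers (GG, GG, GG, AG) are each described as optional; this
    sketch includes all of them as one illustrative combination.
    """
    parts = [
        "M" if with_met else "",  # optional N-terminal methionine
        FIRST_NLS,
        "GG",                     # spacer: first NLS / DNA binding domain
        dbd,
        "GG",                     # spacer: DNA binding domain / linker
        linker,
        "GG",                     # spacer: linker / RT domain
        rt,
        "AG",                     # spacer: RT domain / second NLS
        SECOND_NLS,
    ]
    return "".join(parts)


if __name__ == "__main__":
    # Toy placeholder strings standing in for real domain sequences.
    print(assemble(dbd="AAAA", linker="SGGS", rt="TLNI"))
```

Keeping each element as a separate list entry makes it straightforward to drop a spacer or swap a domain when exploring other recited combinations.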

Linker and RT Domains

In some embodiments, the gene modifying polypeptide comprises a linker (e.g., as described herein) and an RT domain (e.g., as described herein). In certain embodiments, the gene modifying polypeptide comprises, in N-terminal to C-terminal order, a linker (e.g., as described herein) and an RT domain (e.g., as described herein).

In certain embodiments, the linker comprises a linker sequence as listed in Table 10, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In certain embodiments, the linker comprises a linker sequence of any one of SEQ ID NOs: 1-7743, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In certain embodiments, the linker comprises a linker sequence of any one of SEQ ID NOs: 6001-7743, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In certain embodiments, the linker comprises a linker sequence of any one of SEQ ID NOs: 4501-4541, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In certain embodiments, the linker comprises a linker sequence of an exemplary gene modifying polypeptide listed in any one of Tables A1, T1, or T2, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In certain embodiments, the RT domain comprises an RT domain sequence as listed in Table 6, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In certain embodiments, the RT domain comprises an RT domain sequence of an exemplary gene modifying polypeptide listed in any one of Tables A1, T1, or T2, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto.

In some embodiments, the gene modifying polypeptide comprises a portion of a gene modifying polypeptide of any one of SEQ ID NOs: 1-7743 (wherein the portion comprises the linker and the RT domain), or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity to the portion.

In some embodiments, the gene modifying polypeptide comprises the linker of a gene modifying polypeptide of any one of SEQ ID NOs: 1-7743, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity to the linker. In some embodiments, the gene modifying polypeptide comprises the linker of a gene modifying polypeptide of any one of SEQ ID NOs: 6001-7743, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity to the linker. In some embodiments, the gene modifying polypeptide comprises the linker of a gene modifying polypeptide of any one of SEQ ID NOs: 4501-4541, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity to the linker. In some embodiments, the gene modifying polypeptide comprises the linker of a gene modifying polypeptide as listed in any one of Tables A1, T1, or T2, or a linker comprising an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto.

In some embodiments, the gene modifying polypeptide comprises the RT domain of a gene modifying polypeptide of any one of SEQ ID NOs: 1-7743, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity to the RT domain. In some embodiments, the gene modifying polypeptide comprises the RT domain of a gene modifying polypeptide of any one of SEQ ID NOs: 6001-7743, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity to the RT domain. In some embodiments, the gene modifying polypeptide comprises the RT domain of a gene modifying polypeptide of any one of SEQ ID NOs: 4501-4541, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity to the RT domain. In some embodiments, the gene modifying polypeptide comprises the RT domain of a gene modifying polypeptide as listed in any one of Tables A1, T1, or T2, or an RT domain comprising an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto.

In certain embodiments, the linker and RT domain of the gene modifying polypeptide comprise the amino acid sequences of the linker and RT domain of a gene modifying polypeptide having an amino acid sequence of any one of SEQ ID NOs: 1-7743 (or amino acid sequences having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto). In certain embodiments, the linker and RT domain of the gene modifying polypeptide comprise amino acid sequences having at least 80% identity to the linker and RT domain of any one of SEQ ID NOs: 1-7743. In certain embodiments, the linker and RT domain of the gene modifying polypeptide comprise amino acid sequences having at least 90% identity to the linker and RT domain of any one of SEQ ID NOs: 1-7743. In certain embodiments, the linker and RT domain of the gene modifying polypeptide comprise amino acid sequences having at least 95% identity to the linker and RT domain of any one of SEQ ID NOs: 1-7743. In certain embodiments, the linker and RT domain of the gene modifying polypeptide comprise amino acid sequences having at least 99% identity to the linker and RT domain of any one of SEQ ID NOs: 1-7743. In certain embodiments, the linker and RT domain of the gene modifying polypeptide comprise the amino acid sequences of the linker and RT domain of a gene modifying polypeptide having an amino acid sequence of any one of SEQ ID NOs: 6001-7743 (or amino acid sequences having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto). In certain embodiments, the linker and RT domain of the gene modifying polypeptide comprise the amino acid sequences of the linker and RT domain of a gene modifying polypeptide having an amino acid sequence of any one of SEQ ID NOs: 4501-4541 (or amino acid sequences having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto). In certain embodiments, the linker and RT domain of the gene modifying polypeptide comprise the amino acid sequences of the linker and RT domain from a single row of any one of Tables A1, T1, or T2 (e.g., from a single exemplary gene modifying polypeptide listed in any one of Tables A1, T1, or T2) (or amino acid sequences having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto).

In certain embodiments, the linker and RT domain of the gene modifying polypeptide comprise the amino acid sequences of a linker and an RT domain taken from two different amino acid sequences selected from SEQ ID NOs: 1-7743 (or amino acid sequences having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto). In certain embodiments, the linker and RT domain of the gene modifying polypeptide comprise the amino acid sequences of a linker and an RT domain from different rows of any one of Tables A1, T1, or T2 (or amino acid sequences having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto).
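The distinction between taking a linker and RT domain from a single row and taking them from different rows can be illustrated by enumerating pairings over a small table. In the Python sketch below, `ROWS` is a hypothetical stand-in for a few rows of Table A1, T1, or T2; the row identifiers and the `LINKER_*` and `RT_*` labels are placeholders and do not correspond to any disclosed sequence.

```python
from itertools import product

# Hypothetical rows: (row identifier, linker label, RT domain label).
ROWS = [
    ("row1", "LINKER_A", "RT_X"),
    ("row2", "LINKER_B", "RT_Y"),
    ("row3", "LINKER_C", "RT_Z"),
]


def same_row_pairs(rows):
    """Linker/RT pairs in which both elements come from one row."""
    return [(linker, rt) for _, linker, rt in rows]


def cross_row_pairs(rows):
    """Linker/RT pairs in which the linker and RT come from different rows."""
    return [
        (a[1], b[2])
        for a, b in product(rows, repeat=2)
        if a[0] != b[0]
    ]


if __name__ == "__main__":
    print(same_row_pairs(ROWS))
    print(cross_row_pairs(ROWS))
```

For a table of n rows this yields n same-row pairs and n(n-1) cross-row pairs, which indicates how quickly the combinatorial space grows when linkers and RT domains are shuffled between rows.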

In certain embodiments, the gene modifying polypeptide further comprises a first NLS (e.g., a 5' NLS), e.g., as described herein. In certain embodiments, the gene modifying polypeptide further comprises a second NLS (e.g., a 3' NLS), e.g., as described herein. In certain embodiments, the gene modifying polypeptide further comprises an N-terminal methionine residue.

RT Families and Mutants

In certain embodiments, the gene modifying polypeptide comprises an amino acid sequence of an RT domain sequence from a family selected from: AVIRE, BAEVM, FFV, FLV, FOAMV, GALV, KORV, MLVAV, MLVBM, MLVCB, MLVFF, MLVMS, PERV, SFV1, SFV3L, WMSV, XMRV6, BLVAU, BLVJ, HTL1A, HTL1C, HTL1L, HTL32, HTL3P, HTLV2, JSRV, MLVF5, MLVRD, MMTVB, MPMV, SFVCP, SMRVH, SRV1, SRV2, and WDSV. In certain embodiments, the gene modifying polypeptide comprises an amino acid sequence of an RT domain sequence from a family selected from: AVIRE, BAEVM, FFV, FLV, FOAMV, GALV, KORV, MLVAV, MLVBM, MLVCB, MLVFF, MLVMS, PERV, SFV1, SFV3L, WMSV, and XMRV6.

在某些实施例中，基因修饰多肽包含来自MLVMS RT结构域的RT结构域序列的氨基酸序列。在实施例中，RT结构域序列的氨基酸序列包含表M1第1列所列的一个或多个点突变，或与其相对应的点突变。在实施例中，RT结构域序列的氨基酸序列包含表M1第3列(Gen1MLVMS)所列的一个或多个点突变，或与其相对应的点突变。在实施例中，RT结构域序列的氨基酸序列包含表M2第1列和第2列所列的RT结构域的氨基酸位置处的一个或多个点突变，或与其相对应的氨基酸位置处的一个或多个点突变。In certain embodiments, the genetically modified polypeptide comprises an amino acid sequence of an RT domain sequence from an MLVMS RT domain. In embodiments, the amino acid sequence of the RT domain sequence comprises one or more point mutations listed in column 1 of Table M1, or point mutations corresponding thereto. In embodiments, the amino acid sequence of the RT domain sequence comprises one or more point mutations listed in column 3 of Table M1 (Gen1MLVMS), or point mutations corresponding thereto. In embodiments, the amino acid sequence of the RT domain sequence comprises one or more point mutations at the amino acid positions of the RT domain listed in columns 1 and 2 of Table M2, or one or more point mutations at amino acid positions corresponding thereto.

In certain embodiments, the gene modifying polypeptide comprises an amino acid sequence of an RT domain sequence from an AVIRE RT domain. In embodiments, the amino acid sequence of the RT domain sequence comprises one or more point mutations listed in column 2 of Table M1, or point mutations corresponding thereto. In embodiments, the amino acid sequence of the RT domain sequence comprises one or more point mutations listed in column 4 of Table M1 (Gen2AVIRE), or point mutations corresponding thereto. In embodiments, the amino acid sequence of the RT domain sequence comprises one or more point mutations at the amino acid positions of the RT domain listed in columns 3 and 4 of Table M2, or one or more point mutations at amino acid positions corresponding thereto. In certain embodiments, the RT domain comprises IENSSP (e.g., at the C-terminus).

Table M1. Exemplary point mutations in the MLVMS and AVIRE RT domains

Table M2. Exemplary positions that can be mutated in the MLVMS and AVIRE RT domains

In certain embodiments, the gene modifying polypeptide comprises a gammaretrovirus-derived RT domain. In certain embodiments, the gammaretrovirus-derived RT domain of the gene modifying polypeptide comprises an amino acid sequence from an RT domain sequence selected from the following families: AVIRE, BAEVM, FFV, FLV, FOAMV, GALV, KORV, MLVAV, MLVBM, MLVCB, MLVFF, MLVMS, PERV, SFV1, SFV3L, WMSV, and XMRV6. In some embodiments, the gammaretrovirus-derived RT domain of the gene modifying polypeptide is not derived from PERV. In some embodiments, the RT comprises one, two, three, four, five, six, or more mutations shown in Table 2A, corresponding to mutations D200N, L603W, T330P, D524G, E562Q, D583N, P51L, S67R, E67K, T197A, H204R, E302K, F309N, W313F, L435G, N454K, H594Q, L671P, E69K, or D653N in the RT domain of murine leukemia virus reverse transcriptase. In some embodiments, the gene modifying polypeptide further comprises a linker having at least 99% identity to the linker domain of any one of SEQ ID NOs: 1-7743. In some embodiments, the gene modifying polypeptide further comprises a linker having at least 99% or 100% identity to SEQ ID NO: 5217 or SEQ ID NO: 11,041.
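The mutation labels used above (e.g., D200N) follow the common wild-type residue, position, replacement residue convention. As a minimal illustrative sketch, and not part of the disclosed methods, the following Python code parses such labels and applies them to a sequence. The names `PointMutation`, `parse_mutation`, and `apply_mutations` are hypothetical, positions are assumed to be 1-based relative to the reference RT domain, and the short sequence in the usage note is a toy string rather than any sequence from the sequence listing.

```python
import re
from typing import Iterable, NamedTuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Label format assumed here: one-letter wild-type residue, 1-based position, one-letter replacement.
_MUTATION_RE = re.compile(rf"^([{AMINO_ACIDS}])(\d+)([{AMINO_ACIDS}])$")


class PointMutation(NamedTuple):
    wild_type: str
    position: int  # 1-based position in the reference sequence (assumption)
    mutant: str


def parse_mutation(label: str) -> PointMutation:
    """Parse a label such as 'D200N' into its three components."""
    match = _MUTATION_RE.match(label)
    if match is None:
        raise ValueError(f"not a point-mutation label: {label!r}")
    return PointMutation(match.group(1), int(match.group(2)), match.group(3))


def apply_mutations(sequence: str, labels: Iterable[str]) -> str:
    """Return a copy of `sequence` with every labelled substitution applied.

    Raises ValueError if a position is out of range or if the residue found
    at that position is not the wild-type residue named in the label.
    """
    residues = list(sequence)
    for label in labels:
        mutation = parse_mutation(label)
        index = mutation.position - 1
        if not 0 <= index < len(residues):
            raise ValueError(f"{label}: position outside the sequence")
        if residues[index] != mutation.wild_type:
            raise ValueError(
                f"{label}: found {residues[index]} at position {mutation.position}"
            )
        residues[index] = mutation.mutant
    return "".join(residues)
```

For example, `apply_mutations("MKDLT", ["D3N"])` replaces the aspartate at position 3 of the toy sequence with asparagine. The wild-type check is a deliberate design choice: it catches labels written against a different numbering scheme before they silently alter the wrong residue.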

在实施例中，RT结构域包含AVIRE RT的RT结构域的氨基酸序列(例如AVIRE_P03360序列，例如SEQ ID NO：8001)，或与其具有至少70％、75％、80％、85％、90％、95％或99％同一性的氨基酸序列。在一些实施例中，RT结构域包含进一步包含选自由D200N、G330P、L605W、T306K和W313F组成的组的，或同源RT结构域中的对应位置的一、二、三、四或五个突变的AVIRE RT的氨基酸序列。在一些实施例中，RT结构域包含进一步包含选自由D200N、G330P和L605W组成的组的，或同源RT结构域中的对应位置的一、二、或三个突变的AVIRE RT的氨基酸序列。In embodiments, the RT domain comprises an amino acid sequence of the RT domain of AVIRE RT (e.g., an AVIRE_P03360 sequence, e.g., SEQ ID NO: 8001), or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In some embodiments, the RT domain comprises an amino acid sequence of AVIRE RT further comprising one, two, three, four, or five mutations selected from the group consisting of D200N, G330P, L605W, T306K, and W313F, or corresponding positions in homologous RT domains. In some embodiments, the RT domain comprises an amino acid sequence of AVIRE RT further comprising one, two, or three mutations selected from the group consisting of D200N, G330P, and L605W, or corresponding positions in homologous RT domains.

在实施例中，RT结构域包含BAEVM RT的RT结构域的氨基酸序列(例如BAEVM_P10272序列，例如SEQ ID NO：8004)，或与其具有至少70％、75％、80％、85％、90％、95％或99％同一性的氨基酸序列。在一些实施例中，RT结构域包含进一步包含选自由D198N、E328P、L602W、T304K和W311F组成的组的，或同源RT结构域中的对应位置的一、二、三、四或五个突变的BAEVM RT的氨基酸序列。在一些实施例中，RT结构域包含进一步包含选自由D198N、E328P和L602W组成的组的，或同源RT结构域中的对应位置的一、二、或三个突变的BAEVM RT的氨基酸序列。In an embodiment, the RT domain comprises an amino acid sequence of an RT domain of BAEVM RT (e.g., a BAEVM_P10272 sequence, e.g., SEQ ID NO: 8004), or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In some embodiments, the RT domain comprises an amino acid sequence of BAEVM RT further comprising one, two, three, four, or five mutations selected from the group consisting of D198N, E328P, L602W, T304K, and W311F, or corresponding positions in a homologous RT domain. In some embodiments, the RT domain comprises an amino acid sequence of BAEVM RT further comprising one, two, or three mutations selected from the group consisting of D198N, E328P, and L602W, or corresponding positions in a homologous RT domain.

In embodiments, the RT domain comprises an amino acid sequence of the RT domain of FFV RT (e.g., an FFV_O93209 sequence, e.g., SEQ ID NO: 8012), or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In some embodiments, the RT domain comprises an amino acid sequence of FFV RT further comprising one, two, three, or four mutations selected from the group consisting of D21N, T293N, T419P, and L393K, or corresponding positions in a homologous RT domain. In some embodiments, the RT domain comprises an amino acid sequence of FFV RT further comprising one, two, or three mutations selected from the group consisting of D21N, T293N, and T419P, or corresponding positions in a homologous RT domain. In some embodiments, the RT domain comprises an amino acid sequence of FFV RT further comprising mutation D21N. In some embodiments, the RT domain comprises an amino acid sequence of FFV RT further comprising one, two, or three mutations selected from the group consisting of T207N, T333P, and L307K, or corresponding positions in a homologous RT domain. In some embodiments, the RT domain comprises an amino acid sequence of FFV RT further comprising one or two mutations selected from the group consisting of T207N and T333P, or corresponding positions in a homologous RT domain.

在实施例中，RT结构域包含FLV RT的RT结构域的氨基酸序列(例如FLV_P10273序列，例如SEQ ID NO：8019)，或与其具有至少70％、75％、80％、85％、90％、95％或99％同一性的氨基酸序列。在一些实施例中，RT结构域包含进一步包含选自由D199N、L602W、T305K和W312F组成的组的，或同源RT结构域中的对应位置的一、二、三、或四个突变的FLV RT的氨基酸序列。在一些实施例中，RT结构域包含进一步包含选自由D199N和L602W组成的组的，或同源RT结构域中的对应位置的一或二个突变的FLV RT的氨基酸序列。In an embodiment, the RT domain comprises an amino acid sequence of the RT domain of FLV RT (e.g., FLV_P10273 sequence, e.g., SEQ ID NO: 8019), or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95% or 99% identity thereto. In some embodiments, the RT domain comprises an amino acid sequence of FLV RT further comprising one, two, three, or four mutations selected from the group consisting of D199N, L602W, T305K and W312F, or corresponding positions in homologous RT domains. In some embodiments, the RT domain comprises an amino acid sequence of FLV RT further comprising one or two mutations selected from the group consisting of D199N and L602W, or corresponding positions in homologous RT domains.

在实施例中，RT结构域包含FOAMV RT的RT结构域的氨基酸序列(例如FOAMV_P14350序列，例如SEQ ID NO：8021)，或与其具有至少70％、75％、80％、85％、90％、95％或99％同一性的氨基酸序列。在一些实施例中，RT结构域包含进一步包含选自由D24N、T296N、S420P和L396K组成的组的，或同源RT结构域中的对应位置的一、二、三、或四个突变的FOAMVRT的氨基酸序列。在一些实施例中，RT结构域包含进一步包含选自由D24N、T296N和S420P组成的组的，或同源RT结构域中的对应位置的一、二、或三个突变的FOAMV RT的氨基酸序列。在一些实施例中，RT结构域包含进一步包含突变D24N、或同源RT结构域中的对应位置的FOAMV RT的氨基酸序列。在一些实施例中，RT结构域包含进一步包含选自由T207N、S331P和L307K组成的组的，或同源RT结构域中的对应位置的一、二、或三个突变的FOAMV RT的氨基酸序列。在一些实施例中，RT结构域包含进一步包含选自由T207N和S331P组成的组的，或同源RT结构域中的对应位置的一或二个突变的FOAMV RT的氨基酸序列。In an embodiment, the RT domain comprises an amino acid sequence of the RT domain of FOAMV RT (e.g., a FOAMV_P14350 sequence, e.g., SEQ ID NO: 8021), or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95% or 99% identity thereto. In some embodiments, the RT domain comprises an amino acid sequence of FOAMV RT further comprising one, two, three, or four mutations selected from the group consisting of D24N, T296N, S420P, and L396K, or corresponding positions in homologous RT domains. In some embodiments, the RT domain comprises an amino acid sequence of FOAMV RT further comprising one, two, or three mutations selected from the group consisting of D24N, T296N, and S420P, or corresponding positions in homologous RT domains. In some embodiments, the RT domain comprises an amino acid sequence of FOAMV RT further comprising mutation D24N, or corresponding positions in homologous RT domains. In some embodiments, the RT domain comprises an amino acid sequence of FOAMV RT further comprising one, two, or three mutations selected from the group consisting of T207N, S331P, and L307K, or corresponding positions in a homologous RT domain. In some embodiments, the RT domain comprises an amino acid sequence of FOAMV RT further comprising one or two mutations selected from the group consisting of T207N and S331P, or corresponding positions in a homologous RT domain.

在实施例中，RT结构域包含GALV RT的RT结构域的氨基酸序列(例如GALV_P21414序列，例如SEQ ID NO：8027)，或与其具有至少70％、75％、80％、85％、90％、95％或99％同一性的氨基酸序列。在一些实施例中，RT结构域包含进一步包含选自由D198N、E328P、L600W、T304K和W311F组成的组的，或同源RT结构域中的对应位置的一、二、三、四或五个突变的GALV RT的氨基酸序列。在一些实施例中，RT结构域包含进一步包含选自由D198N、E328P和L600W组成的组的，或同源RT结构域中的对应位置的一、二、或三个突变的GALV RT的氨基酸序列。In embodiments, the RT domain comprises an amino acid sequence of an RT domain of GALV RT (e.g., a GALV_P21414 sequence, e.g., SEQ ID NO: 8027), or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In some embodiments, the RT domain comprises an amino acid sequence of GALV RT further comprising one, two, three, four, or five mutations selected from the group consisting of D198N, E328P, L600W, T304K, and W311F, or corresponding positions in homologous RT domains. In some embodiments, the RT domain comprises an amino acid sequence of GALV RT further comprising one, two, or three mutations selected from the group consisting of D198N, E328P, and L600W, or corresponding positions in homologous RT domains.

在实施例中，RT结构域包含KORV RT的RT结构域的氨基酸序列(例如KORV_Q9TTC1序列，例如SEQ ID NO：8047)，或与其具有至少70％、75％、80％、85％、90％、95％或99％同一性的氨基酸序列。在一些实施例中，RT结构域包含进一步包含选自由D32N、D322N、E452P、L274W、T428K和W435F组成的组的，或同源RT结构域中的对应位置的一、二、三、四、五或六个突变的GALV RT的氨基酸序列。在一些实施例中，RT结构域包含进一步包含选自由D32N、D322N、E452P和L274W组成的组的，或同源RT结构域中的对应位置的一、二、三、或四个突变的GALV RT的氨基酸序列。在一些实施例中，RT结构域包含进一步包含突变D32N的GALV RT的氨基酸序列。在一些实施例中，RT结构域包含进一步包含选自由D231N、E361P、L633W、T337K和W344F组成的组的，或同源RT结构域中的对应位置的一、二、三、四或五个突变的KORV RT的氨基酸序列。在一些实施例中，RT结构域包含进一步包含选自由D231N、E361P和L633W组成的组的，或同源RT结构域中的对应位置的一、二、或三个突变的KORV RT的氨基酸序列。In embodiments, the RT domain comprises an amino acid sequence of the RT domain of KORV RT (e.g., a KORV_Q9TTC1 sequence, e.g., SEQ ID NO: 8047), or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In some embodiments, the RT domain comprises an amino acid sequence of GALV RT further comprising one, two, three, four, five, or six mutations selected from the group consisting of D32N, D322N, E452P, L274W, T428K, and W435F, or corresponding positions in a homologous RT domain. In some embodiments, the RT domain comprises an amino acid sequence of GALV RT further comprising one, two, three, or four mutations selected from the group consisting of D32N, D322N, E452P, and L274W, or corresponding positions in a homologous RT domain. In some embodiments, the RT domain comprises an amino acid sequence of GALV RT further comprising mutation D32N. In some embodiments, the RT domain comprises an amino acid sequence of KORV RT further comprising one, two, three, four or five mutations selected from the group consisting of D231N, E361P, L633W, T337K and W344F, or corresponding positions in a homologous RT domain. In some embodiments, the RT domain comprises an amino acid sequence of KORV RT further comprising one, two, or three mutations selected from the group consisting of D231N, E361P and L633W, or corresponding positions in a homologous RT domain.

在实施例中，RT结构域包含MLVAV RT的RT结构域的氨基酸序列(例如MLVAV_P03356序列，例如SEQ ID NO：8053)，或与其具有至少70％、75％、80％、85％、90％、95％或99％同一性的氨基酸序列。在一些实施例中，RT结构域包含进一步包含选自由D200N、T330P、L603W、T306K和W313F组成的组的，或同源RT结构域中的对应位置的一、二、三、四或五个突变的MLVAV RT的氨基酸序列。在一些实施例中，RT结构域包含进一步包含选自由D200N、T330P和L603W组成的组的，或同源RT结构域中的对应位置的一、二、或三个突变的MLVAV RT的氨基酸序列。In embodiments, the RT domain comprises an amino acid sequence of the RT domain of MLVAV RT (e.g., MLVAV_P03356 sequence, e.g., SEQ ID NO: 8053), or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In some embodiments, the RT domain comprises an amino acid sequence of MLVAV RT further comprising one, two, three, four, or five mutations selected from the group consisting of D200N, T330P, L603W, T306K, and W313F, or corresponding positions in homologous RT domains. In some embodiments, the RT domain comprises an amino acid sequence of MLVAV RT further comprising one, two, or three mutations selected from the group consisting of D200N, T330P, and L603W, or corresponding positions in homologous RT domains.

In embodiments, the RT domain comprises an amino acid sequence of the RT domain of MLVBM RT (e.g., an MLVBM_Q7SVK7 sequence, e.g., SEQ ID NO: 8056), or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In some embodiments, the RT domain comprises an amino acid sequence of MLVBM RT further comprising one, two, three, four, or five mutations selected from the group consisting of D199N, T329P, L602W, T305K, and W312F, or corresponding positions in a homologous RT domain. In some embodiments, the RT domain comprises an amino acid sequence of MLVBM RT further comprising one, two, or three mutations selected from the group consisting of D200N, T330P, and L603W, or corresponding positions in a homologous RT domain.

In embodiments, the RT domain comprises an amino acid sequence of the RT domain of MLVCB RT (e.g., an MLVCB_P08361 sequence, e.g., SEQ ID NO: 8062), or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In some embodiments, the RT domain comprises an amino acid sequence of MLVCB RT further comprising one, two, three, four, or five mutations selected from the group consisting of D200N, T330P, L603W, T306K, and W313F, or corresponding positions in a homologous RT domain. In some embodiments, the RT domain comprises an amino acid sequence of MLVCB RT further comprising one, two, or three mutations selected from the group consisting of D200N, T330P, and L603W, or corresponding positions in a homologous RT domain.

In embodiments, the RT domain comprises an amino acid sequence of the RT domain of MLVFF RT, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In some embodiments, the RT domain comprises an amino acid sequence of MLVFF RT further comprising one, two, three, four, or five mutations selected from the group consisting of D200N, T330P, L603W, T306K, and W313F, or corresponding positions in a homologous RT domain. In some embodiments, the RT domain comprises an amino acid sequence of MLVFF RT further comprising one, two, or three mutations selected from the group consisting of D200N, T330P, and L603W, or corresponding positions in a homologous RT domain.

In embodiments, the RT domain comprises an amino acid sequence of the RT domain of MLVMS RT (e.g., an MLVMS_reference sequence, e.g., SEQ ID NO: 8137; or an MLVMS_P03355 sequence, e.g., SEQ ID NO: 8070), or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In some embodiments, the RT domain comprises an amino acid sequence of MLVMS RT further comprising one, two, three, four, five, or six mutations selected from the group consisting of D200N, T330P, L603W, T306K, W313F, and H8Y, or corresponding positions in a homologous RT domain. In some embodiments, the RT domain comprises an amino acid sequence of MLVMS RT further comprising one, two, three, four, or five mutations selected from the group consisting of D200N, T330P, L603W, T306K, and W313F, or corresponding positions in a homologous RT domain. In some embodiments, the RT domain comprises an amino acid sequence of MLVMS RT further comprising one, two, or three mutations selected from the group consisting of D200N, T330P, and L603W, or corresponding positions in a homologous RT domain.

在实施例中，RT结构域包含PERV RT的RT结构域的氨基酸序列(例如PERV_Q4VFZ2序列，例如SEQ ID NO：8099)，或与其具有至少70％、75％、80％、85％、90％、95％或99％同一性的氨基酸序列。在一些实施例中，RT结构域包含进一步包含选自由D196N、E326P、L599W、T302K和W309F组成的组的，或同源RT结构域中的对应位置的一、二、三、四或五个突变的PERV RT的氨基酸序列。在一些实施例中，RT结构域包含进一步包含选自由D196N、E326P和L599W组成的组的，或同源RT结构域中的对应位置的一、二、或三个突变的PERV RT的氨基酸序列。In an embodiment, the RT domain comprises an amino acid sequence of the RT domain of PERV RT (e.g., a PERV_Q4VFZ2 sequence, e.g., SEQ ID NO: 8099), or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In some embodiments, the RT domain comprises an amino acid sequence of PERV RT further comprising one, two, three, four, or five mutations selected from the group consisting of D196N, E326P, L599W, T302K, and W309F, or corresponding positions in homologous RT domains. In some embodiments, the RT domain comprises an amino acid sequence of PERV RT further comprising one, two, or three mutations selected from the group consisting of D196N, E326P, and L599W, or corresponding positions in homologous RT domains.

在实施例中，RT结构域包含SFV1 RT的RT结构域的氨基酸序列(例如SFV1_P23074序列，例如SEQ ID NO：8105)，或与其具有至少70％、75％、80％、85％、90％、95％或99％同一性的氨基酸序列。在一些实施例中，RT结构域包含进一步包含选自由D24N、T296N、N420P和L396K组成的组的，或同源RT结构域中的对应位置的一、二、三、或四个突变的SFV1 RT的氨基酸序列。在一些实施例中，RT结构域包含进一步包含选自由D24N、T296N和N420P组成的组的，或同源RT结构域中的对应位置的一、二、或三个突变的SFV1 RT的氨基酸序列。在一些实施例中，RT结构域包含进一步包含D24N、或同源RT结构域中的对应位置的SFV1 RT的氨基酸序列。In embodiments, the RT domain comprises an amino acid sequence of the RT domain of SFV1 RT (e.g., SFV1_P23074 sequence, e.g., SEQ ID NO: 8105), or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In some embodiments, the RT domain comprises an amino acid sequence of SFV1 RT further comprising one, two, three, or four mutations selected from the group consisting of D24N, T296N, N420P, and L396K, or corresponding positions in homologous RT domains. In some embodiments, the RT domain comprises an amino acid sequence of SFV1 RT further comprising one, two, or three mutations selected from the group consisting of D24N, T296N, and N420P, or corresponding positions in homologous RT domains. In some embodiments, the RT domain comprises an amino acid sequence of SFV1 RT further comprising D24N, or corresponding positions in homologous RT domains.

在实施例中，RT结构域包含SFV3L RT的RT结构域的氨基酸序列(例如SFV3L_P27401序列，例如SEQ ID NO：8111)，或与其具有至少70％、75％、80％、85％、90％、95％或99％同一性的氨基酸序列。在一些实施例中，RT结构域包含进一步包含选自由D24N、T296N、N422P和L396K组成的组的，或同源RT结构域中的对应位置的一、二、三、或四个突变的SFV3LRT的氨基酸序列。在一些实施例中，RT结构域包含进一步包含选自由D24N、T296N和N422P组成的组的，或同源RT结构域中的对应位置的一、二、或三个突变的SFV3L RT的氨基酸序列。在一些实施例中，RT结构域包含进一步包含突变D24N、或同源RT结构域中的对应位置的SFV3L RT的氨基酸序列。在一些实施例中，RT结构域包含进一步包含选自由T307N、N333P和L307K组成的组的，或同源RT结构域中的对应位置的一、二、或三个突变的SFV3L RT的氨基酸序列。在一些实施例中，RT结构域包含进一步包含选自由T307N和N333P组成的组的，或同源RT结构域中的对应位置的一或二个突变的SFV3L RT的氨基酸序列。In embodiments, the RT domain comprises an amino acid sequence of the RT domain of SFV3L RT (e.g., SFV3L_P27401 sequence, e.g., SEQ ID NO: 8111), or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In some embodiments, the RT domain comprises an amino acid sequence of SFV3L RT further comprising one, two, three, or four mutations selected from the group consisting of D24N, T296N, N422P, and L396K, or corresponding positions in a homologous RT domain. In some embodiments, the RT domain comprises an amino acid sequence of SFV3L RT further comprising one, two, or three mutations selected from the group consisting of D24N, T296N, and N422P, or corresponding positions in a homologous RT domain. In some embodiments, the RT domain comprises an amino acid sequence of SFV3L RT further comprising mutation D24N, or corresponding positions in a homologous RT domain. In some embodiments, the RT domain comprises an amino acid sequence of SFV3L RT further comprising one, two, or three mutations selected from the group consisting of T307N, N333P, and L307K, or corresponding positions in a homologous RT domain. In some embodiments, the RT domain comprises an amino acid sequence of SFV3L RT further comprising one or two mutations selected from the group consisting of T307N and N333P, or corresponding positions in a homologous RT domain.

在实施例中，RT结构域包含WMSV RT的RT结构域的氨基酸序列(例如WMSV_P03359序列，例如SEQ ID NO：8131)，或与其具有至少70％、75％、80％、85％、90％、95％或99％同一性的氨基酸序列。在一些实施例中，RT结构域包含进一步包含选自由D198N、E328P、L600W、T304K和W311F组成的组的，或同源RT结构域中的对应位置的一、二、三、四或五个突变的WMSV RT的氨基酸序列。在一些实施例中，RT结构域包含进一步包含选自由D198N、E328P和L600W组成的组的，或同源RT结构域中的对应位置的一、二、或三个突变的WMSV RT的氨基酸序列。In embodiments, the RT domain comprises an amino acid sequence of the RT domain of WMSV RT (e.g., a WMSV_P03359 sequence, e.g., SEQ ID NO: 8131), or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In some embodiments, the RT domain comprises an amino acid sequence of WMSV RT further comprising one, two, three, four, or five mutations selected from the group consisting of D198N, E328P, L600W, T304K, and W311F, or corresponding positions in homologous RT domains. In some embodiments, the RT domain comprises an amino acid sequence of WMSV RT further comprising one, two, or three mutations selected from the group consisting of D198N, E328P, and L600W, or corresponding positions in homologous RT domains.

在实施例中，RT结构域包含XMRV6 RT的RT结构域的氨基酸序列(例如XMRV6_A1Z651序列，例如SEQ ID NO：8134)，或与其具有至少70％、75％、80％、85％、90％、95％或99％同一性的氨基酸序列。在一些实施例中，RT结构域包含进一步包含选自由D200N、T330P、L603W、T306K和W313F组成的组的，或同源RT结构域中的对应位置的一、二、三、四或五个突变的XMRV6 RT的氨基酸序列。在一些实施例中，RT结构域包含进一步包含选自由D200N、T330P和L603W组成的组的，或同源RT结构域中的对应位置的一、二、或三个突变的XMRV6 RT的氨基酸序列。In embodiments, the RT domain comprises an amino acid sequence of an RT domain of XMRV6 RT (e.g., an XMRV6_A1Z651 sequence, e.g., SEQ ID NO: 8134), or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In some embodiments, the RT domain comprises an amino acid sequence of an XMRV6 RT further comprising one, two, three, four, or five mutations selected from the group consisting of D200N, T330P, L603W, T306K, and W313F, or corresponding positions in a homologous RT domain. In some embodiments, the RT domain comprises an amino acid sequence of an XMRV6 RT further comprising one, two, or three mutations selected from the group consisting of D200N, T330P, and L603W, or corresponding positions in a homologous RT domain.
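The paragraphs above repeatedly refer to "corresponding positions in a homologous RT domain" (for example, D200N in one family and D198N in another). One way to think about such correspondence is through a pairwise alignment: a reference position corresponds to whichever homolog residue shares its alignment column. The sketch below assumes the alignment has already been computed by some external tool and is supplied as two equal-length gapped strings; the function name `map_position` and the toy alignment in the usage note are hypothetical, and the disclosure may determine correspondence differently.

```python
from typing import Optional


def map_position(aligned_ref: str, aligned_hom: str, ref_pos: int) -> Optional[int]:
    """Map a 1-based reference position to the homolog position in the same column.

    `aligned_ref` and `aligned_hom` are the two rows of a pairwise alignment,
    with '-' marking gaps. Returns None when the reference residue is aligned
    to a gap, i.e. the homolog has no corresponding residue.
    """
    if len(aligned_ref) != len(aligned_hom):
        raise ValueError("alignment rows must have equal length")
    ref_index = 0  # residues of the reference seen so far
    hom_index = 0  # residues of the homolog seen so far
    for ref_char, hom_char in zip(aligned_ref, aligned_hom):
        if ref_char != "-":
            ref_index += 1
        if hom_char != "-":
            hom_index += 1
        if ref_char != "-" and ref_index == ref_pos:
            return hom_index if hom_char != "-" else None
    raise ValueError(f"reference position {ref_pos} is not in the alignment")
```

With the toy alignment `"MKADLT"` over `"MK-DLT"`, reference position 4 maps to homolog position 3, while reference position 3 maps to `None` because the homolog has a gap there.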

In certain embodiments, the RT domain of the gene modifying polypeptide comprises the amino acid sequence of the RT domain of AVIRE RT, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In embodiments, the RT domain comprises the amino acid sequence of the RT domain contained in a sequence listed in column 1 of Table A5, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In some embodiments, the gene modifying polypeptide further comprises a linker having at least 99% or 100% identity to SEQ ID NO: 5217 or SEQ ID NO: 11,041.

In certain embodiments, the RT domain of the gene modifying polypeptide comprises the amino acid sequence of the RT domain of MLVMS RT, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In embodiments, the RT domain comprises the amino acid sequence of the RT domain contained in a sequence listed in any one of columns 2-6 of Table A5, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In some embodiments, the gene modifying polypeptide further comprises a linker having at least 99% or 100% identity to SEQ ID NO: 5217 or SEQ ID NO: 11,041.

Table A5. Exemplary gene modifying polypeptides comprising an AVIRE RT domain or an MLVMS RT domain.

System

In one aspect, the present disclosure relates to a system comprising a nucleic acid molecule encoding a gene modifying polypeptide (e.g., as described herein) and a template nucleic acid (e.g., a template RNA, e.g., as described herein). In certain embodiments, the nucleic acid molecule encoding the gene modifying polypeptide comprises one or more silent mutations in the coding region (e.g., in the sequence encoding the RT domain) relative to a nucleic acid molecule described herein. In certain embodiments, the system further comprises a gRNA (e.g., a gRNA that binds a polypeptide that induces a nick, e.g., in the strand of the target DNA opposite to the strand bound by the gene modifying polypeptide).
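A silent mutation changes the nucleotide sequence of a codon without changing the amino acid it encodes. The following sketch, offered only as an illustration, translates two coding sequences with the standard genetic code and reports which codons differ silently. The function names `translate` and `silent_codon_changes` are hypothetical, both inputs are assumed to be in-frame coding sequences of equal length, and the nine-nucleotide sequences in the usage note are toy examples rather than sequences from this disclosure.

```python
from itertools import product
from typing import List

# Standard genetic code, codons enumerated in TCAG order (first base varies slowest).
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO)}


def _codons(cds: str) -> List[str]:
    """Split an in-frame coding sequence into upper-case DNA codons."""
    cds = cds.upper().replace("U", "T")
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def translate(cds: str) -> str:
    """Translate a coding sequence; stop codons are rendered as '*'."""
    return "".join(CODON_TABLE[codon] for codon in _codons(cds))


def silent_codon_changes(reference_cds: str, variant_cds: str) -> List[int]:
    """Return 1-based codon positions where the variant differs silently.

    Raises ValueError if the two sequences differ in length or if any
    difference changes the encoded amino acid (i.e. is not silent).
    """
    ref_codons, var_codons = _codons(reference_cds), _codons(variant_cds)
    if len(ref_codons) != len(var_codons):
        raise ValueError("sequences must contain the same number of codons")
    changed = []
    for position, (ref, var) in enumerate(zip(ref_codons, var_codons), start=1):
        if ref == var:
            continue
        if CODON_TABLE[ref] != CODON_TABLE[var]:
            raise ValueError(f"codon {position}: {ref}>{var} changes the amino acid")
        changed.append(position)
    return changed
```

For instance, `ATGGATCTG` and `ATGGACCTT` both translate to `MDL`, so the second and third codons differ silently.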

In certain embodiments, the nucleic acid molecule encoding the gene modifying polypeptide encodes a polypeptide having an amino acid sequence selected from SEQ ID NOs: 1-7743, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In certain embodiments, the nucleic acid molecule encoding the gene modifying polypeptide encodes a polypeptide having an amino acid sequence selected from SEQ ID NOs: 6001-7743, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In certain embodiments, the nucleic acid molecule encoding the gene modifying polypeptide encodes a polypeptide having an amino acid sequence selected from SEQ ID NOs: 4501-4541, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In certain embodiments, the nucleic acid molecule encoding the gene modifying polypeptide encodes a polypeptide as listed in any one of Tables A1, T1, or T2, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto.

In certain embodiments, the nucleic acid molecule encoding the gene modifying polypeptide comprises a sequence encoding a portion of an amino acid sequence selected from SEQ ID NOs: 1-7743 (wherein the portion comprises a linker and an RT domain), or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity to said portion. In certain embodiments, the nucleic acid molecule encoding the gene modifying polypeptide comprises a sequence encoding a portion of an amino acid sequence selected from SEQ ID NOs: 6001-7743 (wherein the portion comprises a linker and an RT domain), or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity to said portion. In certain embodiments, the nucleic acid molecule encoding the gene modifying polypeptide comprises a sequence encoding a portion of an amino acid sequence selected from SEQ ID NOs: 4501-4541 (wherein the portion comprises a linker and an RT domain), or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity to said portion. In certain embodiments, the nucleic acid molecule encoding the gene modifying polypeptide comprises a sequence encoding a portion of a polypeptide listed in any one of Tables A1, T1, or T2 (wherein the portion comprises a linker and an RT domain), or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity to said portion.

In certain embodiments, the nucleic acid molecule encoding the gene modifying polypeptide comprises a sequence encoding the linker of an amino acid sequence selected from SEQ ID NOs: 1-7743, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In certain embodiments, the nucleic acid molecule encoding the gene modifying polypeptide comprises a sequence encoding the linker of a polypeptide having an amino acid sequence selected from SEQ ID NOs: 6001-7743, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In certain embodiments, the nucleic acid molecule encoding the gene modifying polypeptide comprises a sequence encoding the linker of a polypeptide having an amino acid sequence selected from SEQ ID NOs: 4501-4541, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In certain embodiments, the nucleic acid molecule encoding the gene modifying polypeptide comprises a sequence encoding the linker of a polypeptide as listed in any one of Tables A1, T1, or T2, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto.

In certain embodiments, the nucleic acid molecule encoding the gene modifying polypeptide comprises a sequence encoding the RT domain of an amino acid sequence selected from SEQ ID NOs: 1-7743, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In certain embodiments, the nucleic acid molecule encoding the gene modifying polypeptide comprises a sequence encoding the RT domain of a polypeptide having an amino acid sequence selected from SEQ ID NOs: 6001-7743, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In certain embodiments, the nucleic acid molecule encoding the gene modifying polypeptide comprises a sequence encoding the RT domain of a polypeptide having an amino acid sequence selected from SEQ ID NOs: 4501-4541, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In certain embodiments, the nucleic acid molecule encoding the gene modifying polypeptide comprises a sequence encoding the RT domain of a polypeptide listed in any one of Tables A1, T1, or T2, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto.

In one aspect, the disclosure relates to a system comprising a gene modifying polypeptide (e.g., as described herein) and a template nucleic acid (e.g., a template RNA, e.g., as described herein).

In certain embodiments, the gene modifying polypeptide comprises a polypeptide having an amino acid sequence selected from SEQ ID NOs: 1-7743, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In certain embodiments, the gene modifying polypeptide comprises a polypeptide having an amino acid sequence selected from SEQ ID NOs: 6001-7743, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In certain embodiments, the gene modifying polypeptide comprises a polypeptide having an amino acid sequence selected from SEQ ID NOs: 4501-4541, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In certain embodiments, the gene modifying polypeptide comprises a polypeptide listed in any one of Tables A1, T1, or T2, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto.

In certain embodiments, the gene modifying polypeptide comprises a portion of an amino acid sequence selected from SEQ ID NOs: 1-7743 (wherein the portion comprises a linker and an RT domain), or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity to said portion. In certain embodiments, the gene modifying polypeptide comprises a portion of an amino acid sequence selected from SEQ ID NOs: 6001-7743 (wherein the portion comprises a linker and an RT domain), or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity to said portion. In certain embodiments, the gene modifying polypeptide comprises a portion of an amino acid sequence selected from SEQ ID NOs: 4501-4541 (wherein the portion comprises a linker and an RT domain), or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity to said portion. In certain embodiments, the gene modifying polypeptide comprises a portion of a polypeptide listed in any one of Tables A1, T1, or T2 (wherein the portion comprises a linker and an RT domain), or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity to said portion.

In certain embodiments, the gene modifying polypeptide comprises the linker of an amino acid sequence selected from SEQ ID NOs: 1-7743, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In certain embodiments, the gene modifying polypeptide comprises the linker of a polypeptide having an amino acid sequence selected from SEQ ID NOs: 6001-7743, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In certain embodiments, the gene modifying polypeptide comprises the linker of a polypeptide having an amino acid sequence selected from SEQ ID NOs: 4501-4541, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In certain embodiments, the gene modifying polypeptide comprises the linker of a polypeptide listed in any one of Tables A1, T1, or T2, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto.

In certain embodiments, the gene modifying polypeptide comprises the RT domain of an amino acid sequence selected from SEQ ID NOs: 1-7743, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In certain embodiments, the gene modifying polypeptide comprises the RT domain of a polypeptide having an amino acid sequence selected from SEQ ID NOs: 6001-7743, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In certain embodiments, the gene modifying polypeptide comprises the RT domain of a polypeptide having an amino acid sequence selected from SEQ ID NOs: 4501-4541, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In certain embodiments, the gene modifying polypeptide comprises the RT domain of a polypeptide listed in any one of Tables A1, T1, or T2, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto.
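By way of non-limiting illustration, the percent identity levels recited above (at least 70%, 75%, 80%, 85%, 90%, 95%, or 99%) can be checked with a short Python sketch. The sketch assumes the two sequences have already been aligned, with gaps written as "-", and counts identical residues over aligned columns. The function names, the choice of denominator, and the toy sequences are assumptions made for illustration and are not a method defined in this disclosure.

```python
THRESHOLDS = (70, 75, 80, 85, 90, 95, 99)  # percent identity levels recited in the text


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns of two pre-aligned sequences.

    Assumption: both strings have equal length and use '-' for gaps.
    Columns in which both sequences carry a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("alignment contains no residues")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def thresholds_met(identity: float) -> list:
    """Return the recited thresholds satisfied by a given percent identity."""
    return [t for t in THRESHOLDS if identity >= t]


# Toy example (hypothetical sequences): one substitution in 20 aligned residues.
reference = "KRTADGSEFESPKKKRKVAG"
variant = "KRTADGSEFESPKKKAKVAG"
identity = percent_identity(reference, variant)
print(identity, thresholds_met(identity))
```

Other conventions for the denominator (for example, the length of the shorter sequence) give different values for gapped alignments, so the convention should be stated whenever an identity threshold is applied.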

Localization sequences for gene modifying systems

In certain embodiments, the gene editor system RNA further comprises an intracellular localization sequence, e.g., a nuclear localization sequence (NLS). In some embodiments, the gene modifying polypeptide comprises an NLS as comprised in SEQ ID NO: 4000 and/or SEQ ID NO: 4001, or an NLS having an amino acid sequence with at least 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto.

The nuclear localization sequence may be an RNA sequence that promotes import of the RNA into the nucleus. In certain embodiments, the nuclear localization signal is located on the template RNA. In certain embodiments, the gene modifying polypeptide is encoded on a first RNA, the template RNA is a second, separate RNA, and the nuclear localization signal is located on the template RNA and not on the RNA encoding the gene modifying polypeptide. While not wishing to be bound by theory, in some embodiments, the RNA encoding the gene modifying polypeptide is targeted primarily to the cytoplasm to promote its translation, while the template RNA is targeted primarily to the nucleus to promote its insertion into the genome. In some embodiments, the nuclear localization signal is at the 3' end, the 5' end, or in an internal region of the template RNA. In some embodiments, the nuclear localization signal is 3' of the heterologous sequence (e.g., directly 3' of the heterologous sequence) or 5' of the heterologous sequence (e.g., directly 5' of the heterologous sequence). In some embodiments, the nuclear localization signal is placed outside the 5' UTR or outside the 3' UTR of the template RNA. In some embodiments, the nuclear localization signal is placed between the 5' UTR and the 3' UTR, wherein optionally the nuclear localization signal is not transcribed with the transgene (e.g., the nuclear localization signal is in an antisense orientation or is downstream of a transcription termination signal or polyadenylation signal). In some embodiments, the nuclear localization sequence is located within an intron. In some embodiments, a plurality of identical or different nuclear localization signals are present in the RNA, e.g., in the template RNA. In some embodiments, the nuclear localization signal is less than 5, 10, 25, 50, 75, 100, 150, 200, 250, 300, 350, 400, 450, 500, 600, 700, 800, 900, or 1000 bp in length. Various RNA nuclear localization sequences can be used. For example, Lubelsky and Ulitsky, Nature 555 (107-111), 2018 describe RNA sequences that drive localization of RNA to the nucleus. In some embodiments, the nuclear localization signal is a SINE-derived nuclear RNA localization (SIRLOIN) signal. In some embodiments, the nuclear localization signal binds a nuclear-enriched protein. In some embodiments, the nuclear localization signal binds the HNRNPK protein. In some embodiments, the nuclear localization signal is pyrimidine-rich, e.g., is a C/T-rich, C/U-rich, C-rich, T-rich, or U-rich region. In some embodiments, the nuclear localization signal is derived from a long non-coding RNA. In some embodiments, the nuclear localization signal is derived from the MALAT1 long non-coding RNA or is the 600-nucleotide M region of MALAT1 (described in Miyagawa et al., RNA 18, (738-751), 2012). In some embodiments, the nuclear localization signal is derived from the BORG long non-coding RNA or is an AGCCC motif (described in Zhang et al., Molecular and Cellular Biology 34, 2318-2329 (2014)). In some embodiments, the nuclear localization sequence is described in Shukla et al., The EMBO Journal e98452 (2018). In some embodiments, the nuclear localization signal is derived from a retrovirus.
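By way of non-limiting illustration, two of the sequence features named above (pyrimidine richness and the presence of an AGCCC motif) can be computed for a candidate RNA element with a short Python sketch. The function names, the 0.6 cutoff, and the toy sequence are hypothetical; the text does not define "rich" numerically, and the sketch does not predict whether an element actually drives nuclear localization.

```python
def pyrimidine_fraction(rna: str) -> float:
    """Fraction of pyrimidine bases (C, U, or T) in a nucleic acid sequence."""
    seq = rna.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(seq.count(base) for base in "CUT") / len(seq)


def find_motif(rna: str, motif: str = "AGCCC") -> list:
    """0-based start positions of every (possibly overlapping) motif occurrence."""
    seq, positions, start = rna.upper(), [], 0
    while (hit := seq.find(motif, start)) != -1:
        positions.append(hit)
        start = hit + 1
    return positions


def is_pyrimidine_rich(rna: str, threshold: float = 0.6) -> bool:
    """Hypothetical cutoff chosen only for this illustration."""
    return pyrimidine_fraction(rna) >= threshold


toy_element = "CCUCUAGCCCUUCCAGCCCU"  # made-up sequence, not taken from the disclosure
print(pyrimidine_fraction(toy_element), find_motif(toy_element))
```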

In some embodiments, a polypeptide described herein comprises one or more (e.g., 2, 3, 4, 5) nuclear targeting sequences, e.g., a nuclear localization sequence (NLS). In some embodiments, the NLS is a bipartite NLS. In some embodiments, the NLS promotes import of a protein comprising the NLS into the nucleus. In some embodiments, the NLS is fused to the N-terminus of a gene modifying polypeptide described herein. In some embodiments, the NLS is fused to the C-terminus of the gene modifying polypeptide. In some embodiments, the NLS is fused to the N-terminus or the C-terminus of a Cas domain. In some embodiments, a linker sequence is disposed between the NLS and the adjacent domain of the gene modifying polypeptide.

In some embodiments, the NLS comprises the amino acid sequence MDSLLMNRRKFLYQFKNVRWAKGRRETYLC (SEQ ID NO: 5009), PKKRKVEGADKRTADGSEFESPKKKRKV (SEQ ID NO: 5010), RKSGKIAAIWKRPRKPKKKRKV (SEQ ID NO: 5011), KRTADGSEFESPKKKRKV (SEQ ID NO: 5012), KKTELQTTNAENKTKKL (SEQ ID NO: 5013), KRGINDRNFWRGENGRKTR (SEQ ID NO: 5014), KRPAATKKAGQAKKKK (SEQ ID NO: 5015), PAAKRVKLD (SEQ ID NO: 4644), KRTADGSEFEKRTADGSEFESPKKKAKVE (SEQ ID NO: 4649), KRTADGSEFE (SEQ ID NO: 4650), KRTADGSEFESPKKKAKVE (SEQ ID NO: 4651), or AGKRTADGSEFEKRTADGSEFESPKKKAKVE (SEQ ID NO: 4001), or a functional fragment or variant thereof. Exemplary NLS sequences are also described in PCT/EP2000/011690, the contents of which are incorporated herein by reference for their disclosure of exemplary nuclear localization sequences. In some embodiments, the NLS comprises an amino acid sequence as disclosed in Table 11. An NLS of this table can be used in one or more copies at one or more positions in a polypeptide, e.g., 1, 2, 3, or more copies of an NLS in the N-terminal domain, between peptide domains, in the C-terminal domain, or in a combination of positions, to improve subcellular localization to the nucleus. Multiple unique sequences may be used within a single polypeptide. Sequences may be naturally monopartite or bipartite, e.g., having one or two stretches of basic amino acids, or may be used as chimeric bipartite sequences. Sequence references correspond to UniProt accession numbers, except where indicated as SeqNLS for sequences mined using a subcellular localization prediction algorithm (Lin et al., BMC Bioinformat 13:157 (2012), incorporated herein by reference in its entirety).

Table 11. Exemplary nuclear localization signals for use in gene modifying systems

In some embodiments, the NLS is a bipartite NLS. A bipartite NLS typically comprises two clusters of basic amino acids separated by a spacer sequence (which may be, e.g., about 10 amino acids in length). A monopartite NLS typically lacks a spacer. An example of a bipartite NLS is the nucleoplasmin NLS, having the sequence KR[PAATKKAGQA]KKKK (SEQ ID NO: 5015), wherein the spacer is bracketed. Another exemplary bipartite NLS has the sequence PKKKRKVEGADKRTADGSEFESPKKKRKV (SEQ ID NO: 5016). Exemplary NLSs are described in International Application WO 2020051561, which is incorporated herein by reference in its entirety, including for its disclosure of nuclear localization sequences.
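By way of non-limiting illustration, the bipartite (two-component) layout described above, i.e., two basic clusters separated by a spacer of about 10 amino acids, can be located in a sequence with a short Python sketch. The regular expression is an illustrative heuristic assumed for this example (two basic residues, a 9-12 residue spacer, then at least three basic residues) and is not a definition given in this disclosure.

```python
import re

# Illustrative heuristic only: two basic residues (K or R), a 9-12 residue spacer
# matched lazily so the shortest workable spacer is chosen, then a run of at
# least three basic residues.
BIPARTITE = re.compile(r"([KR]{2})([A-Z]{9,12}?)([KR]{3,})")


def split_bipartite(seq: str):
    """Return (first cluster, spacer, second cluster), or None if no match."""
    match = BIPARTITE.search(seq.upper())
    return match.groups() if match else None


# Nucleoplasmin NLS recited in the text as KR[PAATKKAGQA]KKKK (SEQ ID NO: 5015).
print(split_bipartite("KRPAATKKAGQAKKKK"))
# Second exemplary sequence recited in the text (SEQ ID NO: 5016).
print(split_bipartite("PKKKRKVEGADKRTADGSEFESPKKKRKV"))
```

For the nucleoplasmin sequence the sketch recovers the bracketed spacer shown in the text; a short sequence without a spacer, such as PAAKRVKLD, returns None under this heuristic.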

In certain embodiments, the gene editor system polypeptide (e.g., a gene modifying polypeptide as described herein) further comprises an intracellular localization sequence, e.g., a nuclear localization sequence and/or a nucleolar localization sequence. The nuclear localization sequence and/or nucleolar localization sequence may be an amino acid sequence that promotes import of the protein into the nucleus and/or nucleolus, where it can promote integration of the heterologous sequence into the genome. In certain embodiments, the gene editor system polypeptide (e.g., a gene modifying polypeptide as described herein) further comprises a nucleolar localization sequence. In certain embodiments, the gene modifying polypeptide is encoded on a first RNA, the template RNA is a second, separate RNA, and the nucleolar localization signal is encoded on the RNA encoding the gene modifying polypeptide and not on the template RNA. In some embodiments, the nucleolar localization signal is located at the N-terminus, the C-terminus, or in an internal region of the polypeptide. In some embodiments, a plurality of identical or different nucleolar localization signals are used. In some embodiments, the nuclear localization signal is less than 5, 10, 25, 50, 75, or 100 amino acids in length. Various polypeptide nucleolar localization signals can be used. For example, Yang et al., Journal of Biomedical Science 22, 33 (2015) describe a nuclear localization signal that also functions as a nucleolar localization signal. In some embodiments, the nucleolar localization signal may also be a nuclear localization signal. In some embodiments, the nucleolar localization signal may overlap with a nuclear localization signal. In some embodiments, the nucleolar localization signal may comprise a stretch of basic residues. In some embodiments, the nucleolar localization signal may be rich in arginine and lysine residues. In some embodiments, the nucleolar localization signal may be derived from a protein that is enriched in the nucleolus. In some embodiments, the nucleolar localization signal may be derived from a protein that is enriched at ribosomal RNA loci. In some embodiments, the nucleolar localization signal may be derived from a protein that binds rRNA. In some embodiments, the nucleolar localization signal may be derived from MSP58. In some embodiments, the nucleolar localization signal may be a monopartite motif. In some embodiments, the nucleolar localization signal may be a bipartite motif. In some embodiments, the nucleolar localization signal may consist of multiple monopartite or bipartite motifs. In some embodiments, the nucleolar localization signal may consist of a mix of monopartite and bipartite motifs. In some embodiments, the nucleolar localization signal may be a dual bipartite motif. In some embodiments, the nucleolar localization motif may be KRASSQALGTIPKRRSSSRFIKRKK (SEQ ID NO: 5017). In some embodiments, the nucleolar localization signal may be derived from nuclear factor-κB-inducing kinase. In some embodiments, the nucleolar localization signal may be an RKKRKKK motif (SEQ ID NO: 5018) (described in Birbach et al., Journal of Cell Science, 117 (3615-3624), 2004).
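By way of non-limiting illustration, the compositional features mentioned above (a stretch of basic residues; richness in arginine and lysine) can be quantified for the two recited motifs with a short Python sketch. The function names are hypothetical, and the sketch reports composition only; it does not predict nucleolar localization.

```python
import re

BASIC = "KR"  # lysine and arginine


def basic_fraction(peptide: str) -> float:
    """Fraction of residues that are lysine or arginine."""
    seq = peptide.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(seq.count(aa) for aa in BASIC) / len(seq)


def longest_basic_run(peptide: str) -> int:
    """Length of the longest uninterrupted stretch of K/R residues."""
    runs = re.findall(r"[KR]+", peptide.upper())
    return max((len(run) for run in runs), default=0)


# The two motifs recited in the text (SEQ ID NO: 5017 and SEQ ID NO: 5018).
for motif in ("KRASSQALGTIPKRRSSSRFIKRKK", "RKKRKKK"):
    print(motif, basic_fraction(motif), longest_basic_run(motif))
```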

Evolved variants of gene modifying polypeptides and systems

In some embodiments, the invention provides evolved variants of the gene modifying polypeptides described herein. In some embodiments, an evolved variant can be produced by mutagenizing a reference gene modifying polypeptide, or one of the fragments or domains comprised therein. In some embodiments, one or more domains (e.g., the reverse transcriptase domain) are evolved. In some embodiments, one or more such evolved variant domains can be evolved alone or together with other domains. In some embodiments, one or more evolved variant domains can be combined with one or more unevolved cognate components or with evolved variants of one or more cognate components, e.g., which may have been evolved in either a parallel or a serial manner.

In some embodiments, the process of mutagenizing a reference gene modifying polypeptide, or a fragment or domain thereof, comprises mutagenizing the reference gene modifying polypeptide or the fragment or domain thereof. In embodiments, the mutagenesis comprises a continuous evolution method (e.g., PACE) or a non-continuous evolution method (e.g., PANCE), e.g., as described herein. In some embodiments, the evolved gene modifying polypeptide, or a fragment or domain thereof, comprises one or more amino acid variations introduced into its amino acid sequence relative to the amino acid sequence of the reference gene modifying polypeptide, or fragment or domain thereof. In embodiments, the amino acid sequence variations may include one or more mutated residues (e.g., conservative substitutions, non-conservative substitutions, or a combination thereof) within the amino acid sequence of the reference gene modifying polypeptide, e.g., as a result of a change in the nucleotide sequence encoding the gene modifying polypeptide that results in, e.g., a change in the codon at any particular position in the coding sequence, the deletion of one or more amino acids (e.g., a truncated protein), the insertion of one or more amino acids, or any combination of the foregoing. The evolved variant gene modifying polypeptide may include variants in one or more components or domains of the gene modifying polypeptide (e.g., variants introduced into the reverse transcriptase domain).

In some aspects, the disclosure provides gene modifying polypeptides, systems, kits, and methods that use or comprise an evolved variant of a gene modifying polypeptide, e.g., that employ an evolved variant of a gene modifying polypeptide or a gene modifying polypeptide produced or producible by PACE or PANCE. In embodiments, the unevolved reference gene modifying polypeptide is a gene modifying polypeptide as disclosed herein.

As used herein, the term "phage-assisted continuous evolution (PACE)" generally refers to continuous evolution that employs phage as viral vectors. Examples of PACE technology have been described, for example, in International PCT Application No. PCT/US2009/056194, filed September 8, 2009, published as WO 2010/028347 on March 11, 2010; International PCT Application No. PCT/US2011/066747, filed December 22, 2011, published as WO 2012/088381 on June 28, 2012; U.S. Patent No. 9,023,594, issued May 5, 2015; U.S. Patent No. 9,771,574, issued September 26, 2017; U.S. Patent No. 9,394,537, issued July 19, 2016; International PCT Application No. PCT/US2015/012022, filed January 20, 2015, published as WO 2015/134121 on September 11, 2015; U.S. Patent No. 10,179,911, issued January 15, 2019; and International PCT Application No. PCT/US2016/027795, filed April 15, 2016, published as WO 2016/168631 on October 20, 2016, the entire contents of each of which are incorporated herein by reference.

As used herein, the term "phage-assisted non-continuous evolution (PANCE)" generally refers to non-continuous evolution that employs phage as viral vectors. Examples of PANCE technology have been described, for example, in Suzuki T. et al., Crystal structures reveal an elusive functional domain of pyrrolysyl-tRNA synthetase, Nat Chem Biol. 13(12): 1261-1266 (2017), which is incorporated herein by reference in its entirety. Briefly, PANCE is a technique for rapid in vivo directed evolution using serial flask transfers of evolving selection phage (SP), which contain a gene of interest to be evolved, across fresh host cells (e.g., E. coli cells). Genes inside the host cell may be held constant while genes contained in the SP continuously evolve. Following phage growth, an aliquot of infected cells may be used to transfect a subsequent flask containing host E. coli. This process can be repeated and/or continued until the desired phenotype has evolved, e.g., for as many transfers as desired.

Methods of applying PACE and PANCE to gene modifying polypeptides may be readily appreciated by the skilled artisan by reference to, inter alia, the foregoing references. Additional exemplary methods for directing continuous evolution of genome-modifying proteins or systems, e.g., in a population of host cells, e.g., using phage particles, can be applied to generate evolved variants of gene modifying polypeptides, or fragments or subdomains thereof. Non-limiting examples of such methods are described in International PCT Application No. PCT/US2009/056194, filed September 8, 2009, published as WO 2010/028347 on March 11, 2010; International PCT Application No. PCT/US2011/066747, filed December 22, 2011, published as WO 2012/088381 on June 28, 2012; U.S. Patent No. 9,023,594, issued May 5, 2015; U.S. Patent No. 9,771,574, issued September 26, 2017; U.S. Patent No. 9,394,537, issued July 19, 2016; International PCT Application No. PCT/US2015/012022, filed January 20, 2015, published as WO 2015/134121 on September 11, 2015; U.S. Patent No. 10,179,911, issued January 15, 2019; International Application No. PCT/US2019/37216, filed June 14, 2019; International Patent Publication WO 2019/023680, published January 31, 2019; International PCT Application No. PCT/US2016/027795, filed April 15, 2016, published as WO 2016/168631 on October 20, 2016; and International Patent Publication No. PCT/US2019/47996, filed August 23, 2019; each of which is incorporated herein by reference in its entirety.

In some non-limiting illustrative embodiments, a method of evolving an evolved variant gene modifying polypeptide, or a fragment or domain thereof, comprises: (a) contacting a population of host cells with a population of viral vectors comprising the gene of interest (the starting gene modifying polypeptide or fragment or domain thereof), wherein: (1) the host cells are amenable to infection by the viral vector; (2) the host cells express viral genes required for the generation of viral particles; (3) the expression of at least one viral gene required for the production of an infectious viral particle is dependent on a function of the gene of interest; and/or (4) the viral vector allows for expression of the protein in the host cell, and can be replicated and packaged into a viral particle by the host cell. In some embodiments, the method comprises (b) contacting the host cells with a mutagen, using host cells with mutations that elevate the mutation rate (e.g., by carrying a mutation plasmid or a genome modification, such as a proofreading-impaired DNA polymerase or SOS genes, e.g., UmuC, UmuD', and/or RecA, which mutations, if plasmid-borne, may be under the control of an inducible promoter), or a combination thereof. In some embodiments, the method comprises (c) incubating the population of host cells under conditions allowing for viral replication and the production of viral particles, wherein host cells are removed from the host cell population, and fresh, uninfected host cells are introduced into the host cell population, thus replenishing the host cell population and creating a flow of host cells. In some embodiments, the cells are incubated under conditions allowing for the gene of interest to acquire a mutation. In some embodiments, the method further comprises (d) isolating from the host cell population a mutated version of the viral vector that encodes an evolved gene product (e.g., an evolved variant gene modifying polypeptide, or a fragment or domain thereof).

The skilled artisan will appreciate various features that can be employed within the above framework. For example, in some embodiments, the viral vector or phage is a filamentous phage, e.g., an M13 phage, such as an M13 selection phage. In certain embodiments, the gene required for the production of infectious viral particles is M13 gene III (gIII). In embodiments, the phage may lack a functional gIII but otherwise comprise gI, gII, gIV, gV, gVI, gVII, gVIII, gIX, and gX. In some embodiments, the generation of infectious VSV particles involves the envelope protein VSV-G. Various embodiments can use different retroviral vectors, for example, murine leukemia virus vectors or lentiviral vectors. In embodiments, the retroviral vectors can be efficiently packaged with the VSV-G envelope protein (e.g., as a substitute for the native envelope protein of the virus).

In some embodiments, the host cells are incubated for a suitable number of viral life cycles, e.g., at least 10, at least 20, at least 30, at least 40, at least 50, at least 100, at least 200, at least 300, at least 400, at least 500, at least 600, at least 700, at least 800, at least 900, at least 1000, at least 1250, at least 1500, at least 1750, at least 2000, at least 2500, at least 3000, at least 4000, at least 5000, at least 7500, at least 10000, or more consecutive viral life cycles; in the illustrative and non-limiting example of M13 phage, each viral life cycle takes 10-20 minutes. Similarly, conditions can be modulated to adjust the time a host cell remains in the host cell population, e.g., about 10, about 11, about 12, about 13, about 14, about 15, about 16, about 17, about 18, about 19, about 20, about 21, about 22, about 23, about 24, about 25, about 30, about 35, about 40, about 45, about 50, about 55, about 60, about 70, about 80, about 90, about 100, about 120, about 150, or about 180 minutes. The host cell population can be controlled in part by the density of the host cells; in some embodiments, the host cell density in the inflow is, e.g., 10³ cells/ml, about 10⁴ cells/ml, about 10⁵ cells/ml, about 5·10⁵ cells/ml, about 10⁶ cells/ml, about 5·10⁶ cells/ml, about 10⁷ cells/ml, about 5·10⁷ cells/ml, about 10⁸ cells/ml, about 5·10⁸ cells/ml, about 10⁹ cells/ml, about 5·10⁹ cells/ml, about 10¹⁰ cells/ml, or about 5·10¹⁰ cells/ml.
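By way of illustration only, the relationship between the number of consecutive viral life cycles and the total incubation time can be sketched as below. The function name and its default arguments are hypothetical; the only figure taken from the description is the 10-20 minute life cycle given for the M13 example.

```python
def incubation_time_hours(n_cycles, cycle_minutes=(10, 20)):
    """Return the (shortest, longest) incubation time in hours.

    n_cycles      -- number of consecutive viral life cycles
    cycle_minutes -- (min, max) duration of one life cycle in minutes;
                     the default is the 10-20 minute range given for M13
    """
    if n_cycles < 0:
        raise ValueError("n_cycles must be non-negative")
    shortest, longest = cycle_minutes
    return (n_cycles * shortest / 60, n_cycles * longest / 60)
```

For example, under these assumptions 600 consecutive M13 life cycles correspond to between 100 and 200 hours of continuous culture.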

Inteins

In some embodiments, as described in more detail below, an intein-N (intN) domain may be fused to the N-terminal portion of a first domain of a gene modifying polypeptide described herein, and an intein-C (intC) domain may be fused to the C-terminal portion of a second domain of a gene modifying polypeptide described herein, for joining the N-terminal portion to the C-terminal portion and thereby joining the first and second domains. In some embodiments, the first and second domains are each independently chosen from a DNA binding domain, an RNA binding domain, an RT domain, and an endonuclease domain.

An intein can occur as a self-splicing protein intron (e.g., a peptide) that, for example, ligates flanking N-terminal and C-terminal exteins (e.g., the fragments to be joined). In some cases, an intein may comprise a fragment of a protein that is able to excise itself and join the remaining fragments (the exteins) with a peptide bond in a process known as protein splicing. Inteins are also referred to as "protein introns." The process of an intein excising itself and joining the remaining portions of the protein is herein termed "protein splicing" or "intein-mediated protein splicing."

In some embodiments, an intein of a precursor protein (an intein-containing protein prior to intein-mediated protein splicing) comes from two genes. Such an intein is referred to herein as a split intein (e.g., split intein-N and split intein-C). Accordingly, an intein-based approach can be used to join a first polypeptide sequence and a second polypeptide sequence together. For example, in cyanobacteria, DnaE, the catalytic subunit α of DNA polymerase III, is encoded by two separate genes, dnaE-n and dnaE-c. When positioned as part of a first polypeptide sequence, an intein-N domain (e.g., encoded by the dnaE-n gene) can join the first polypeptide sequence to a second polypeptide sequence, wherein the second polypeptide sequence comprises an intein-C domain (e.g., encoded by the dnaE-c gene). Accordingly, in some embodiments, a protein can be made by providing nucleic acids encoding the first and second polypeptide sequences (e.g., wherein a first nucleic acid molecule encodes the first polypeptide sequence and a second nucleic acid molecule encodes the second polypeptide sequence), and introducing the nucleic acids into a cell under conditions that allow for production of the first and second polypeptide sequences and for joining of the first polypeptide sequence to the second polypeptide sequence by an intein-based mechanism.

The use of inteins for joining heterologous protein fragments is described, for example, in Wood et al., J. Biol. Chem. 289(21):14512-9 (2014) (incorporated herein by reference in its entirety). For example, when fused to separate protein fragments, the inteins IntN and IntC can recognize each other, splice themselves out, and/or simultaneously ligate the flanking N-terminal and C-terminal exteins of the protein fragments to which they were fused, thereby reconstituting a full-length protein from the two protein fragments.

In some embodiments, a synthetic intein based on the dnaE intein, namely the Cfa-N (e.g., split intein-N) and Cfa-C (e.g., split intein-C) intein pair, is used. Examples of such inteins have been described, e.g., in Stevens et al., J Am Chem Soc. 2016 Feb 24;138(7):2162-5 (incorporated herein by reference in its entirety). Non-limiting examples of intein pairs that may be used in accordance with the present disclosure include: Cfa DnaE intein, Ssp GyrB intein, Ssp DnaX intein, Ter DnaE3 intein, Ter ThyX intein, Rma DnaB intein, and Cne Prp8 intein (e.g., as described in U.S. Pat. No. 8,394,604, incorporated herein by reference).

In some embodiments involving a split Cas9, an intein-N domain and an intein-C domain may be fused to the N-terminal portion of the split Cas9 and the C-terminal portion of the split Cas9, respectively, for joining the N-terminal portion of the split Cas9 to the C-terminal portion of the split Cas9. For example, in some embodiments, an intein-N is fused to the C-terminus of the N-terminal portion of the split Cas9, i.e., to form a structure of N-[N-terminal portion of the split Cas9]-[intein-N]-C. In some embodiments, an intein-C is fused to the N-terminus of the C-terminal portion of the split Cas9, i.e., to form a structure of N-[intein-C]-[C-terminal portion of the split Cas9]-C. The mechanism of intein-mediated protein splicing for joining the proteins to which the inteins are fused (e.g., a split Cas9) is described in Shah et al., Chem Sci. 2014; 5(1):446-461, incorporated herein by reference. Methods for designing and using inteins are known in the art and are described, for example, in WO 2020051561, WO 2014004336, WO 2017132580, US 20150344549, and US 20180127780, each of which is incorporated herein by reference in its entirety.

In some embodiments, "split" refers to division into two or more fragments. In some embodiments, a split Cas9 protein or split Cas9 comprises a Cas9 protein that is provided as an N-terminal fragment and a C-terminal fragment encoded by two separate nucleotide sequences. The polypeptides corresponding to the N-terminal portion and the C-terminal portion of the Cas9 protein may be spliced to form a reconstituted Cas9 protein. In embodiments, the Cas9 protein is divided into two fragments within a disordered region of the protein, e.g., as described in Nishimasu et al., Cell, Volume 156, Issue 5, pp. 935-949, 2014, or in Jiang et al. (2016) Science 351:867-871 and PDB file 5F9R (each of which is incorporated herein by reference in its entirety). A disordered region may be determined by one or more protein structure determination techniques known in the art, including, without limitation, X-ray crystallography, NMR spectroscopy, electron microscopy (e.g., cryoEM), and/or in silico protein modeling. In some embodiments, the protein is divided into two fragments at any C, T, A, or S within a region of SpCas9, e.g., between amino acids A292-G364, F445-K483, or E565-T637, or at the corresponding positions in any other Cas9, Cas9 variant (e.g., nCas9, dCas9), or other napDNAbp. In some embodiments, the protein is divided into two fragments at SpCas9 T310, T313, A456, S469, or C574. In some embodiments, the process of dividing the protein into two fragments is referred to as splitting the protein.
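The split-and-fuse arrangement described above can be illustrated with a minimal sketch. Everything in it is hypothetical: the function names, the toy amino acid strings, and in particular the convention that the residue named as the split site becomes the first residue of the C-terminal fragment, which is an assumption made here and not a statement of the disclosure. No real intein or Cas9 sequence is used.

```python
# Residues at which the description permits a split (C, T, A, or S).
SPLIT_RESIDUES = {"C", "T", "A", "S"}


def split_with_inteins(protein, site, intein_n, intein_c):
    """Split `protein` at 1-based position `site` and fuse the intein halves.

    Assumption: the residue at `site` starts the C-terminal fragment.
    Returns (N-[N-terminal portion]-[intein-N]-C,
             N-[intein-C]-[C-terminal portion]-C).
    """
    residue = protein[site - 1]
    if residue not in SPLIT_RESIDUES:
        raise ValueError(f"residue {residue}{site} is not C, T, A, or S")
    n_portion, c_portion = protein[:site - 1], protein[site - 1:]
    return n_portion + intein_n, intein_c + c_portion


def reconstitute(n_fusion, c_fusion, intein_n, intein_c):
    """Model splicing: drop both intein halves and join the exteins."""
    if not (n_fusion.endswith(intein_n) and c_fusion.startswith(intein_c)):
        raise ValueError("fusions do not carry the expected intein halves")
    return n_fusion[:len(n_fusion) - len(intein_n)] + c_fusion[len(intein_c):]
```

With a toy protein `"MKDLTGHE"` split at T5 and placeholder intein halves `"AAAA"` and `"GGGG"`, the two fusions are `"MKDLAAAA"` and `"GGGGTGHE"`, and `reconstitute` returns the original toy protein.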

In some embodiments, a protein fragment ranges from about 2-1000 amino acids in length (e.g., between 2-10, 10-50, 50-100, 100-200, 200-300, 300-400, 400-500, 500-600, 600-700, 700-800, 800-900, or 900-1000 amino acids). In some embodiments, a protein fragment ranges from about 5-500 amino acids in length (e.g., between 5-10, 10-50, 50-100, 100-200, 200-300, 300-400, or 400-500 amino acids). In some embodiments, a protein fragment ranges from about 20-200 amino acids in length (e.g., between 20-30, 30-40, 40-50, 50-100, or 100-200 amino acids).

In some embodiments, a portion or fragment of a gene modifying polypeptide is fused to an intein. The nuclease can be fused to the N-terminus or the C-terminus of the intein. In some embodiments, a portion or fragment of a fusion protein is fused to an intein and fused to an AAV capsid protein. The intein, nuclease, and capsid protein can be fused together in any arrangement (e.g., nuclease-intein-capsid, intein-nuclease-capsid, capsid-intein-nuclease, etc.). In some embodiments, the N-terminus of an intein is fused to the C-terminus of a fusion protein, and the C-terminus of the intein is fused to the N-terminus of an AAV capsid protein.

In some embodiments, an endonuclease domain (e.g., a nickase Cas9 domain) is fused to intein-N, and a polypeptide comprising an RT domain is fused to intein-C.

Exemplary nucleotide and amino acid sequences of intein-N domains and compatible intein-C domains are provided below:

DnaE Intein-N DNA:

DnaE Intein-N Protein:

DnaE Intein-C DNA:

DnaE Intein-C Protein:

Cfa-N DNA:

Cfa-N Protein:

Cfa-C DNA:

Cfa-C Protein:

Additional domains

A gene modifying polypeptide may bind a target DNA sequence and a template nucleic acid (e.g., template RNA), nick the target site, and write (e.g., reverse transcribe) the template into DNA, resulting in a modification of the target site. In some embodiments, additional domains may be added to the polypeptide to enhance the efficiency of the process. In some embodiments, the gene modifying polypeptide may contain an additional DNA ligation domain to join reverse-transcribed DNA to the DNA of the target site. In some embodiments, the polypeptide may comprise a heterologous RNA-binding domain. In some embodiments, the polypeptide may comprise a domain having 5′ to 3′ exonuclease activity (e.g., wherein the 5′ to 3′ exonuclease activity increases repair of the alteration at the target site, e.g., in favor of the alteration over the original genomic sequence). In some embodiments, the polypeptide may comprise a domain having 3′ to 5′ exonuclease activity, e.g., proofreading activity. In some embodiments, the writing domain, e.g., an RT domain, has 3′ to 5′ exonuclease activity, e.g., proofreading activity.

Template nucleic acids

The gene modifying systems described herein can modify a host target DNA site using a template nucleic acid sequence. In some embodiments, the gene modifying systems described herein write an RNA sequence template into a host target DNA site by target-primed reverse transcription (TPRT). By modifying one or more DNA sequences through reverse transcription of the RNA sequence template directly into the host genome, the gene modifying system can insert an object sequence into a target genome without the need to introduce an exogenous DNA sequence into the host cell (unlike, for example, CRISPR systems), and it eliminates the exogenous DNA insertion step. The gene modifying system can also delete a sequence from the target genome or introduce a substitution using an object sequence. The gene modifying system therefore provides a platform for the use of customized RNA sequence templates containing object sequences, e.g., sequences comprising heterologous gene coding and/or functional information.

In some embodiments, the template nucleic acid comprises one or more sequences (e.g., 2 sequences) that bind the gene modifying polypeptide.

In some embodiments, a system or method described herein comprises a single template nucleic acid (e.g., template RNA). In some embodiments, a system or method described herein comprises a plurality of template nucleic acids (e.g., template RNAs). For example, a system described herein comprises a first RNA and a second RNA (e.g., a template RNA), wherein the first RNA comprises (e.g., from 5′ to 3′) a sequence that binds the gene modifying polypeptide (e.g., the DNA binding domain and/or the endonuclease domain, e.g., a gRNA) and a sequence that binds a target site (e.g., a second strand of a site in a target genome), and the second RNA comprises (e.g., from 5′ to 3′) optionally a sequence that binds the gene modifying polypeptide (e.g., that specifically binds the RT domain), a heterologous object sequence, and a PBS sequence. In some embodiments, when the system comprises a plurality of nucleic acids, each nucleic acid comprises a conjugating domain. In some embodiments, a conjugating domain enables association of nucleic acid molecules, e.g., by hybridization of complementary sequences. For example, in some embodiments, the first RNA comprises a first conjugating domain and the second RNA comprises a second conjugating domain, and the first and second conjugating domains are capable of hybridizing to one another, e.g., under stringent conditions. In some embodiments, the stringent conditions for hybridization include hybridization in 4x sodium chloride/sodium citrate (SSC) at about 65°C, followed by a wash in 1x SSC at about 65°C.
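As a simple illustration of conjugating domains that associate through complementary sequences, the sketch below checks whether two RNA sequences are perfect antiparallel Watson-Crick complements. The function names and the toy sequences are hypothetical. The check is deliberately strict (no G·U wobble pairs, no mismatches) and does not model temperature or salt concentration, so it is only a rough proxy for hybridization under the stringent conditions described above.

```python
# Watson-Crick pairing partners for RNA bases.
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence written 5'->3'."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(rna.upper()))


def are_complementary(domain_1, domain_2):
    """True if the two domains can pair perfectly in antiparallel orientation."""
    return domain_2.upper() == reverse_complement(domain_1)
```

For instance, a first conjugating domain `"AUGGC"` pairs with a second conjugating domain `"GCCAU"` under this check.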

In some embodiments, the template nucleic acid comprises RNA. In some embodiments, the template nucleic acid comprises DNA (e.g., single-stranded or double-stranded DNA).

In some embodiments, the template nucleic acid comprises one or more (e.g., 2) homology domains that have homology to the target sequence. In some embodiments, the homology domains are about 10-20, 20-50, or 50-100 nucleotides in length.

In some embodiments, a template RNA can comprise a gRNA sequence, e.g., to direct the gene modifying polypeptide to a target site of interest. In some embodiments, a template RNA comprises (e.g., from 5′ to 3′): (i) optionally, a gRNA spacer that binds a target site (e.g., a second strand of a site in a target genome); (ii) optionally, a gRNA scaffold that binds a polypeptide described herein (e.g., a gene modifying polypeptide or a Cas polypeptide); (iii) a heterologous object sequence comprising a mutation region (optionally, the heterologous object sequence comprises, from 5′ to 3′, a first homology region, a mutation region, and a second homology region); and (iv) a primer binding site (PBS) sequence comprising a 3′ target homology domain.

The template nucleic acid (e.g., template RNA) component of a genome editing system described herein is typically able to bind the gene modifying polypeptide of the system. In some embodiments, the template nucleic acid (e.g., template RNA) has a 3′ region that is capable of binding the gene modifying polypeptide. The binding region, e.g., the 3′ region, may be a structured RNA region, e.g., having at least 1, 2, or 3 hairpin loops, capable of binding the gene modifying polypeptide of the system. The binding region may associate the template nucleic acid (e.g., template RNA) with any of the polypeptide modules. In some embodiments, the binding region of the template nucleic acid (e.g., template RNA) may associate with an RNA binding domain in the polypeptide. In some embodiments, the binding region of the template nucleic acid (e.g., template RNA) may associate with the reverse transcription domain of the gene modifying polypeptide (e.g., specifically bind the RT domain). In some embodiments, the template nucleic acid (e.g., template RNA) may associate with the DNA binding domain of the polypeptide, e.g., as a gRNA associates with a Cas9-derived DNA binding domain. In some embodiments, the binding region may also provide DNA target recognition, e.g., as a gRNA hybridizes to the target DNA sequence and binds the polypeptide, e.g., a Cas9 domain. In some embodiments, the template nucleic acid (e.g., template RNA) may associate with multiple components of the polypeptide, e.g., the DNA binding domain and the reverse transcription domain.

In some embodiments, the template RNA has a poly-A tail at the 3′ end. In some embodiments, the template RNA does not have a poly-A tail at the 3′ end.

In some embodiments, the template nucleic acid is a template RNA. In some embodiments, the template RNA comprises one or more modified nucleotides. For example, in some embodiments, the template RNA comprises one or more deoxyribonucleotides. In some embodiments, regions of the template RNA are replaced by DNA nucleotides, e.g., to enhance the stability of the molecule. For example, the 3′ end of the template may comprise DNA nucleotides, while the rest of the template comprises RNA nucleotides that can be reverse transcribed. For instance, in some embodiments, the heterologous object sequence is primarily or wholly made up of RNA nucleotides (e.g., at least 90%, 95%, 98%, or 99% RNA nucleotides). In some embodiments, the PBS sequence is primarily or wholly made up of DNA nucleotides (e.g., at least 90%, 95%, 98%, or 99% DNA nucleotides). In other embodiments, the heterologous object sequence to be written into the genome may comprise DNA nucleotides. In some embodiments, the DNA nucleotides in the template are copied into the genome by a domain having DNA-dependent DNA polymerase activity. In some embodiments, the DNA-dependent DNA polymerase activity is provided by a DNA polymerase domain in the polypeptide. In some embodiments, the DNA-dependent DNA polymerase activity is provided by a reverse transcriptase domain that is also capable of DNA-dependent DNA polymerization, e.g., second-strand synthesis. In some embodiments, the template molecule is composed of only DNA nucleotides.

In some embodiments, a system described herein comprises two nucleic acids which together comprise the sequences of a template RNA described herein. In some embodiments, the two nucleic acids are associated with each other non-covalently, e.g., directly associated with each other (e.g., via base pairing), or indirectly associated as part of a complex comprising one or more additional molecules.

A template RNA described herein may comprise, from 5′ to 3′: (1) a gRNA spacer; (2) a gRNA scaffold; (3) a heterologous object sequence; and (4) a primer binding site (PBS) sequence. Each of these components is now described in more detail.
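The 5′ to 3′ order of the four components can be pictured with the following minimal sketch. The class name, field names, and toy sequences are hypothetical and chosen only for illustration; real components would be designed as described in the sections that follow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TemplateRNA:
    """Toy container holding the four components in 5'->3' order."""
    spacer: str               # (1) gRNA spacer
    scaffold: str             # (2) gRNA scaffold
    heterologous_object: str  # (3) heterologous object sequence
    pbs: str                  # (4) primer binding site (PBS) sequence

    def sequence(self):
        """Concatenate the components from 5' to 3'."""
        return self.spacer + self.scaffold + self.heterologous_object + self.pbs

    def component_spans(self):
        """Map each component name to its 0-based, end-exclusive span."""
        spans, start = {}, 0
        for name in ("spacer", "scaffold", "heterologous_object", "pbs"):
            end = start + len(getattr(self, name))
            spans[name] = (start, end)
            start = end
        return spans
```

Keeping the components separate until the final concatenation makes it straightforward to vary one component, such as the PBS length, without re-deriving the others.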

gRNA spacers and gRNA scaffolds

A template RNA described herein may comprise a gRNA spacer that directs the gene modifying system to a target nucleic acid, and a gRNA scaffold that promotes association of the template RNA with the Cas domain of the gene modifying polypeptide. The systems described herein can also comprise a gRNA that is not part of a template nucleic acid. For example, a gRNA that comprises a gRNA spacer and a gRNA scaffold, but not a heterologous object sequence or a PBS sequence, can be used, e.g., to induce second-strand nicking, e.g., as described in the section herein entitled "Second Strand Nicking".

In some embodiments, a gRNA is a short synthetic RNA composed of a scaffold sequence that participates in binding a CRISPR-associated protein and a user-defined targeting sequence of about 20 nucleotides directed to a genomic target. The structure of a complete gRNA is described by Nishimasu et al., Cell 156, pp. 935-949 (2014). The gRNA (also referred to as sgRNA, for single guide RNA) consists of crRNA-derived and tracrRNA-derived sequences connected by an artificial tetraloop. The crRNA sequence can be divided into guide (20 nt) and repeat (12 nt) regions, whereas the tracrRNA sequence can be divided into an anti-repeat region (14 nt) and three tracrRNA stem loops (Nishimasu et al., Cell 156, pp. 935-949 (2014)). In practice, guide RNA sequences are generally designed to have a length of 17-24 nucleotides (e.g., 19, 20, or 21 nucleotides) and to be complementary to the targeted nucleic acid sequence. Custom gRNA generators and algorithms for designing effective guide RNAs are commercially available. In some embodiments, the gRNA comprises two RNA components from a native CRISPR system, e.g., a crRNA and a tracrRNA. As is well known in the art, the gRNA may also comprise a chimeric single guide RNA (sgRNA) containing sequence from both a tracrRNA (for binding the nuclease) and at least one crRNA (to guide the nuclease to the sequence targeted for editing/binding). Chemically modified sgRNAs have also been shown to be effective for use with CRISPR-associated proteins; see, for example, Hendel et al. (2015) Nature Biotechnol., 985-991. In some embodiments, a gRNA spacer comprises a nucleic acid sequence that is complementary to a DNA sequence associated with a target gene.

In some embodiments, the region of the template nucleic acid (e.g., template RNA) comprising the gRNA adopts an underwound ribbon-like structure when bound to target DNA (e.g., as described in Mulepati et al., Science, 19 Sep 2014: Vol. 345, Issue 6203, pp. 1479-1484). Without wishing to be bound by theory, this non-canonical structure is thought to be facilitated by rotation of every sixth nucleotide out of the RNA-DNA hybrid. Thus, in some embodiments, the region of the template nucleic acid (e.g., template RNA) comprising the gRNA may tolerate increased mismatching with the target site at some interval, e.g., every sixth base. In some embodiments, the region of the template nucleic acid (e.g., template RNA) comprising the gRNA that has homology to the target site may possess wobble positions at a regular interval, e.g., every sixth base, that are not required to base pair with the target site.

In some embodiments, the template nucleic acid (e.g., template RNA) has at least 15, 16, 17, 18, 19, 20, 21, 22, 23, or 24 bases with at least 80%, 85%, 90%, 95%, 99%, or 100% homology to the target site, e.g., at the 5′ end, e.g., comprising a gRNA spacer sequence of a length appropriate to the Cas9 domain of the gene modifying polypeptide (Table 8).
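One way to read the percentage thresholds above is as a position-by-position comparison between the spacer and the target site. The sketch below takes that reading as an assumption: it compares an RNA spacer with a DNA protospacer of equal length, both written 5′ to 3′, treating T in the DNA as equivalent to U in the RNA. The function name and example sequences are hypothetical, and no alignment with gaps is attempted.

```python
def percent_identity(spacer_rna, protospacer_dna):
    """Percent of positions at which the spacer matches the protospacer.

    Both sequences are read 5'->3' and must have the same, non-zero length.
    DNA thymine (T) is treated as equivalent to RNA uracil (U).
    """
    if len(spacer_rna) != len(protospacer_dna):
        raise ValueError("sequences must have equal length")
    if not spacer_rna:
        raise ValueError("sequences must be non-empty")
    target_as_rna = protospacer_dna.upper().replace("T", "U")
    matches = sum(a == b for a, b in zip(spacer_rna.upper(), target_as_rna))
    return 100.0 * matches / len(spacer_rna)
```

Under this reading, a 20-base spacer with one mismatched position has 95% identity to its target and would satisfy the 80%, 85%, 90%, and 95% thresholds but not the 99% or 100% thresholds.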

In some embodiments, a Cas9 derivative with enhanced activity may be used in the gene modifying polypeptide. In some embodiments, a Cas9 derivative may comprise mutations that improve the activity of the HNH endonuclease domain, e.g., SpyCas9 R221K or N394K, or mutations that improve R-loop formation, e.g., SpyCas9 L1245V, or may comprise a combination of such mutations, e.g., SpyCas9 R221K/N394K, SpyCas9 N394K/L1245V, SpyCas9 R221K/L1245V, or SpyCas9 R221K/N394K/L1245V (see, e.g., Spencer and Zhang, Sci Rep 7:16836 (2017), the Cas9 derivatives and comprising mutations of which are incorporated herein by reference). In some embodiments, a Cas9 derivative may comprise one or more types of mutations described herein, e.g., PAM-modifying mutations, protein-stabilizing mutations, activity-enhancing mutations, and/or mutations that partially or fully inactivate one or two endonuclease domains relative to the parental enzyme (e.g., one or more mutations to abolish endonuclease activity towards one or both strands of a target DNA, e.g., a nickase or a catalytically inactive enzyme). In some embodiments, a Cas9 enzyme used in a system described herein may comprise mutations that confer nickase activity on the enzyme (e.g., SpyCas9 N863A or H840A) in addition to mutations that improve catalytic efficiency (e.g., SpyCas9 R221K, N394K, and/or L1245V). In some embodiments, a Cas9 enzyme used in a system described herein is a SpyCas9 enzyme or derivative that comprises an N863A mutation conferring nickase activity in addition to R221K and N394K mutations improving catalytic efficiency.

Table 12 provides parameters that define the components used to design gRNAs and/or template RNAs for applying the Cas variants listed in Table 8 to gene modification. For each variant, the table indicates the validated or predicted protospacer adjacent motif (PAM) requirement and the validated or predicted cleavage site position (relative to the most upstream base of the PAM site). The gRNA for a given enzyme can be assembled by joining the crRNA, tetraloop, and tracrRNA sequences, and further adding a 5' spacer that matches the protospacer of the target site and whose length lies between Spacer (min) and Spacer (max). In addition, the predicted position of the ssDNA nick on the target is important for designing the PBS sequence of the template RNA, which anneals to the sequence immediately 5' of the nick to initiate target-primed reverse transcription. In some embodiments, a gRNA scaffold described herein comprises a nucleic acid sequence comprising, in the 5' to 3' direction, a crRNA of Table 12, a tetraloop from the same row of Table 12, and a tracrRNA from the same row of Table 12, or a sequence having at least 70%, 80%, 85%, 90%, 95%, or 99% identity thereto. In some embodiments, the gRNA or template RNA comprising the scaffold further comprises a gRNA spacer having a length between the Spacer (min) and Spacer (max) values indicated in the same row of Table 12. In some embodiments, a system that further comprises a gene modifying polypeptide comprises a gRNA or template RNA having a sequence according to Table 12, wherein the gene modifying polypeptide comprises the Cas domain described in the same row of Table 12.
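The assembly rule described above (5' spacer, then crRNA, tetraloop, and tracrRNA, with the spacer length bounded by the Spacer (min) and Spacer (max) values of the same row) can be sketched as follows. This is a minimal illustration only: the function name, the argument names, and the short placeholder sequences are assumptions and are not taken from Table 12.

```python
def assemble_grna(spacer, crrna, tetraloop, tracrrna, spacer_min, spacer_max):
    """Join spacer + crRNA + tetraloop + tracrRNA (5' to 3') into one gRNA.

    All inputs are hypothetical; real values would come from one row of Table 12.
    """
    if not spacer_min <= len(spacer) <= spacer_max:
        raise ValueError(
            f"spacer length {len(spacer)} outside {spacer_min}-{spacer_max}"
        )
    grna = spacer + crrna + tetraloop + tracrrna
    # Report the result as RNA: uppercase, with U in place of T.
    return grna.upper().replace("T", "U")


# Placeholder components (NOT real scaffold sequences).
example = assemble_grna(
    "ACGTACGTACGTACGTACGT", "GUUUUAGA", "GAAA", "UCUAAAAC", 20, 22
)
```

Checking the spacer length before concatenation keeps a row's Spacer (min) and Spacer (max) constraint tied to the scaffold from that same row.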

Heterologous object sequences

The template RNA described herein may comprise a heterologous object sequence, which the gene modifying polypeptide can use as a template for reverse transcription to write a desired sequence into the target nucleic acid. In some embodiments, the heterologous object sequence comprises, from 5' to 3', a post-edit homology region, a mutation region, and a pre-edit homology region. Without wishing to be bound by theory, an RT that reverse transcribes the template RNA first reverse transcribes the pre-edit homology region, then the mutation region, and then the post-edit homology region, thereby producing a DNA strand that comprises the desired mutation flanked by homology regions on both sides.
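The order of events in the preceding paragraph follows from the fact that a reverse transcriptase reads its RNA template 3' to 5' while synthesizing DNA 5' to 3'. A small sketch, using made-up region sequences purely as an assumption for illustration:

```python
# Map each RNA base to the DNA base the RT incorporates opposite it.
RNA_TO_CDNA = {"A": "T", "C": "G", "G": "C", "U": "A"}


def reverse_transcribe(rna):
    """Return the cDNA (written 5' to 3') copied from an RNA template.

    The template is read from its 3' end, so the cDNA is the reverse
    complement of the RNA.
    """
    return "".join(RNA_TO_CDNA[base] for base in reversed(rna.upper()))


# Hypothetical regions, listed 5' to 3' as they sit in the template RNA.
post_edit, mutation, pre_edit = "GGAC", "U", "CCUG"
template = post_edit + mutation + pre_edit

cdna = reverse_transcribe(template)
# The first cDNA bases come from the pre-edit homology region, the last
# ones from the post-edit homology region.
```

Here the newly written strand begins with the copy of the pre-edit homology region and ends with the copy of the post-edit homology region, so the mutation is flanked by homology on both sides.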

In some embodiments, the heterologous object sequence is at least 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 120, 140, 160, 180, 200, 500, or 1,000 nucleotides (nt) in length, or at least 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5, 6, 6.5, 7, 7.5, 8, 8.5, 9, 9.5, or 10 kilobases in length. In some embodiments, the heterologous object sequence is no more than 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 120, 140, 160, 180, 200, 500, 1,000, or 2,000 nucleotides (nt) in length, or no more than 20, 15, 10, 9, 8, 7, 6, 5, 4, or 3 kilobases in length. In some embodiments, the heterologous object sequence is 30-1000, 40-1000, 50-1000, 60-1000, 70-1000, 74-1000, 75-1000, 76-1000, 77-1000, 78-1000, 79-1000, 80-1000, 85-1000, 90-1000, 100-1000, 120-1000, 140-1000, 160-1000, 180-1000, 200-1000, 500-1000, 30-500, 40-500, 50-500, 60-500, 70-500, 74-500, 75-500, 76-500, 77-500, 78-500, 79-500, 80-500, 85-500, 90-500, 100-500, 120-500, 140-500, 160-500, 180-500, 200-500, 30-200, 40-200, 50-200, 60-200, 70-200, 74-200, 75-200, 76-200, 77-200, 78-200, 79-200, 80-200, 85-200, 90-200, 100-200, 120-200, 140-200, 160-200, 180-200, 30-100, 40-100, 50-100, 60-100, 70-100, 74-100, 75-100, 76-100, 77-100, 78-100, 79-100, 80-100, 85-100, or 90-100 nucleotides (nt) in length, or 1-20, 1-15, 1-10, 1-9, 1-8, 1-7, 1-6, 1-5, 1-4, 1-3, 1-2, 2-20, 2-15, 2-10, 2-9, 2-8, 2-7, 2-6, 2-5, 2-4, 2-3, 3-20, 3-15, 3-10, 3-9, 3-8, 3-7, 3-6, 3-5, 3-4, 4-20, 4-15, 4-10, 4-9, 4-8, 4-7, 4-6, 4-5, 5-20, 5-15, 5-10, 5-9, 5-8, 5-7, 5-6, 6-20, 6-15, 6-10, 6-9, 6-8, 6-7, 7-20, 7-15, 7-10, 7-9, 7-8, 8-20, 8-15, 8-10, 8-9, 9-20, 9-15, 9-10, 10-15, 10-20, or 15-20 kilobases in length. In some embodiments, the heterologous object sequence is 10-100, 10-90, 10-80, 10-70, 10-60, 10-50, 10-40, 10-30, or 10-20 nt in length, e.g., 10-80, 10-50, or 10-20 nt in length, e.g., about 10-20 nt in length. In some embodiments, the heterologous object sequence is 8-30, 9-25, 10-20, 11-16, or 12-15 nucleotides in length, e.g., 11-16 nt in length. Without wishing to be bound by theory, in some embodiments, a larger insertion size, a larger region to be edited (e.g., the distance between a first edit/substitution and a second edit/substitution in the target region), and/or a greater number of desired edits (e.g., mismatches of the heterologous object sequence to the target genome) may result in a longer optimal heterologous object sequence.

In certain embodiments, the template nucleic acid comprises a customized RNA sequence template, which can be identified, designed, engineered, and constructed to comprise sequences that alter or specify host genome function, for example by introducing a heterologous coding region into the genome; affecting or causing exon structure/alternative splicing, e.g., causing exon skipping of one or more exons; causing disruption of an endogenous gene, e.g., creating a gene knockout; causing transcriptional activation of an endogenous gene; causing epigenetic regulation of endogenous DNA; causing upregulation of one or more operably linked genes, e.g., leading to gene activation or overexpression; causing downregulation of one or more operably linked genes, e.g., creating a gene knockout; etc. In certain embodiments, the customized RNA sequence template can be engineered to comprise sequences encoding exons and/or transgenes, or to provide binding sites for transcription factor activators, repressors, enhancers, etc., and combinations thereof. In some embodiments, a customized template can be engineered to encode a nucleic acid or peptide tag to be expressed in an endogenous RNA transcript or endogenous protein operably linked to the target site. In other embodiments, the coding sequence can be further customized with a splice donor site, a splice acceptor site, or a poly-A tail.

The template nucleic acid (e.g., template RNA) of the system typically comprises an object sequence (e.g., a heterologous object sequence) for writing a desired sequence into a target DNA. The object sequence may be coding or non-coding. The template nucleic acid (e.g., template RNA) can be designed to produce an insertion, mutation, or deletion at the target DNA locus. In some embodiments, the template nucleic acid (e.g., template RNA) may be designed to cause an insertion in the target DNA. For example, the template nucleic acid (e.g., template RNA) may contain a heterologous sequence, wherein reverse transcription results in insertion of the heterologous sequence into the target DNA. In other embodiments, the RNA template may be designed to introduce a deletion into the target DNA. For example, the template nucleic acid (e.g., template RNA) may match the target DNA upstream and downstream of the desired deletion, wherein reverse transcription results in copying of the upstream and downstream sequences from the template nucleic acid (e.g., template RNA) without the intervening sequence, e.g., causing deletion of the intervening sequence. In other embodiments, the template nucleic acid (e.g., template RNA) may be designed to introduce an edit into the target DNA. For example, the template RNA may match the target DNA sequence with the exception of one or more nucleotides, wherein reverse transcription results in copying of these edits into the target DNA, e.g., resulting in mutations, e.g., transition or transversion mutations.
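The three outcomes described above (insertion, deletion, and substitution) can be expressed as simple operations on the target sequence, and a single-base substitution can be classed as a transition or a transversion. The sketch below uses hypothetical helper names and toy sequences; the only biological fact assumed is that the sickle allele differs from wild-type HBB by a GAG to GTG change at codon 6, so reverting it is a T to A change.

```python
PURINES = {"A", "G"}


def insertion(target, pos, insert):
    """Desired sequence after inserting `insert` before index `pos`."""
    return target[:pos] + insert + target[pos:]


def deletion(target, start, end):
    """Desired sequence after removing target[start:end]."""
    return target[:start] + target[end:]


def substitution(target, pos, base):
    """Desired sequence after replacing the base at index `pos`."""
    return target[:pos] + base + target[pos + 1:]


def substitution_class(ref, alt):
    """Transition = purine<->purine or pyrimidine<->pyrimidine; else transversion."""
    if ref == alt:
        raise ValueError("ref and alt are identical")
    same_ring_type = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_ring_type else "transversion"


corrected_codon = substitution("GTG", 1, "A")   # sickle codon back to GAG
edit_class = substitution_class("T", "A")       # pyrimidine -> purine
```

In each case the template RNA would encode the desired (edited) sequence, so the edit is defined by how that sequence differs from the unedited target.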

In some embodiments, writing the object sequence into the target site results in substitution of nucleotides, e.g., where the full length of the object sequence corresponds to a matching length of the target site with one or more mismatched bases. In some embodiments, the heterologous object sequence may be designed such that a combination of sequence changes can occur, e.g., simultaneous addition and deletion, addition and substitution, or deletion and substitution.

In some embodiments, the heterologous object sequence may comprise an open reading frame or a fragment of an open reading frame. In some embodiments, the heterologous object sequence has a Kozak sequence. In some embodiments, the heterologous object sequence has an internal ribosome entry site. In some embodiments, the heterologous object sequence has a self-cleaving peptide, such as a T2A or P2A site. In some embodiments, the heterologous object sequence has a start codon. In some embodiments, the template RNA has a splice acceptor site. In some embodiments, the template RNA has a splice donor site. Exemplary splice acceptor and splice donor sites are described in WO 2016044416, which is incorporated herein by reference in its entirety. Exemplary splice acceptor site sequences are known to those skilled in the art. In some embodiments, the template RNA has a microRNA binding site downstream of the stop codon. In some embodiments, the template RNA has a poly-A tail downstream of the stop codon of the open reading frame. In some embodiments, the template RNA comprises one or more exons. In some embodiments, the template RNA comprises one or more introns. In some embodiments, the template RNA comprises a eukaryotic transcription terminator. In some embodiments, the template RNA comprises an enhanced translation element or a translation enhancing element. In some embodiments, the RNA comprises the human T-cell leukemia virus (HTLV-1) R region. In some embodiments, the RNA comprises a post-transcriptional regulatory element that enhances nuclear export, such as that of hepatitis B virus (HPRE) or woodchuck hepatitis virus (WPRE).

In some embodiments, the heterologous object sequence may contain a non-coding sequence. For example, the template nucleic acid (e.g., template RNA) may comprise a regulatory element, e.g., a promoter or enhancer sequence or a miRNA binding site. In some embodiments, integration of the object sequence at the target site results in upregulation of an endogenous gene. In some embodiments, integration of the object sequence at the target site results in downregulation of an endogenous gene. In some embodiments, the template nucleic acid (e.g., template RNA) comprises a tissue-specific promoter or enhancer, each of which may be unidirectional or bidirectional. In some embodiments, the promoter is an RNA polymerase I promoter, an RNA polymerase II promoter, or an RNA polymerase III promoter. In some embodiments, the promoter comprises a TATA element. In some embodiments, the promoter comprises a B recognition element. In some embodiments, the promoter has one or more binding sites for transcription factors.

In some embodiments, the template nucleic acid (e.g., template RNA) comprises a site that coordinates epigenetic modification. In some embodiments, the template nucleic acid (e.g., template RNA) comprises a chromatin insulator. For example, the template nucleic acid (e.g., template RNA) comprises a CTCF site or a site targeted for DNA methylation.

In some embodiments, the template nucleic acid (e.g., template RNA) comprises a gene expression unit composed of at least one regulatory region operably linked to an effector sequence. The effector sequence may be a sequence that is transcribed into RNA (e.g., a coding sequence or a non-coding sequence, such as a sequence encoding a microRNA).

In some embodiments, the heterologous object sequence of the template nucleic acid (e.g., template RNA) is inserted into an endogenous intron of the target genome. In some embodiments, the heterologous object sequence of the template nucleic acid (e.g., template RNA) is inserted into the target genome and thereby acts as a new exon. In some embodiments, insertion of the heterologous object sequence into the target genome results in replacement of a native exon or skipping of a native exon.

The template nucleic acid (e.g., template RNA) can be designed to produce an insertion, mutation, or deletion at the target DNA locus. In some embodiments, the template nucleic acid (e.g., template RNA) may be designed to cause an insertion in the target DNA. For example, the template nucleic acid (e.g., template RNA) may contain a heterologous object sequence, wherein reverse transcription results in insertion of the heterologous object sequence into the target DNA. In other embodiments, the RNA template may be designed to write a deletion into the target DNA. For example, the template nucleic acid (e.g., template RNA) may match the target DNA upstream and downstream of the desired deletion, wherein reverse transcription results in copying of the upstream and downstream sequences from the template nucleic acid (e.g., template RNA) without the intervening sequence, e.g., causing deletion of the intervening sequence. In other embodiments, the template nucleic acid (e.g., template RNA) may be designed to write an edit into the target DNA. For example, the template RNA may match the target DNA sequence with the exception of one or more nucleotides, wherein reverse transcription results in copying of these edits into the target DNA, e.g., resulting in mutations, e.g., transition or transversion mutations.

In some embodiments, the pre-edit homology domain comprises a nucleic acid sequence having at least 100% sequence identity to a nucleic acid sequence comprised in the target nucleic acid molecule.

In some embodiments, the post-edit homology domain comprises a nucleic acid sequence having at least 100% sequence identity to a nucleic acid sequence comprised in the target nucleic acid molecule.

PBS sequence

In some embodiments, the template nucleic acid (e.g., template RNA) comprises a PBS sequence. In some embodiments, the PBS sequence is located 3' of the heterologous object sequence and is complementary to a sequence adjacent to the site to be modified by a system described herein, or comprises no more than 1, 2, 3, 4, or 5 mismatches relative to a sequence that is complementary to the sequence adjacent to the site to be modified by the system/gene modifying polypeptide. In some embodiments, the PBS sequence binds within 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 nucleotides of the nick site in the target nucleic acid molecule. In some embodiments, binding of the PBS sequence to the target nucleic acid molecule permits initiation of target-primed reverse transcription (TPRT), e.g., with the 3' homology domain acting as a primer for TPRT. In some embodiments, the PBS sequence is 3-5, 5-10, 10-30, 10-25, 10-20, 10-19, 10-18, 10-17, 10-16, 10-15, 10-14, 10-13, 10-12, 10-11, 11-30, 11-25, 11-20, 11-19, 11-18, 11-17, 11-16, 11-15, 11-14, 11-13, 11-12, 12-30, 12-25, 12-20, 12-19, 12-18, 12-17, 12-16, 12-15, 12-14, 12-13, 13-30, 13-25, 13-20, 13-19, 13-18, 13-17, 13-16, 13-15, 13-14, 14-30, 14-25, 14-20, 14-19, 14-18, 14-17, 14-16, 14-15, 15-30, 15-25, 15-20, 15-19, 15-18, 15-17, 15-16, 16-30, 16-25, 16-20, 16-19, 16-18, 16-17, 17-30, 17-25, 17-20, 17-19, 17-18, 18-30, 18-25, 18-20, 18-19, 19-30, 19-25, 19-20, 20-30, 20-25, or 25-30 nucleotides in length, e.g., 10-17, 12-16, or 12-14 nucleotides in length. In some embodiments, the PBS sequence is 5-20, 8-16, 8-14, 8-13, 9-13, 9-12, or 10-12 nucleotides in length, e.g., 9-12 nucleotides in length.
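The complementarity requirement above can be checked by comparing a candidate PBS with the perfectly complementary sequence for the DNA that ends at the nick. The sketch below assumes the DNA is given 5' to 3' and ends exactly at the nick; the function names and the toy sequence are illustrative assumptions.

```python
# DNA base -> the RNA base that pairs with it.
DNA_TO_RNA_COMPLEMENT = {"A": "U", "C": "G", "G": "C", "T": "A"}


def ideal_pbs(dna_5prime_of_nick, length):
    """Perfectly complementary PBS (RNA, 5' to 3') for the last `length`
    bases of the nicked strand immediately 5' of the nick."""
    primer = dna_5prime_of_nick.upper()[-length:]
    return "".join(DNA_TO_RNA_COMPLEMENT[b] for b in reversed(primer))


def pbs_mismatches(pbs, dna_5prime_of_nick):
    """Count positions where a candidate PBS departs from the ideal PBS."""
    pbs = pbs.upper().replace("T", "U")
    expected = ideal_pbs(dna_5prime_of_nick, len(pbs))
    return sum(a != b for a, b in zip(pbs, expected))


upstream_of_nick = "ACGTTGCAAGCT"          # hypothetical, ends at the nick
perfect = ideal_pbs(upstream_of_nick, 8)   # 8-nt PBS
```

A candidate would satisfy the "no more than N mismatches" embodiments when `pbs_mismatches` returns a value at or below the chosen N.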

The template nucleic acid (e.g., template RNA) may have some homology to the target DNA. In some embodiments, the PBS sequence domain of the template nucleic acid (e.g., template RNA) may serve as an annealing region to the target DNA, such that the target DNA is positioned to prime reverse transcription of the template nucleic acid (e.g., template RNA). In some embodiments, the template nucleic acid (e.g., template RNA) has at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 175, 200, or more bases of exact homology to the target DNA at the 3' end of the RNA. In some embodiments, the template nucleic acid (e.g., template RNA) has at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 175, 200, or more bases of at least 50%, 60%, 70%, 80%, 85%, 90%, 95%, 97%, 98%, 99%, or 100% homology to the target DNA, e.g., at the 5' end of the template nucleic acid (e.g., template RNA).

Exemplary template sequences

In some embodiments of the systems and methods herein, the template RNA comprises a gRNA spacer comprising the core nucleotides of a gRNA spacer sequence of Table 1. In some embodiments, the gRNA spacer additionally comprises one or more (e.g., 2, 3, or all) consecutive nucleotides starting from the 3' end of the flanking nucleotides of the gRNA spacer. In some embodiments, a template RNA comprising a sequence of Table 1 is comprised in a system that further comprises a gene modifying polypeptide having the RT domain listed in the same row of Table 1. RT domain amino acid sequences can be found, e.g., in Table 6 herein.

Table 1: Exemplary gRNA spacer Cas pairs

Table 1 provides a gRNA database for correcting the pathogenic E6V mutation in HBB. It lists spacers, PAMs, and Cas variants for generating a nick at an appropriate position so that the desired genomic edit can be installed with a gene modifying system. The spacers in this table are designed for use with a gene modifying polypeptide comprising a nickase variant of the Cas species indicated in the table. Tables 2, 3, and 4 detail the other components of the system and are organized such that the ID number shown here in column 1 ("ID") corresponds to the same ID number in the subsequent tables.

In the exemplary template sequences provided herein, uppercase letters denote "core nucleotides" and lowercase letters denote "flanking nucleotides". Herein, when an RNA sequence (e.g., a template RNA sequence) is said to comprise a particular sequence that contains thymine (T) (e.g., a sequence of Table 1 or a portion thereof), it is of course understood that the RNA sequence may (and frequently does) comprise uracil (U) in place of T. For example, the RNA sequence may comprise U at every position shown as T in a sequence of Table 1. More specifically, the present disclosure provides an RNA sequence according to every gRNA spacer sequence shown in Table 1, wherein the RNA sequence has a U in place of each T in the sequence of Table 1.
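The two notational conventions above (uppercase core versus lowercase flanking nucleotides, and T shown where the RNA carries U) are straightforward to apply programmatically. A minimal sketch with a made-up table entry; the helper names are assumptions:

```python
def core_nucleotides(entry):
    """Keep only the uppercase (core) nucleotides of a table entry."""
    return "".join(ch for ch in entry if ch.isupper())


def to_rna(entry):
    """Replace every T with U, preserving the core/flanking letter case."""
    return entry.replace("T", "U").replace("t", "u")


entry = "acGTTAca"            # hypothetical entry, not taken from Table 1
core = core_nucleotides(entry)
rna = to_rna(entry)
```

Preserving letter case during the T to U conversion keeps the core and flanking annotation intact on the RNA form of the sequence.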

In some embodiments of the systems and methods herein, the heterologous object sequence comprises the core nucleotides of an RT template sequence of Table 3. In some embodiments, the heterologous object sequence additionally comprises one or more (e.g., 2, 3, 4, 5, 10, 20, 30, 40, or all) consecutive nucleotides starting from the 3' end of the flanking nucleotides of the RT template sequence. In some embodiments, the heterologous object sequence comprises the core nucleotides of the RT template sequence of Table 3 that corresponds to the gRNA spacer sequence. In the context of the sequence tables, a first component "corresponds to" a second component when the two components have the same ID number in the referenced tables. For example, for the gRNA spacer of ID #1, the corresponding RT template is the RT template that also has ID #1. In some embodiments, the heterologous object sequence additionally comprises one or more consecutive nucleotides starting from the 3' end of the flanking nucleotides of the RT template sequence.
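The "corresponds to" relationship defined above is a lookup by shared ID number across tables. The sketch below models each table as a dictionary keyed by ID; the dictionary layout, the function name, and every sequence shown are placeholders assumed for illustration, not rows from Tables 1 or 3.

```python
# Hypothetical stand-ins for Table 1 (ID -> spacer) and
# Table 3 (ID -> (RT template, PBS)).
table1_spacers = {1: "ccGATTACAGATTACAGATTAC", 2: "ttCATTAGACATTAGACATTAG"}
table3_rt_pbs = {1: ("ttacGATTA", "CAGATTACag"), 2: ("ggacCATTA", "GACATTAGtc")}


def corresponding_components(id_number, spacers, rt_pbs):
    """Collect the components that share one ID number across the tables."""
    rt_template, pbs = rt_pbs[id_number]
    return {
        "id": id_number,
        "spacer": spacers[id_number],
        "rt_template": rt_template,
        "pbs": pbs,
    }


design_1 = corresponding_components(1, table1_spacers, table3_rt_pbs)
```

Keying every table by the same ID makes it hard to pair a spacer with an RT template or PBS that was designed against a different nick site.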

In some embodiments, the primer binding site (PBS) sequence has a sequence comprising the core nucleotides of the PBS sequence from the same row of Table 3 as the RT template sequence. In some embodiments, the PBS sequence additionally comprises one or more (e.g., 1, 2, 3, 4, 5, 6, 7, or all) consecutive nucleotides starting from the 5' end of the flanking nucleotides of the primer region.

Table 3: Exemplary RT sequence (heterologous object sequence) and PBS sequence pairs

Table 3 provides exemplary PBS sequences and heterologous object sequences (reverse transcription template regions) of template RNAs for correcting the pathogenic E6V mutation in HBB. The gRNA spacers from Table 1 were filtered, e.g., by occurrence within 15 nt of the desired editing position and by use of a tier 1 Cas enzyme. The PBS sequences and heterologous object sequences (reverse transcription template regions) were designed relative to the nick site directed by the cognate gRNA from Table 1, as described in this application. For exemplification, these regions were designed to be 8-17 nt in length (priming) and to extend 1-50 nt beyond the position of the edit (RT). Without wishing to be limited by example, and given the variability in length, sequences are provided that use the maximum length parameters and thereby comprise all templates of shorter length within the given parameters. Sequences are shown with uppercase letters indicating the core sequence and lowercase letters indicating flanking sequence that may be truncated within the described length parameters.
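Because each Table 3 entry is given at maximum length with truncatable flanking nucleotides, the shorter designs it covers can be enumerated. The sketch below assumes a layout in which the RT template's lowercase flank lies 5' of its core (so extensions are taken from the flank's 3' end) and the PBS's lowercase flank lies 3' of its core (so extensions are taken from the flank's 5' end); that layout, the function names, and the toy sequences are assumptions made for illustration.

```python
def split_leading_flank(entry):
    """Split 'flankCORE' into (flank, core); the flank is leading lowercase."""
    i = 0
    while i < len(entry) and entry[i].islower():
        i += 1
    return entry[:i], entry[i:]


def split_trailing_flank(entry):
    """Split 'COREflank' into (core, flank); the flank is trailing lowercase."""
    i = len(entry)
    while i > 0 and entry[i - 1].islower():
        i -= 1
    return entry[:i], entry[i:]


def rt_template_variants(entry):
    """Core alone, then core plus 1..all flank bases nearest the core."""
    flank, core = split_leading_flank(entry)
    return [flank[len(flank) - k:] + core for k in range(len(flank) + 1)]


def pbs_variants(entry):
    """Core alone, then core plus 1..all flank bases nearest the core."""
    core, flank = split_trailing_flank(entry)
    return [core + flank[:k] for k in range(len(flank) + 1)]


rt_options = rt_template_variants("acGTTA")   # hypothetical entry
pbs_options = pbs_variants("GGCAtg")          # hypothetical entry
```

Each list runs from the shortest (core only) design to the full-length entry, which mirrors how a single maximum-length sequence stands in for all shorter templates within the stated parameters.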

Uppercase letters denote "core nucleotides" and lowercase letters denote "flanking nucleotides". Herein, when an RNA sequence (e.g., a template RNA sequence) is said to comprise a particular sequence that contains thymine (T) (e.g., a sequence of Table 3 or a portion thereof), it is of course understood that the RNA sequence may (and frequently does) comprise uracil (U) in place of T. For example, the RNA sequence may comprise U at every position shown as T in a sequence of Table 3. More specifically, the present disclosure provides an RNA sequence according to every heterologous object sequence and PBS sequence shown in Table 3, wherein the RNA sequence has a U in place of each T in the sequence of Table 3.

In some embodiments of the systems and methods herein, the template RNA comprises a gRNA scaffold (e.g., one that binds a gene modifying polypeptide, e.g., a Cas polypeptide) comprising the sequence of a gRNA scaffold of Table 12. In some embodiments, the gRNA scaffold comprises a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity to a gRNA scaffold of Table 12. In some embodiments, the gRNA scaffold comprises the sequence of a scaffold region of Table 12 that corresponds to the RT template sequence, the spacer sequence, or both, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto.

In some embodiments of the systems and methods herein, the system further comprises a second strand-targeting gRNA that directs a nick to the second strand of the human HBB gene. In some embodiments, the second strand-targeting gRNA comprises a left gRNA spacer sequence or a right gRNA spacer sequence from Table 2. In some embodiments, the gRNA spacer additionally comprises one or more (e.g., 2, 3, or all) consecutive nucleotides starting from the 3' end of the flanking nucleotides of the left gRNA spacer sequence or the right gRNA spacer sequence. In some embodiments, the second strand-targeting gRNA comprises a sequence comprising the core nucleotides of a second nick gRNA sequence of Table 4, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto. In some embodiments, the second nick gRNA sequence additionally comprises one or more consecutive nucleotides starting from the 3' end of the flanking nucleotides of the second nick gRNA sequence. In some embodiments, the second nick gRNA comprises a gRNA scaffold sequence that is orthogonal to the Cas domain of the gene modifying polypeptide. In some embodiments, the second nick gRNA comprises a gRNA scaffold sequence of Table 12.

表2：示例性左gRNA间隔子和右gRNA间隔子对Table 2: Exemplary left gRNA spacer and right gRNA spacer pairs

表2提供了用于纠正HBB中致病性E6V突变的任选使用的示例性第二切口gRNA种类。对表1中的gRNA间隔子进行过滤，例如，通过出现在所期望编辑位置的15nt范围内并使用等级1Cas酶进行过滤。通过在相对于第一gRNA定义的第一切口位点的-40至-140(“左”)和+40至+140(“右”)区域中搜索DNA的相对链来生成第二切口gRNA，以查找对应Cas变体所使用的PAM。在靶切口位点的每一侧均显示了一个示例性间隔子。Table 2 provides optional exemplary second nicking gRNA species for correcting pathogenic E6V mutations in HBB. The gRNA spacers in Table 1 were filtered, for example, by appearing within 15nt of the desired editing position and using a level 1 Cas enzyme. The second nicking gRNA was generated by searching the opposite strand of DNA in the region -40 to -140 ("left") and +40 to +140 ("right") relative to the first nicking site defined by the first gRNA to find the PAM used by the corresponding Cas variant. An exemplary spacer is shown on each side of the target nicking site.
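The window search described above can be sketched in code. The following is an illustrative sketch only, not the procedure actually used to generate Table 2. It assumes an NGG PAM (as for SpCas9), a 20-nucleotide spacer, 0-based coordinates on the first (top) strand, and that a candidate belongs to a window when the first base of its PAM falls within that window. The function names are hypothetical, and the exact offset of the nick relative to the PAM is ignored.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_second_nick_spacers(top: str, first_nick: int, spacer_len: int = 20):
    """List candidate second-nick spacers on the strand opposite `top`.

    An NGG PAM on the opposite strand reads as CCN on `top`, with the
    protospacer immediately 3' of it on `top`. The windows are -140..-40
    ("left") and +40..+140 ("right") relative to `first_nick`.
    Returns (side, pam_start_index, spacer) tuples, where the spacer is
    given 5'->3' as read on the opposite strand.
    """
    top = top.upper()
    windows = {
        "left": (first_nick - 140, first_nick - 40),
        "right": (first_nick + 40, first_nick + 140),
    }
    last_start = len(top) - 3 - spacer_len  # last index leaving room for PAM + spacer
    hits = []
    for side, (lo, hi) in windows.items():
        for i in range(max(lo, 0), min(hi, last_start) + 1):
            if top[i:i + 2] == "CC":  # CCN on top == NGG on the opposite strand
                protospacer = top[i + 3:i + 3 + spacer_len]
                hits.append((side, i, reverse_complement(protospacer)))
    return hits
```

A different Cas variant would need a different PAM pattern and possibly a different spacer length; both are left as simple assumptions here.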

大写字母表示“核心核苷酸”，而小写字母表示“侧翼核苷酸”。本文中，当认为RNA序列(例如，产生第二切口的gRNA)包含含有胸腺嘧啶(T)的特定序列(例如，表2的序列或其一部分)时，当然应理解，RNA序列可以(并且确实经常)包含尿嘧啶(U)来代替T。例如，RNA序列在表2中的序列中显示为T的每个位置处都可以包含U。更特别地，本披露提供了根据表2中所示每个gRNA间隔子序列的RNA序列，其中该RNA序列具有U代替表2中的序列中的每个T。Capital letters indicate "core nucleotides", while lowercase letters indicate "flanking nucleotides". Herein, when an RNA sequence (e.g., a gRNA that produces the second nick) is said to comprise a particular sequence (e.g., a sequence of Table 2 or a portion thereof) that comprises thymine (T), it is of course understood that the RNA sequence may (and frequently does) comprise uracil (U) in place of T. For example, the RNA sequence may comprise U at every position shown as T in the sequence in Table 2. More specifically, the present disclosure provides an RNA sequence according to each gRNA spacer sequence shown in Table 2, wherein the RNA sequence has a U in place of each T in the sequence in Table 2.
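The notation described above (uppercase core, lowercase flanking, T shown where the RNA carries U) can be handled with a few lines of code. This is a minimal sketch with hypothetical helper names; the example sequence in the usage note is made up for illustration and is not taken from Table 2.

```python
def spacer_to_rna(spacer_dna: str) -> str:
    """Replace every T with U (and t with u), preserving letter case."""
    return spacer_dna.replace("T", "U").replace("t", "u")


def core_nucleotides(spacer: str) -> str:
    """Return only the uppercase (core) nucleotides of a spacer."""
    return "".join(c for c in spacer if c.isupper())


def flanking_nucleotides(spacer: str) -> str:
    """Return only the lowercase (flanking) nucleotides of a spacer."""
    return "".join(c for c in spacer if c.islower())
```

With the made-up input `acGTTAGCt`, `spacer_to_rna` gives `acGUUAGCu`, the core nucleotides are `GTTAGC`, and the flanking nucleotides are `act`.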

在一些实施例中，本文提供的系统和方法可以包含表4中所列的模板序列。表4提供了示例性模板RNA序列(第4列)和任选地第二切口gRNA序列(第5列)，这些序列设计用于与基因修饰多肽配对以纠正HBB基因中的突变。表4中的模板旨在举例说明以下的总序列：(1)gRNA间隔子(例如，用于靶向第一链切口)、(2)gRNA支架、(3)异源对象序列和(4)PBS序列(例如，用于在第一链切口处启动TPRT)。In some embodiments, the systems and methods provided herein may comprise a template sequence listed in Table 4. Table 4 provides exemplary template RNA sequences (column 4) and, optionally, second nick gRNA sequences (column 5) designed to be paired with a gene modifying polypeptide to correct a mutation in the HBB gene. The templates in Table 4 are intended to exemplify the overall sequence of: (1) a gRNA spacer (e.g., for targeting the first strand nick), (2) a gRNA scaffold, (3) a heterologous object sequence, and (4) a PBS sequence (e.g., for initiating TPRT at the first strand nick).

表1-4、5A-5D和6A中所示模板RNA序列可以根据正在靶向的细胞进行定制。例如，在一些实施例中，期望在编辑时灭活PAM序列(例如，使用“PAM-杀灭”修饰)以降低初始编辑后进一步基因编辑的可能性(例如，通过Cas重新靶向)。因此，本文所述的某些模板RNA被设计为将突变(例如，取代)写入靶位点的PAM，使得在编辑后，PAM位点将突变为基因修饰多肽不再识别的序列。因此，模板RNA的异源对象序列内的突变区可以包含PAM-杀灭序列。不希望受理论束缚，在一些实施例中，PAM-杀灭序列可在完成基因修饰后阻止基因修饰多肽的重新接合，或相对于缺乏PAM-杀灭序列的模板RNA减少重新接合。在一些实施例中，PAM-杀灭序列不会改变基因编码的氨基酸序列，例如，PAM-杀灭序列产生沉默突变。在其他实施例中，期望保持PAM序列完整(无PAM-杀灭)。The template RNA sequences shown in Tables 1-4, 5A-5D, and 6A may be customized depending on the cell being targeted. For example, in some embodiments it is desirable to inactivate the PAM sequence upon editing (e.g., using a "PAM-kill" modification) to decrease the potential for further gene editing after the initial edit (e.g., through Cas retargeting). Thus, certain template RNAs described herein are designed to write a mutation (e.g., a substitution) into the PAM of the target site, such that upon editing, the PAM site is mutated to a sequence no longer recognized by the gene modifying polypeptide. Accordingly, the mutation region within the heterologous object sequence of the template RNA may comprise a PAM-kill sequence. Without wishing to be bound by theory, in some embodiments, a PAM-kill sequence prevents re-engagement of the gene modifying polypeptide upon completion of the genetic modification, or decreases re-engagement relative to a template RNA lacking a PAM-kill sequence. In some embodiments, the PAM-kill sequence does not alter the amino acid sequence encoded by the gene, e.g., the PAM-kill sequence results in a silent mutation. In other embodiments, it is desirable to leave the PAM sequence intact (no PAM-kill).

类似地，在一些实施例中，为了降低初始编辑后进一步基因编辑的可能性(例如，通过Cas重新靶向)，可能期望通过“种子杀灭”基序改变RT模板序列的前三个核苷酸。因此，本文所述的某些模板RNA被设计为将突变(例如，取代)写入与RT模板序列的前三个核苷酸相对应的靶位点部分，使得在编辑后，靶位点将突变为与RT模板序列同源性较低的序列。因此，模板RNA的异源对象序列内的突变区可以包含种子杀灭序列。不希望受理论束缚，在一些实施例中，种子杀灭序列可在完成基因修饰后阻止基因修饰多肽的重新接合，或相对于缺乏种子杀灭序列而在其他方面相似的模板RNA减少重新接合。在一些实施例中，种子杀灭序列不会改变基因编码的氨基酸序列，例如，种子杀灭序列产生沉默突变。在其他实施例中，期望保持种子区域完整，并且不使用种子杀灭序列。Similarly, in some embodiments, to decrease the potential for further gene editing after the initial edit (e.g., through Cas retargeting), it may be desirable to alter the first three nucleotides of the RT template sequence via a "seed-kill" motif. Thus, certain template RNAs described herein are designed to write a mutation (e.g., a substitution) into the portion of the target site corresponding to the first three nucleotides of the RT template sequence, such that upon editing, the target site is mutated to a sequence with lower homology to the RT template sequence. Accordingly, the mutation region within the heterologous object sequence of the template RNA may comprise a seed-kill sequence. Without wishing to be bound by theory, in some embodiments, a seed-kill sequence prevents re-engagement of the gene modifying polypeptide upon completion of the genetic modification, or decreases re-engagement relative to an otherwise similar template RNA lacking the seed-kill sequence. In some embodiments, the seed-kill sequence does not alter the amino acid sequence encoded by the gene, e.g., the seed-kill sequence results in a silent mutation. In other embodiments, it is desirable to leave the seed region intact, and no seed-kill sequence is used.

在另外的实施例中，为了优化或改善基因编辑效率，可能需要逃避靶细胞的错配修复或核苷酸修复途径，或者使靶细胞的修复途径偏向于保存编辑的链。在一些实施例中，可以在RT模板序列内引入多个沉默突变(例如，沉默取代)以逃避靶细胞的错配修复或核苷酸修复途径，或者使靶细胞的修复途径偏向于保存编辑的链。In further embodiments, to optimize or improve gene editing efficiency, it may be desirable to evade the target cell's mismatch repair or nucleotide repair pathways, or to bias the target cell's repair pathways toward preserving the edited strand. In some embodiments, multiple silent mutations (e.g., silent substitutions) may be introduced within the RT template sequence to evade the target cell's mismatch repair or nucleotide repair pathways, or to bias the target cell's repair pathways toward preserving the edited strand.

表7A提供了HBB基因内的各个位置的示例性沉默突变。Table 7A provides exemplary silent mutations at various positions within the HBB gene.

表7A.HBB基因的示例性沉默突变密码子Table 7A. Exemplary silent mutation codons of HBB gene

在一些实施例中，模板RNA包含一个或多个沉默突变。In some embodiments, the template RNA comprises one or more silent mutations.

在一些实施例中，沉默突变包含在编码HBB基因的计入初始甲硫氨酸第6个氨基酸(脯氨酸)的密码子的突变，例如突变为CCC或CCG。In some embodiments, the silent mutation comprises a mutation in the codon encoding the sixth amino acid (proline) of HBB, counting the initial methionine, e.g., a mutation to CCC or CCG.

在一些实施例中，模板RNA包含如本文表X1-X4中所示的一个或多个沉默取代。In some embodiments, the template RNA comprises one or more silent substitutions as shown in Tables X1-X4 herein.

应当理解，表7A中所示的沉默突变可以单独或以任何方式组合用于本文所述的模板RNA序列中。It should be understood that the silent mutations shown in Table 7A can be used alone or in any combination in the template RNA sequences described herein.
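Whether a candidate codon change is silent can be checked against the standard genetic code. The sketch below is illustrative only: `is_silent` is a hypothetical helper, the codon table is the standard nuclear code, and the starting codon CCT in the usage note is an assumption made for the example rather than a value taken from Table 7A.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_codon(codon: str) -> str:
    """One-letter amino acid (or '*' for stop) for a DNA or RNA codon."""
    return CODON_TABLE[codon.upper().replace("U", "T")]


def is_silent(original_codon: str, mutated_codon: str) -> bool:
    """True if the codons differ but encode the same amino acid."""
    original = original_codon.upper().replace("U", "T")
    mutated = mutated_codon.upper().replace("U", "T")
    return original != mutated and CODON_TABLE[original] == CODON_TABLE[mutated]
```

Under these assumptions, a change from CCT to CCC or CCG is silent because all three codons encode proline, whereas a change from GAG (glutamate) to GTG (valine) is not.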

具有诱导活性的gRNAgRNA with inducible activity

在一些实施例中，本文所述的gRNA(例如，作为模板RNA的一部分的gRNA或用于第二链切口的gRNA)具有诱导活性。可通过模板核酸例如模板RNA实现诱导活性，该模板核酸(除gRNA之外)还包含阻断结构域，其中部分或全部阻断结构域的序列至少部分互补于gRNA的一部分或全部。因此，阻断结构域能够与gRNA的一部分或全部杂交或基本上杂交。在一些实施例中，阻断结构域和诱导活性gRNA布置在模板核酸例如模板RNA上，使得gRNA可以采用第一构象(其中阻断结构域与gRNA杂交或基本上杂交)和第二构象(其中阻断结构域不与gRNA杂交或基本上不杂交)。在一些实施例中，在第一构象中，gRNA不能结合基因修饰多肽(例如，模板核酸结合结构域、DNA结合结构域或核酸内切酶结构域(例如，CRISPR/Cas蛋白))或以与缺乏阻断结构域的其他方面相似的模板RNA相比亲和力显著降低的方式结合。在一些实施例中，在第二构象中，gRNA能结合基因修饰多肽(例如，模板核酸结合结构域、DNA结合结构域或核酸内切酶结构域(例如，CRISPR/Cas蛋白))。在一些实施例中，gRNA是处于第一构象还是第二构象可以影响基因修饰多肽(例如，基因修饰多肽包含的CRISPR/Cas蛋白)的DNA结合或核酸内切酶活性是否活跃。In some embodiments, a gRNA described herein (e.g., a gRNA that is part of a template RNA or a gRNA used for second strand nicking) has inducible activity. Inducible activity may be achieved by a template nucleic acid, e.g., a template RNA, that comprises (in addition to the gRNA) a blocking domain, wherein the sequence of part or all of the blocking domain is at least partially complementary to part or all of the gRNA. The blocking domain is thus capable of hybridizing or substantially hybridizing to part or all of the gRNA. In some embodiments, the blocking domain and the inducibly active gRNA are arranged on the template nucleic acid, e.g., template RNA, such that the gRNA can adopt a first conformation, in which the blocking domain is hybridized or substantially hybridized to the gRNA, and a second conformation, in which the blocking domain is not hybridized or not substantially hybridized to the gRNA. In some embodiments, in the first conformation, the gRNA is unable to bind the gene modifying polypeptide (e.g., the template nucleic acid binding domain, DNA binding domain, or endonuclease domain (e.g., a CRISPR/Cas protein)), or binds it with substantially decreased affinity compared to an otherwise similar template RNA lacking the blocking domain. In some embodiments, in the second conformation, the gRNA is able to bind the gene modifying polypeptide (e.g., the template nucleic acid binding domain, DNA binding domain, or endonuclease domain (e.g., a CRISPR/Cas protein)). In some embodiments, whether the gRNA is in the first or the second conformation can influence whether the DNA binding or endonuclease activity of the gene modifying polypeptide (e.g., of a CRISPR/Cas protein comprised in the gene modifying polypeptide) is active.

在一些实施例中，协调第二切口的gRNA具有诱导活性。在一些实施例中，协调第二切口的gRNA在模板被逆转录后诱导。在一些实施例中，gRNA与阻断结构域的杂交可以使用开放分子破坏。在一些实施例中，开放分子包含与gRNA的部分或全部或阻断结构域结合并抑制gRNA与阻断结构域杂交的药剂。在一些实施例中，开放分子包含核酸，例如，包含与gRNA、阻断结构域或两者部分地或完全地互补的序列。通过选择或设计合适的开放分子，提供的开放分子可以促进gRNA构象的变化，使其可以与CRISPR/Cas蛋白相关联并提供CRISPR/Cas蛋白的相关功能(例如，DNA结合和/或核酸内切酶活性)。不希望受理论束缚，在选定的时间和/或位置提供开放分子可以允许对gRNA、CRISPR/Cas蛋白或包含它们的基因修饰系统的活性进行空间和时间控制。在一些实施例中，开放分子包含对包含基因修饰多肽和/或模板核酸的细胞而言是外源的。在一些实施例中，开放分子包含内源药剂(例如，对于包含基因修饰多肽和/或模板核酸(其包含gRNA和阻断结构域)的细胞而言是内源的)。例如，可以选择诱导型gRNA、阻断结构域和开放分子，使得开放分子是在靶细胞或组织中表达的内源药剂，例如，从而确保基因修饰系统在靶细胞或组织中的活性。作为另一个实例，可以选择诱导型gRNA、阻断结构域和开放分子，使得开放分子在一个或多个非靶细胞或组织中不存在或基本上不表达，例如，从而确保基因修饰系统的活性在一种或多种非靶细胞或组织中不发生或基本上不发生，或与靶细胞或组织相比以降低的水平发生。示例性的阻断结构域、开放分子及其用途描述于PCT申请公开WO 2020044039 A1，其通过援引以其全文并入本文。在一些实施例中，模板核酸(例如模板RNA)可以包含一个或多个序列或结构，用于由基因修饰多肽的一个或多个组分(例如逆转录酶或RNA结合结构域和gRNA)结合。在一些实施例中，gRNA促进与基因修饰多肽的模板核酸结合结构域(例如，RNA结合结构域)的相互作用。在一些实施例中，gRNA将基因修饰多肽引导至匹配的靶序列，例如在靶细胞基因组中。In some embodiments, the gRNA that coordinates the second nick has inducible activity. In some embodiments, the gRNA that coordinates the second nick is induced after the template is reverse transcribed. In some embodiments, hybridization of the gRNA to the blocking domain can be disrupted using an opener molecule. In some embodiments, the opener molecule comprises an agent that binds to part or all of the gRNA or the blocking domain and inhibits hybridization of the gRNA to the blocking domain. In some embodiments, the opener molecule comprises a nucleic acid, e.g., comprising a sequence that is partially or wholly complementary to the gRNA, the blocking domain, or both. By choosing or designing an appropriate opener molecule, providing the opener molecule can promote a change in the conformation of the gRNA such that it can associate with a CRISPR/Cas protein and provide the associated functions of the CRISPR/Cas protein (e.g., DNA binding and/or endonuclease activity). Without wishing to be bound by theory, providing the opener molecule at a selected time and/or location may allow spatial and temporal control of the activity of the gRNA, the CRISPR/Cas protein, or a gene modification system comprising them. In some embodiments, the opener molecule is exogenous to the cell comprising the gene modifying polypeptide and/or template nucleic acid. In some embodiments, the opener molecule comprises an endogenous agent (e.g., endogenous to the cell comprising the gene modifying polypeptide and/or the template nucleic acid, which comprises the gRNA and blocking domain). For example, the inducible gRNA, blocking domain, and opener molecule may be chosen such that the opener molecule is an endogenous agent expressed in a target cell or tissue, e.g., thereby ensuring activity of the gene modification system in the target cell or tissue. As a further example, the inducible gRNA, blocking domain, and opener molecule may be chosen such that the opener molecule is absent or not substantially expressed in one or more non-target cells or tissues, e.g., thereby ensuring that activity of the gene modification system does not occur or does not substantially occur in the one or more non-target cells or tissues, or occurs at a reduced level compared to a target cell or tissue. Exemplary blocking domains, opener molecules, and uses thereof are described in PCT Application Publication WO 2020044039 A1, which is incorporated herein by reference in its entirety. In some embodiments, a template nucleic acid (e.g., a template RNA) may comprise one or more sequences or structures for binding by one or more components of a gene modifying polypeptide (e.g., by a reverse transcriptase or RNA binding domain), and a gRNA. In some embodiments, the gRNA facilitates interaction with the template nucleic acid binding domain (e.g., RNA binding domain) of the gene modifying polypeptide. In some embodiments, the gRNA directs the gene modifying polypeptide to the matching target sequence, e.g., in a target cell genome.

基因修饰系统中的环状RNA和核酶Circular RNA and ribozymes in gene modification systems

预期在靶细胞内的配制、递送或基因修饰反应期间采用环状和/或线性RNA状态可能是有用的。因此，在本文所述的任何方面的一些实施例中，基因修饰系统包含一个或多个环状RNA(circRNA)。在本文所述任何方面的一些实施例中，基因修饰系统包含一种或多种线性RNA。在一些实施例中，本文所述的核酸(例如，模板核酸、编码基因修饰多肽的核酸分子或两者)是circRNA。在一些实施例中，环状RNA分子编码基因修饰多肽。在一些实施例中，编码基因修饰多肽的circRNA分子被递送至宿主细胞。在一些实施例中，环状RNA分子编码重组酶，例如如本文所述。在一些实施例中，将编码重组酶的circRNA分子递送至宿主细胞。在一些实施例中，编码基因修饰多肽的circRNA分子在翻译之前被线性化(例如，在宿主细胞中，例如，在宿主细胞的细胞核中)。It is contemplated that it may be useful to employ circular and/or linear RNA states during formulation, delivery, or the gene modifying reaction within the target cell. Thus, in some embodiments of any of the aspects described herein, a gene modification system comprises one or more circular RNAs (circRNAs). In some embodiments of any of the aspects described herein, a gene modification system comprises one or more linear RNAs. In some embodiments, a nucleic acid described herein (e.g., a template nucleic acid, a nucleic acid molecule encoding a gene modifying polypeptide, or both) is a circRNA. In some embodiments, a circular RNA molecule encodes the gene modifying polypeptide. In some embodiments, the circRNA molecule encoding the gene modifying polypeptide is delivered to a host cell. In some embodiments, a circular RNA molecule encodes a recombinase, e.g., as described herein. In some embodiments, the circRNA molecule encoding the recombinase is delivered to a host cell. In some embodiments, the circRNA molecule encoding the gene modifying polypeptide is linearized (e.g., in the host cell, e.g., in the nucleus of the host cell) prior to translation.

已发现环状RNA(circRNA)天然存在于细胞中，并且已发现其具有不同的功能，包括在人细胞中的非编码和蛋白编码作用。已显示，可以通过将自剪接内含子掺入RNA分子(或编码RNA分子的DNA)，导致RNA环化来工程改造circRNA，并且工程改造的circRNA可以具有增强的蛋白质产生和稳定性(Wesselhoeft等人Nature Communications[自然通讯]2018)。在一些实施例中，基因修饰多肽由circRNA编码。在某些实施例中，模板核酸是DNA，例如dsDNA或ssDNA。在某些实施例中，circDNA包含模板RNA。Circular RNAs (circRNAs) have been found to occur naturally in cells and to have diverse functions, including both non-coding and protein-coding roles in human cells. It has been shown that a circRNA can be engineered by incorporating a self-splicing intron into an RNA molecule (or DNA encoding the RNA molecule) that results in circularization of the RNA, and that an engineered circRNA can have enhanced protein production and stability (Wesselhoeft et al. Nature Communications 2018). In some embodiments, the gene modifying polypeptide is encoded as circRNA. In certain embodiments, the template nucleic acid is a DNA, such as a dsDNA or ssDNA. In certain embodiments, the circDNA comprises a template RNA.

在一些实施例中，circRNA包含一个或多个核酶序列。在一些实施例中，核酶序列被激活用于例如在宿主细胞中自切割，例如，从而导致circRNA的线性化。在一些实施例中，当镁浓度达到足以切割的水平时，核酶被激活，例如在宿主细胞中。在一些实施例中，在递送至宿主细胞之前，circRNA保持在低镁环境中。在一些实施例中，核酶是蛋白质反应性核酶。在一些实施例中，核酶是核酸反应性核酶。在一些实施例中，circRNA包含切割位点。在一些实施例中，circRNA包含第二切割位点。In some embodiments, the circRNA comprises one or more ribozyme sequences. In some embodiments, the ribozyme sequence is activated for self-cleavage, e.g., in a host cell, e.g., thereby resulting in linearization of the circRNA. In some embodiments, the ribozyme is activated when the magnesium concentration reaches a level sufficient for cleavage, e.g., in a host cell. In some embodiments, the circRNA is maintained in a low-magnesium environment prior to delivery to the host cell. In some embodiments, the ribozyme is a protein-responsive ribozyme. In some embodiments, the ribozyme is a nucleic acid-responsive ribozyme. In some embodiments, the circRNA comprises a cleavage site. In some embodiments, the circRNA comprises a second cleavage site.

在一些实施例中，circRNA在靶细胞的细胞核中被线性化。在一些实施例中，细胞的细胞核中circRNA的线性化涉及细胞的细胞核中存在的组分，例如以激活切割事件。在一些实施例中，对核元件(例如核蛋白，例如基因组相互作用蛋白，例如表观遗传修饰因子，例如EZH2)有反应的核酶(例如来自B2或ALU元件的核酶)掺入例如基因修饰系统的circRNA中。在一些实施例中，circRNA的核定位导致核酶的自催化活性增加和circRNA的线性化。In some embodiments, the circRNA is linearized in the nucleus of a target cell. In some embodiments, linearization of the circRNA in the nucleus of a cell involves components present in the nucleus of the cell, e.g., to activate a cleavage event. In some embodiments, a ribozyme (e.g., a ribozyme from a B2 or ALU element) that is responsive to a nuclear element (e.g., a nuclear protein, e.g., a genome-interacting protein, e.g., an epigenetic modifier, e.g., EZH2) is incorporated into a circRNA, e.g., of a gene modification system. In some embodiments, nuclear localization of the circRNA results in an increase in the autocatalytic activity of the ribozyme and linearization of the circRNA.

在一些实施例中，核酶与基因修饰系统的一个或多个其他组分是异源的。在一些实施例中，可诱导的核酶(例如，在本文所述的circRNA中)是合成产生的，例如，通过利用蛋白质配体反应性适体设计。已描述了利用烟草环斑病毒锤头状核酶的卫星RNA与MS2外壳蛋白适体的系统(Kennedy等人Nucleic Acids Res[核酸研究]42(19)：12306-12321(2014)，其通过援引以其全文并入本文)，其在MS2外壳蛋白的存在下导致核酶活性的激活。在实施例中，这样的系统对定位于细胞质或细胞核的蛋白质配体产生反应。在一些实施例中，蛋白质配体不是MS2。已经描述了用于产生靶标配体的RNA适体的方法，例如，基于通过指数富集的配体系统进化(SELEX)(Tuerk和Gold，Science[科学]249(4968)：505-510(1990)；Ellington和Szostak，Nature[自然]346(6287)：818-822(1990)；其中每个的方法通过援引并入本文)并且在一些情况下得到计算机设计的帮助(Bell等人PNAS[美国国家科学院院刊]117(15)：8486-8493，其方法通过援引并入本文)。因此，在一些实施例中，产生用于靶配体的适体并将其掺入合成核酶系统中，例如引发核酶介导的切割和circRNA线性化，例如在蛋白质配体存在下。在一些实施例中，circRNA线性化在细胞质中被引发，例如，使用与细胞质中的配体相关联的适体。在一些实施例中，circRNA线性化在细胞核中被引发，例如，使用与细胞核中的配体相关联的适体。在实施例中，细胞核中的配体包含表观遗传修饰因子或转录因子。在一些实施例中，引发线性化的配体以高于脱靶细胞的水平存在于中靶细胞中。In some embodiments, the ribozyme is heterologous to one or more other components of the gene modification system. In some embodiments, an inducible ribozyme (e.g., in a circRNA described herein) is created synthetically, e.g., by utilizing a protein ligand-responsive aptamer design. A system utilizing the satellite RNA of the tobacco ringspot virus hammerhead ribozyme with an MS2 coat protein aptamer has been described (Kennedy et al. Nucleic Acids Res 42(19):12306-12321 (2014), which is incorporated herein by reference in its entirety); it results in activation of ribozyme activity in the presence of the MS2 coat protein. In embodiments, such a system responds to a protein ligand localized to the cytoplasm or nucleus. In some embodiments, the protein ligand is not MS2. Methods for generating RNA aptamers to target ligands have been described, e.g., based on systematic evolution of ligands by exponential enrichment (SELEX) (Tuerk and Gold, Science 249(4968):505-510 (1990); Ellington and Szostak, Nature 346(6287):818-822 (1990); the methods of each of which are incorporated herein by reference), and in some cases have been aided by in silico design (Bell et al. PNAS 117(15):8486-8493, the methods of which are incorporated herein by reference). Thus, in some embodiments, an aptamer for a target ligand is generated and incorporated into a synthetic ribozyme system, e.g., to trigger ribozyme-mediated cleavage and circRNA linearization, e.g., in the presence of the protein ligand. In some embodiments, circRNA linearization is triggered in the cytoplasm, e.g., using an aptamer that associates with a ligand in the cytoplasm. In some embodiments, circRNA linearization is triggered in the nucleus, e.g., using an aptamer that associates with a ligand in the nucleus. In embodiments, the ligand in the nucleus comprises an epigenetic modifier or a transcription factor. In some embodiments, the ligand that triggers linearization is present at higher levels in on-target cells than in off-target cells.

还预期核酸反应性核酶系统可用于circRNA线性化。例如，在例如Penchovsky(BiotechnologyAdvances[生物技术进展]32(5)：1015-1027(2014)，通过援引并入本文)中描述了感测确定的靶核酸分子以引发核酶激活的生物传感器。通过这些方法，核酶自然折叠成非活性状态，并且仅在存在确定的靶核酸分子(例如，RNA分子)的情况下才被激活。在一些实施例中，基因修饰系统的circRNA包含在确定的靶核酸(例如RNA，例如mRNA、miRNA、指导RNA、gRNA、sgRNA、ncRNA、lncRNA、tRNA、snRNA或mtRNA)存在下被激活的核酸反应性核酶。在一些实施例中，引发线性化的核酸以高于脱靶细胞的水平存在于中靶细胞中。It is further contemplated that a nucleic acid-responsive ribozyme system can be employed for circRNA linearization. For example, biosensors that sense a defined target nucleic acid molecule to trigger ribozyme activation are described, e.g., in Penchovsky (Biotechnology Advances 32(5):1015-1027 (2014), incorporated herein by reference). By these methods, a ribozyme naturally folds into an inactive state and is only activated in the presence of a defined target nucleic acid molecule (e.g., an RNA molecule). In some embodiments, a circRNA of the gene modification system comprises a nucleic acid-responsive ribozyme that is activated in the presence of a defined target nucleic acid, e.g., an RNA, e.g., an mRNA, miRNA, guide RNA, gRNA, sgRNA, ncRNA, lncRNA, tRNA, snRNA, or mtRNA. In some embodiments, the nucleic acid that triggers linearization is present at higher levels in on-target cells than in off-target cells.

在本文任一方面的一些实施例中，基因修饰系统掺入了一种或多种对目的靶组织或靶细胞具有可诱导特异性的核酶，例如，由目的靶组织或靶细胞中以较高水平存在的配体或核酸激活的核酶。在一些实施例中，基因修饰系统掺入对亚细胞区室(例如细胞核、核仁、细胞质或线粒体)具有可诱导特异性的核酶。在一些实施例中，核酶被以较高水平存在于靶亚细胞区室中的配体或核酸激活。在一些实施例中，基因修饰系统的RNA组分以circRNA的形式提供，例如其通过线性化激活。在一些实施例中，编码基因修饰多肽的circRNA的线性化激活了该分子进行翻译。在一些实施例中，激活基因修饰系统的circRNA组分的信号在中靶细胞或组织中以更高水平存在，例如，使得该系统在这些细胞中被特异性地激活。In some embodiments of any of the aspects herein, a gene modification system incorporates one or more ribozymes with inducible specificity to a target tissue or target cell of interest, e.g., a ribozyme that is activated by a ligand or nucleic acid present at higher levels in the target tissue or target cell of interest. In some embodiments, the gene modification system incorporates a ribozyme with inducible specificity to a subcellular compartment, e.g., the nucleus, nucleolus, cytoplasm, or mitochondria. In some embodiments, the ribozyme is activated by a ligand or nucleic acid present at higher levels in the target subcellular compartment. In some embodiments, an RNA component of the gene modification system is provided as circRNA, e.g., that is activated by linearization. In some embodiments, linearization of a circRNA encoding the gene modifying polypeptide activates the molecule for translation. In some embodiments, a signal that activates a circRNA component of the gene modification system is present at higher levels in on-target cells or tissues, e.g., such that the system is specifically activated in these cells.

在一些实施例中，基因修饰系统的RNA组分以circRNA的形式提供，例如其通过线性化灭活。在一些实施例中，编码基因修饰多肽的circRNA通过切割和降解而灭活。在一些实施例中，编码基因修饰多肽的circRNA通过将翻译信号与多肽的编码序列分离的切割而灭活。在一些实施例中，灭活基因修饰系统的circRNA组分的信号在脱靶细胞或组织中以更高水平存在，使得该系统在这些细胞中被特异性地灭活。In some embodiments, an RNA component of the gene modification system is provided as circRNA, e.g., that is inactivated by linearization. In some embodiments, a circRNA encoding the gene modifying polypeptide is inactivated by cleavage and degradation. In some embodiments, a circRNA encoding the gene modifying polypeptide is inactivated by cleavage that separates a translation signal from the coding sequence of the polypeptide. In some embodiments, a signal that inactivates a circRNA component of the gene modification system is present at higher levels in off-target cells or tissues, such that the system is specifically inactivated in those cells.

靶核酸位点Target nucleic acid site

在一些实施例中，在基因修饰之后，编辑序列周围的靶位点例如在少于约50％或10％的编辑事件中包含有限数量的插入或缺失，例如，如通过对靶位点的长读段扩增子测序所确定的，例如，如Karst等人(2020)bioRxiv doi.org/10.1101/645903(通过援引以其全文并入本文)中所述。在一些实施例中，靶位点不显示多个连续编辑事件，例如头对尾或头对头重复，例如，如通过靶位点的长读段扩增子测序确定的，例如，如Karst等人bioRxivdoi.org/10.1101/645903(2020)(其通过援引以其全文并入本文)中所述。在一些实施例中，靶位点包含对应于模板RNA的整合序列。在一些实施例中，靶位点在超过约1％或10％的事件中不包含由内源RNA产生的插入，例如，如通过靶位点的长读段扩增子测序所确定的，例如，如Karst等人bioRxiv doi.org/10.1101/645903(2020)(其通过援引以其全文并入本文)中所述。在一些实施例中，靶位点包含对应于模板RNA的整合序列。In some embodiments, after gene modification, the target site surrounding the edited sequence contains a limited number of insertions or deletions, e.g., in less than about 50% or 10% of editing events, e.g., as determined by long-read amplicon sequencing of the target site, e.g., as described in Karst et al. (2020) bioRxiv doi.org/10.1101/645903 (incorporated herein by reference in its entirety). In some embodiments, the target site does not show multiple consecutive editing events, e.g., head-to-tail or head-to-head duplications, e.g., as determined by long-read amplicon sequencing of the target site, e.g., as described in Karst et al. bioRxiv doi.org/10.1101/645903 (2020) (incorporated herein by reference in its entirety). In some embodiments, the target site contains an integrated sequence corresponding to the template RNA. In some embodiments, the target site does not contain insertions resulting from endogenous RNA in more than about 1% or 10% of events, e.g., as determined by long-read amplicon sequencing of the target site, e.g., as described in Karst et al. bioRxiv doi.org/10.1101/645903 (2020) (incorporated herein by reference in its entirety). In some embodiments, the target site contains an integrated sequence corresponding to the template RNA.

在本发明的某些方面，由基因修饰系统整合的宿主DNA结合位点可以在基因中、内含子中、外显子中、ORF中、在任何基因的编码区之外、在基因的调节区内、或在基因的调节区外。在其他方面，多肽可以结合一个或多于一个宿主DNA序列。In certain aspects of the invention, the host DNA binding site into which the gene modification system integrates can be in a gene, in an intron, in an exon, in an ORF, outside the coding region of any gene, in a regulatory region of a gene, or outside a regulatory region of a gene. In other aspects, the polypeptide may bind to one or more than one host DNA sequence.

在一些实施例中，基因修饰系统用于编辑多个等位基因中的靶基因座。在一些实施例中，基因修饰系统被设计用于编辑特定等位基因。例如，基因修饰多肽可以针对仅存在于一个等位基因上的特定序列，例如包含与靶等位基因(例如，gRNA或退火结构域)具有同源性的模板RNA，但不针对第二同源等位基因。在一些实施例中，基因修饰系统可以改变单倍型特异性等位基因。在一些实施例中，靶向特定等位基因的基因修饰系统优先靶向该等位基因，例如，对靶等位基因具有至少2、4、6、8或10倍的偏好。In some embodiments, a gene modification system is used to edit a target locus in multiple alleles. In some embodiments, a gene modification system is designed to edit a specific allele. For example, a gene modifying polypeptide may be directed to a specific sequence that is present on only one allele, e.g., by comprising a template RNA with homology to the target allele (e.g., in the gRNA or annealing domain) but not to a second cognate allele. In some embodiments, a gene modification system can alter a haplotype-specific allele. In some embodiments, a gene modification system that targets a specific allele preferentially targets that allele, e.g., has at least a 2-, 4-, 6-, 8-, or 10-fold preference for the target allele.

第二链切口Second strand nicking

在一些实施例中，本文所述的基因修饰系统包括对第一链进行切口的切口酶活性(例如，在基因修饰多肽中)，和对靶DNA的第二链进行切口的切口酶活性(例如，在与基因修饰多肽分开的多肽中)。如本文所讨论的，不希望受理论束缚，对靶位点DNA的第一链进行切口被认为提供了可被RT结构域用于逆转录模板RNA的序列(例如，异源对象序列)的3′OH。不希望受理论束缚，认为向第二链引入另外的切口可能使细胞DNA修复机制偏向于比原始基因组序列更频繁地采用基于异源对象序列的序列。在一些实施例中，第二链的另外切口由与第一链的切口相同的核酸内切酶结构域(例如，切口酶结构域)产生。在一些实施例中，相同的基因修饰多肽既对第一链进行切口，又对第二链进行切口。在一些实施例中，基因修饰多肽包含CRISPR/Cas结构域并且第二链的另外切口由另外的核酸(例如包含引导CRISPR/Cas结构域对第二链进行切口的第二gRNA)引导。在其他实施例中，另外的第二链切口由与第一链的切口不同的核酸内切酶结构域(例如，切口酶结构域)产生。在一些实施例中，该不同的核酸内切酶结构域位于另外的多肽中(例如，本发明的系统进一步包含另外的多肽)，与基因修饰多肽分开。在一些实施例中，另外的多肽包含本文所述的核酸内切酶结构域(例如，切口酶结构域)。在一些实施例中，另外的多肽包含例如本文所述的DNA结合结构域。In some embodiments, a gene modification system described herein comprises a nickase activity (e.g., in the gene modifying polypeptide) that nicks the first strand, and a nickase activity (e.g., in a polypeptide separate from the gene modifying polypeptide) that nicks the second strand of the target DNA. As discussed herein, without wishing to be bound by theory, nicking the first strand of the target site DNA is thought to provide a 3′ OH that can be used by the RT domain to reverse transcribe a sequence of the template RNA (e.g., the heterologous object sequence). Without wishing to be bound by theory, it is thought that introducing an additional nick to the second strand may bias the cellular DNA repair machinery to adopt the sequence based on the heterologous object sequence more frequently than the original genomic sequence. In some embodiments, the additional nick to the second strand is made by the same endonuclease domain (e.g., nickase domain) as the nick to the first strand. In some embodiments, the same gene modifying polypeptide nicks both the first strand and the second strand. In some embodiments, the gene modifying polypeptide comprises a CRISPR/Cas domain, and the additional nick to the second strand is directed by an additional nucleic acid, e.g., one comprising a second gRNA that directs the CRISPR/Cas domain to nick the second strand. In other embodiments, the additional nick to the second strand is made by a different endonuclease domain (e.g., nickase domain) than the nick to the first strand. In some embodiments, that different endonuclease domain is situated in an additional polypeptide (e.g., a system of the invention further comprises the additional polypeptide), separate from the gene modifying polypeptide. In some embodiments, the additional polypeptide comprises an endonuclease domain (e.g., nickase domain) described herein. In some embodiments, the additional polypeptide comprises a DNA binding domain, e.g., as described herein.

在本文中预期第二链切口相对于第一链切口出现的位置可影响以下一项或多项的程度：获得所期望的基因修饰DNA修饰，出现不期望的双链断裂(DSB)，出现不期望的插入，或出现不期望的缺失。不希望受理论束缚，第二链切口可能以两个总体取向发生：向内切口和向外切口。It is contemplated herein that the position at which the second strand nick occurs relative to the first strand nick may influence the extent to which one or more of the following occurs: the desired gene modifying DNA modification is obtained, undesired double-strand breaks (DSBs) occur, undesired insertions occur, or undesired deletions occur. Without wishing to be bound by theory, second strand nicking may occur in two general orientations: inward nicks and outward nicks.

在一些实施例中，在向内切口取向，RT结构域进行聚合(例如，使用模板RNA(例如，异源对象序列))远离第二链切口。在一些实施例中，在向内切口取向，第一链的切口的位置和第二链的切口的位置位于第一PAM位点和第二PAM位点之间(例如，在其中两个切口都由包含CRISPR/Cas结构域的多肽(例如，基因修饰多肽)产生的情况下)。当外侧有两个PAM且内侧有两个切口时，这种向内切口取向也可以称为“PAM-在外”。在一些实施例中，在向内切口取向，第一链的该切口的位置和第二链的切口的位置在多肽和另外的多肽与靶DNA结合的位点之间。在一些实施例中，在向内切口取向，第二链的切口位置位于多肽和另外的多肽的结合位点之间，并且第一链的切口也位于多肽和另外的多肽的结合位点之间。在一些实施例中，在向内切口取向，第一链的切口的位置和第二链的切口的位置位于PAM位点和距靶位点一定距离的第二多肽的结合位点之间。In some embodiments, in the inward nick orientation, the RT domain polymerizes (e.g., using the template RNA (e.g., the heterologous object sequence)) away from the second strand nick. In some embodiments, in the inward nick orientation, the location of the nick to the first strand and the location of the nick to the second strand are positioned between the first PAM site and the second PAM site (e.g., in a scenario in which both nicks are made by a polypeptide (e.g., a gene modifying polypeptide) comprising a CRISPR/Cas domain). When there are two PAMs on the outside and two nicks on the inside, this inward nick orientation can also be referred to as "PAM-out". In some embodiments, in the inward nick orientation, the location of the nick to the first strand and the location of the nick to the second strand are between the sites where the polypeptide and the additional polypeptide bind to the target DNA. In some embodiments, in the inward nick orientation, the location of the nick to the second strand is positioned between the binding sites of the polypeptide and the additional polypeptide, and the nick to the first strand is also positioned between the binding sites of the polypeptide and the additional polypeptide. In some embodiments, in the inward nick orientation, the location of the nick to the first strand and the location of the nick to the second strand are positioned between the PAM site and the binding site of the second polypeptide, which is at a distance from the target site.
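The positional criterion described above for the inward nick orientation, in which both nicks lie between the two PAM sites, reduces to a coordinate comparison. The following is a minimal sketch under stated assumptions: all positions are integer coordinates on a single reference strand, each PAM or binding site is represented by one coordinate, and the function name is hypothetical. The same check applies if a PAM coordinate is replaced by the binding site of a zinc finger or TAL effector molecule.

```python
def is_inward_nick_orientation(site_a: int, site_b: int, first_nick: int, second_nick: int) -> bool:
    """True if both nicks fall strictly between the two outer sites.

    `site_a` and `site_b` are the coordinates of the two PAM sites (or
    other binding sites); order does not matter. This mirrors the
    "two sites on the outside, two nicks on the inside" description.
    """
    lo, hi = sorted((site_a, site_b))
    return lo < first_nick < hi and lo < second_nick < hi
```

For example, with outer sites at 100 and 200, nicks at 120 and 180 satisfy the criterion, while a nick at 90 does not.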

提供向内切口取向的基因修饰系统的实例包括：包含CRISPR/Cas结构域的基因修饰多肽、包含引导对第一链上的靶位点DNA进行切口的gRNA的模板RNA和另外的核酸(其包含在距离第一切口位置一定距离的位点处引导切口的另外的gRNA)，其中第一切口的位置和第二切口的位置在两个gRNA引导基因修饰多肽所至的位点的PAM位点之间。作为另一个实例，提供向内切口取向的另一种基因修饰系统包含含有锌指分子和第一切口酶结构域的基因修饰多肽，其中锌指分子以引导第一切口酶结构域对靶位点的第一链进行切口的方式结合靶DNA；包含CRISPR/Cas结构域的另外的多肽，和包含gRNA的另外的核酸，该gRNA引导另外的多肽在第二链上与靶位点DNA相距一段距离的位点进行切口，其中第一切口的位置和第二切口的位置位于PAM位点和锌指分子结合的位点之间。作为另一个实例，提供向内切口取向的另一种基因修饰系统包含含有锌指分子和第一切口酶结构域的基因修饰多肽，其中锌指分子以引导第一切口酶结构域对靶位点的第一链进行切口的方式结合靶DNA；包含TAL效应子分子和第二切口酶结构域的另外的多肽，其中TAL效应子分子以引导另外的多肽对第二链进行切口的方式结合到距靶位点一定距离的位点，其中第一切口的位置和第二个切口的位置在TAL效应子分子结合的位点和锌指分子结合的位点之间。Examples of gene modification systems that provide an inward nick orientation include: a gene modification polypeptide comprising a CRISPR/Cas domain, a template RNA comprising a gRNA that guides nicking of the target site DNA on the first strand, and an additional nucleic acid (which comprises an additional gRNA that guides nicking at a site a distance from the position of the first nick), wherein the position of the first nick and the position of the second nick are between the PAM sites of the sites to which the two gRNAs guide the gene modification polypeptide. As another example, another gene modification system that provides an inward nick orientation comprises a gene modification polypeptide comprising a zinc finger molecule and a first nickase domain, wherein the zinc finger molecule binds the target DNA in a manner that guides the first nickase domain to nick the first strand of the target site; an additional polypeptide comprising a CRISPR/Cas domain; and an additional nucleic acid comprising a gRNA that guides the additional polypeptide to nick the second strand at a site a distance from the target site DNA, wherein the position of the first nick and the position of the second nick are located between the PAM site and the site to which the zinc finger molecule binds.
As another example, another gene modification system that provides an inward nick orientation comprises a gene modification polypeptide comprising a zinc finger molecule and a first nickase domain, wherein the zinc finger molecule binds the target DNA in a manner that guides the first nickase domain to nick the first strand of the target site; and an additional polypeptide comprising a TAL effector molecule and a second nickase domain, wherein the TAL effector molecule binds a site a distance from the target site in a manner that guides the additional polypeptide to nick the second strand, wherein the position of the first nick and the position of the second nick are between the site to which the TAL effector molecule binds and the site to which the zinc finger molecule binds.

在一些实施例中，在向外切口取向，RT结构域进行聚合(例如，使用模板RNA(例如，异源对象序列))朝向第二链切口。在一些实施例中，在向外切口取向，当第一和第二切口均由包含CRISPR/Cas结构域的多肽(例如，基因修饰多肽)产生时，第一PAM位点和第二PAM位点位于第一链的切口的位置和第二链的切口的位置之间。当内侧有两个PAM且外侧有两个切口时，这种向外的切口取向也可以称为“PAM-在内”。在一些实施例中，在向外切口取向，多肽(例如，基因修饰多肽)和另外的多肽结合至靶DNA上位于第一链的切口的位置和第二链的切口的位置之间的位点。在一些实施例中，在向外切口取向，第二链的切口的位置相对于第一链的切口的位置位于多肽和另外的多肽的结合位点的相对侧。在一些实施例中，在向外取向，PAM位点和距靶位点一定距离的第二多肽的结合位点位于第一链的切口的位置和第二链的切口的位置之间。In some embodiments, in the outward nick orientation, the RT domain polymerizes (e.g., using the template RNA (e.g., the heterologous object sequence)) toward the second strand nick. In some embodiments, in the outward nick orientation, when both the first and second nicks are produced by a polypeptide (e.g., a gene modifying polypeptide) comprising a CRISPR/Cas domain, the first PAM site and the second PAM site are located between the position of the nick on the first strand and the position of the nick on the second strand. With two PAMs on the inside and two nicks on the outside, this outward nick orientation may also be referred to as "PAM-inside". In some embodiments, in the outward nick orientation, the polypeptide (e.g., the gene modifying polypeptide) and the additional polypeptide bind to sites on the target DNA located between the position of the nick on the first strand and the position of the nick on the second strand. In some embodiments, in the outward nick orientation, the position of the nick on the second strand is on the opposite side of the binding sites of the polypeptide and the additional polypeptide from the position of the nick on the first strand. In some embodiments, in the outward nick orientation, the PAM site and the binding site of a second polypeptide that binds at a distance from the target site are located between the position of the nick on the first strand and the position of the nick on the second strand.

提供向外切口取向的基因修饰系统的实例包括：包含CRISPR/Cas结构域的基因修饰多肽、包含引导对第一链上的靶位点DNA进行切口的gRNA的模板RNA和另外的核酸(其包含在距离第一切口位置一定距离的位点处引导切口的另外的gRNA)，其中第一切口的位置和第二切口的位置在两个gRNA引导基因修饰多肽所至的位点的PAM位点之外(即PAM位点位于第一切口的位置和第二切口的位置之间)。作为另一个实例，提供向外切口取向的另一种基因修饰系统包含含有锌指分子和第一切口酶结构域的基因修饰多肽，其中锌指分子以引导第一切口酶结构域对靶位点的第一链进行切口的方式结合靶DNA；包含CRISPR/Cas结构域的另外的多肽，和包含gRNA的另外的核酸，该gRNA引导另外的多肽在第二链上与靶位点DNA相距一段距离的位点进行切口，其中第一切口的位置和第二个切口在PAM位点和锌指分子结合的位点之外(即PAM位点和锌指分子结合的位点在第一切口的位置和第二切口的位置之间)。作为另一个实例，提供向外切口取向的另一种基因修饰系统包含含有锌指分子和第一切口酶结构域的基因修饰多肽，其中锌指分子以引导第一切口酶结构域对靶位点的第一链进行切口的方式结合靶DNA；包含TAL效应子分子和第二切口酶结构域的另外的多肽，其中TAL效应子分子以引导另外的多肽对第二链进行切口的方式结合到距靶位点一定距离的位点，其中第一切口的位置和第二切口的位置在TAL效应子分子结合的位点和锌指分子结合的位点之外(即TAL效应子分子结合的位点和锌指分子结合的位点在第一切口的位置和第二切口的位置之间)。Examples of gene modification systems that provide an outward nicking orientation include: a gene modification polypeptide comprising a CRISPR/Cas domain, a template RNA comprising a gRNA that guides nicking of a target site DNA on a first strand, and an additional nucleic acid (which comprises an additional gRNA that guides nicking at a site a certain distance from the first nicking site), wherein the position of the first nick and the position of the second nick are outside the PAM site of the site to which the two gRNAs guide the gene modification polypeptide (i.e., the PAM site is located between the position of the first nick and the position of the second nick). 
As another example, another gene modification system that provides an outward nicking orientation comprises a gene modification polypeptide comprising a zinc finger molecule and a first nickase domain, wherein the zinc finger molecule binds the target DNA in a manner that guides the first nickase domain to nick the first strand of the target site; an additional polypeptide comprising a CRISPR/Cas domain; and an additional nucleic acid comprising a gRNA that guides the additional polypeptide to nick the second strand at a site a distance from the target site DNA, wherein the position of the first nick and the position of the second nick are outside the PAM site and the site to which the zinc finger molecule binds (i.e., the PAM site and the site to which the zinc finger molecule binds are between the position of the first nick and the position of the second nick). As another example, another gene modification system that provides an outward nicking orientation comprises a gene modification polypeptide comprising a zinc finger molecule and a first nickase domain, wherein the zinc finger molecule binds the target DNA in a manner that guides the first nickase domain to nick the first strand of the target site; and an additional polypeptide comprising a TAL effector molecule and a second nickase domain, wherein the TAL effector molecule binds a site a distance from the target site in a manner that guides the additional polypeptide to nick the second strand, wherein the position of the first nick and the position of the second nick are outside the site to which the TAL effector molecule binds and the site to which the zinc finger molecule binds (i.e., the site to which the TAL effector molecule binds and the site to which the zinc finger molecule binds are between the position of the first nick and the position of the second nick).
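The geometric definitions of the inward and outward orientations can be illustrated with a short sketch. This is an illustration only and not part of the disclosure: it assumes that every position is an integer coordinate on a single reference strand of the target DNA, and the function name `classify_nick_orientation` and its parameters are hypothetical. The two "sites" may be PAM sites, zinc finger binding sites, or TAL effector binding sites, as in the examples above.

```python
def classify_nick_orientation(first_nick, second_nick, first_site, second_site):
    """Classify a two-nick design as 'inward' or 'outward'.

    All arguments are integer coordinates on the same reference strand
    (a simplifying assumption). first_site and second_site are the PAM
    sites or other binding sites of the polypeptides that produce the
    first-strand nick and the second-strand nick.
    """
    nick_lo, nick_hi = sorted((first_nick, second_nick))
    site_lo, site_hi = sorted((first_site, second_site))

    # Inward ("PAM-outside"): both nicks lie between the two binding sites.
    if site_lo < nick_lo and nick_hi < site_hi:
        return "inward"
    # Outward ("PAM-inside"): both binding sites lie between the two nicks.
    if nick_lo < site_lo and site_hi < nick_hi:
        return "outward"
    # Mixed geometries are not covered by the definitions in the text.
    return "undetermined"
```

Under these assumptions, nicks at positions 100 and 160 with binding sites at 90 and 170 would be classified as inward, whereas the same nicks with binding sites at 110 and 150 would be classified as outward.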

不希望受理论束缚，认为对于提供第二链切口的基因修饰系统，在一些实施例中优选向外切口取向。如本文所述，与向外切口取向相比，向内切口可产生更多数量的双链断裂(DSB)。DSB可以被细胞核中的DSB修复途径识别，这可能导致不期望的插入和缺失。向外切口取向可提供降低的DSB形成风险，和相应更少量的不期望的插入和缺失。在一些实施例中，不期望的插入和缺失是不由异源对象序列编码的插入和缺失，例如由与异源对象序列编码的修饰无关的双链断裂修复途径产生的插入或缺失。在一些实施例中，所期望的基因修饰包含对由异源对象序列编码(例如，以及通过基因修饰将异源对象序列写入靶位点来实现)的靶DNA的改变(例如，取代、插入或缺失)。在一些实施例中，第一链切口和第二链切口处于向外取向。Without wishing to be bound by theory, it is believed that for a genetic modification system that provides a second strand nick, an outward nick orientation is preferred in some embodiments. As described herein, an inward nick can produce a greater number of double-strand breaks (DSBs) compared to an outward nick orientation. DSBs can be recognized by the DSB repair pathway in the nucleus, which may result in undesirable insertions and deletions. An outward nick orientation can provide a reduced risk of DSB formation, and a correspondingly smaller amount of undesirable insertions and deletions. In some embodiments, undesirable insertions and deletions are insertions and deletions that are not encoded by a heterologous object sequence, such as insertions or deletions produced by a double-strand break repair pathway that is unrelated to the modification encoded by the heterologous object sequence. In some embodiments, the desired genetic modification comprises a change (e.g., substitution, insertion, or deletion) to a target DNA encoded by a heterologous object sequence (e.g., and achieved by genetic modification by writing a heterologous object sequence into a target site). In some embodiments, the first strand nick and the second strand nick are in an outward orientation.

此外，第一链切口和第二链切口之间的距离可能影响以下一项或多项的程度：获得所期望的基因修饰系统DNA修饰，出现不期望的双链断裂(DSB)，出现不期望的插入，或出现不期望的缺失。不希望受理论束缚，认为第二链切口的益处，即DNA修复偏向于将异源对象序列掺入靶DNA中，随着第一链切口和第二链切口之间的距离减小而增加。然而，认为DSB形成风险也随着第一链切口和第二链切口之间的距离减小而增加。相应地，认为不期望的插入和/或缺失的数量可能随着第一链切口和第二链切口之间的距离减小而增加。在一些实施例中，选择第一链切口和第二链切口之间的距离以平衡偏向将异源对象序列掺入靶DNA中的DNA修复的益处和DSB形成和不希望的缺失和/或插入的风险。在一些实施例中，相对于第一切口和第二切口相隔小于阈值距离的在其他方面类似的向内切口取向系统，第一链切口和第二链切口相隔至少阈值距离的系统具有增加水平的所期望基因修饰系统修饰结果、降低水平的不期望的缺失和/或降低水平的不期望的插入。在一些实施例中，一个或多个阈值距离在下面给出。In addition, the distance between the first strand nick and the second strand nick may affect the extent to which one or more of the following occurs: the desired gene modification system DNA modification is obtained, an undesirable double-strand break (DSB) occurs, an undesirable insertion occurs, or an undesirable deletion occurs. Without wishing to be bound by theory, it is believed that the benefit of the second strand nick, namely biasing DNA repair toward incorporation of the heterologous object sequence into the target DNA, increases as the distance between the first strand nick and the second strand nick decreases. However, it is believed that the risk of DSB formation also increases as the distance between the first strand nick and the second strand nick decreases. Accordingly, it is believed that the number of undesirable insertions and/or deletions may increase as the distance between the first strand nick and the second strand nick decreases. In some embodiments, the distance between the first strand nick and the second strand nick is selected to balance the benefit of biasing DNA repair toward incorporation of the heterologous object sequence into the target DNA against the risk of DSB formation and of undesirable deletions and/or insertions.
In some embodiments, a system in which the first strand nick and the second strand nick are separated by at least a threshold distance has an increased level of the desired gene modification system modification outcome, a reduced level of undesirable deletions, and/or a reduced level of undesirable insertions relative to an otherwise similar inward nick orientation system in which the first nick and the second nick are separated by less than the threshold distance. In some embodiments, one or more threshold distances are given below.

在一些实施例中，第一切口和第二切口相隔至少20、25、30、35、40、45、50、55、60、65、70、75、80、85、90、95、100、110、120、130、140、150、160、170、180、190或200个核苷酸。在一些实施例中，第一切口和第二切口相隔不超过25、30、35、40、45、50、55、60、65、70，75、80、85、90、95、100、110、120、130、140、150、160、170、180、190、200或250个核苷酸。在一些实施例中，第一切口和第二切口相隔20-200、30-200、40-200、50-200、60-200、70-200、80-200、90-200、100-200、110-200、120-200、130-200、140-200、150-200、160-200、170-200、180-200、190-200、20-190、30-190、40-190、50-190、60-190、70-190、80-190、90-190、100-190、110-190、120-190、130-190、140-190、150-190、160-190、170-190、180-190、20-180、30-180、40-180、50-180、60-180、70-180、80-180、90-180、100-180、110-180、120-180、130-180、140-180、150-180、160-180、170-180、20-170、30-170、40-170、50-170、60-170、70-170、80-170、90-170、100-170、110-170、120-170、130-170、140-170、150-170、160-170、20-160、30-160、40-160、50-160、60-160、70-160、80-160、90-160、100-160、110-160、120-160、130-160、140-160、150-160、20-150、30-150、40-150、50-150、60-150、70-150、80-150、90-150、100-150、110-150、120-150、130-150、140-150、20-140、30-140、40-140、50-140、60-140、70-140、80-140、90-140、100-140、110-140、120-140、130-140、20-130、30-130、40-130、50-130、60-130、70-130、80-130、90-130、100-130、110-130、120-130、20-120、30-120、40-120、50-120、60-120、70-120、80-120、90-120、100-120、110-120、20-110、30-110、40-110、50-110、60-110、70-110、80-110、90-110、100-110、20-100、30-100、40-100、50-100、60-100、70-100、80-100、90-100、20-90、30-90、40-90、50-90、60-90、70-90、80-90、20-80、30-80、40-80、50-80、60-80、70-80、20-70、30-70、40-70、50-70、60-70、20-60、30-60、40-60、50-60、20-50、30-50、40-50、20-40、30-40或20-30个核苷酸。在一些实施例中，第一切口和第二切口相隔40-100个核苷酸。In some embodiments, the first nick and the second nick are separated by at least 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, or 200 nucleotides. In some embodiments, the first nick and the second nick are separated by no more than 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, or 250 nucleotides. 
In some embodiments, the first nick and the second nick are separated by 20-200, 30-200, 40-200, 50-200, 60-200, 70-200, 80-200, 90-200, 100-200, 110-200, 120-200, 130-200, 140-200, 150-200, 160-200, 170-200, 180-200, 190-200, 20-190, 30-190, 40-190, 50-190, 60-190, 70-190, 80-190, 90-190, 100-190, 110-190, 120-190, 130-190, 140-190, 150-190, 160-190, 170-190, 180-190, 20-180, 30-180, 40-180, 50-180, 60-180, 70-180, 80-180, 90-180, 100-180, 110-180, 120-180, 130-180, 140-180, 150-180, 160-180, 170-180, 20-170, 30-170, 40-170, 50-170, 60-170, 70-170, 80-170, 90-170, 100-170, 110-170, 120-170, 130-170, 140-170, 150-170, 160-170, 20-160, 30-160, 40-160, 50-160, 60-160, 70-160, 80-160, 90-160, 100-160, 110-160, 120-160, 130-160, 140-160, 150-160, 20-150, 30-150, 40-150, 50-150, 60-150, 70-150, 80-150, 90-150, 100-150, 110-150, 120-150, 130-150, 140-150, 20-140, 30-140, 40-140, 50-140, 60-140, 70-140, 80-140, 90-140, 100-140, 110-140, 120-140, 130-140, 20-130, 30-130, 40-130, 50-130, 60-130, 70-130, 80-130, 90-130, 100-130, 110-130, 120-130, 20-120, 30-120, 40-120, 50-120, 60-120, 70-120, 80-120, 90-120, 100-120, 110-120, 20-110, 30-110, 40-110, 50-110, 60-110, 70-110, 80-110, 90-110, 100-110, 20-100, 30-100, 40-100, 50-100, 60-100, 70-100, 80-100, 90-100, 20-90, 30-90, 40-90, 50-90, 60-90, 70-90, 80-90, 20-80, 30-80, 40-80, 50-80, 60-80, 70-80, 20-70, 30-70, 40-70, 50-70, 60-70, 20-60, 30-60, 40-60, 50-60, 20-50, 30-50, 40-50, 20-40, 30-40, or 20-30 nucleotides. In some embodiments, the first nick and the second nick are separated by 40-100 nucleotides.

不希望受理论束缚，认为对于提供第二链切口并选择向内切口取向的基因修饰系统，增加第一链切口和第二链切口之间的距离可以是优选的。如本文所述，向内切口取向可以比向外切口取向产生更多数量的DSB，并且可以导致比向外切口取向更多量的不期望的插入和缺失，但是增加切口之间的距离可以减轻DSB、不期望的缺失和/或不期望的插入的这种增加。在一些实施例中，相对于第一切口和第二切口相隔小于阈值距离的在其他方面类似的向内切口取向系统，其中第一切口和第二切口相隔至少阈值距离的向内切口取向具有增加水平的所期望基因修饰系统修饰结果、减少水平的不期望的缺失和/或减少水平的不期望的插入。在一些实施例中，阈值距离在下面给出。Without wishing to be bound by theory, it is believed that for gene modification systems that provide a second strand nick and in which an inward nick orientation is selected, increasing the distance between the first strand nick and the second strand nick may be preferred. As described herein, an inward nick orientation may produce a greater number of DSBs than an outward nick orientation, and may result in a greater number of undesirable insertions and deletions than an outward nick orientation, but increasing the distance between the nicks may mitigate this increase in DSBs, undesirable deletions, and/or undesirable insertions. In some embodiments, an inward nick orientation in which the first nick and the second nick are separated by at least a threshold distance has an increased level of the desired gene modification system modification outcome, a reduced level of undesirable deletions, and/or a reduced level of undesirable insertions relative to an otherwise similar inward nick orientation system in which the first nick and the second nick are separated by less than the threshold distance. In some embodiments, the threshold distance is given below.

在一些实施例中，第一链切口和第二链切口处于向内取向。在一些实施例中，第一链切口和第二链切口处于向内取向，并且第一链切口和第二链切口相隔至少100、110、120、130、140、150、160、170、180、190、200、220、240、260、280、300、350、400、450或500个核苷酸，例如至少100个核苷酸，(并且任选地相隔不超过500、400、300、200、190、180、170，160、150、140、130或120个核苷酸)。在一些实施例中，第一链切口和第二链切口处于向内取向，并且第一链切口和第二链切口相隔100-200、110-200、120-200、130-200、140-200、150-200、160-200、170-200、180-200、190-200、100-190、110-190、120-190、130-190、140-190、150-190、160-190、170-190、180-190、100-180、110-180、120-180、130-180、140-180、150-180、160-180、170-180、100-170、110-170、120-170、130-170、140-170、150-170、160-170、100-160、110-160、120-160、130-160、140-160、150-160、100-150、110-150、120-150、130-150、140-150、100-140、110-140、120-140、130-140、100-130、110-130、120-130、100-120、110-120或100-110个核苷酸。In some embodiments, the first strand nick and the second strand nick are in an inward orientation. In some embodiments, the first strand nick and the second strand nick are in an inward orientation, and the first strand nick and the second strand nick are separated by at least 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 220, 240, 260, 280, 300, 350, 400, 450, or 500 nucleotides, e.g., at least 100 nucleotides (and optionally by no more than 500, 400, 300, 200, 190, 180, 170, 160, 150, 140, 130, or 120 nucleotides). In some embodiments, the first strand nick and the second strand nick are in an inward orientation, and the first strand nick and the second strand nick are separated by 100-200, 110-200, 120-200, 130-200, 140-200, 150-200, 160-200, 170-200, 180-200, 190-200, 100-190, 110-190, 120-190, 130-190, 140-190, 150-190, 160-190, 170-190, 180-190, 100-180, 110-180, 120-180, 130-180, 140-180, 150-180, 160-180, 170-180, 100-170, 110-170, 120-170, 130-170, 140-170, 150-170, 160-170, 100-160, 110-160, 120-160, 130-160, 140-160, 150-160, 100-150, 110-150, 120-150, 130-150, 140-150, 100-140, 110-140, 120-140, 130-140, 100-130, 110-130, 120-130, 100-120, 110-120, or 100-110 nucleotides.
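The distance criteria above lend themselves to a simple filter over candidate second-strand nick positions. The following is an illustrative sketch only, not a method from this disclosure: the function name `select_second_nick_sites`, the integer coordinate convention, and the inclusive treatment of range endpoints are assumptions. The default bounds of 100 and 200 nucleotides are taken from one of the inward-orientation ranges recited above.

```python
def select_second_nick_sites(first_nick, candidate_sites, min_nt=100, max_nt=200):
    """Keep candidate second-strand nick positions whose separation from the
    first-strand nick falls within [min_nt, max_nt] nucleotides (inclusive).

    Positions are integer coordinates on one reference strand (an assumption).
    """
    if min_nt > max_nt:
        raise ValueError("min_nt must not exceed max_nt")
    return [site for site in candidate_sites
            if min_nt <= abs(site - first_nick) <= max_nt]
```

For example, with a first-strand nick at position 1000 and candidates at 950, 900, 1100, 1150, and 1250, only 900, 1100, and 1150 lie 100-200 nucleotides away. Passing `min_nt=40` and `max_nt=100` would instead apply the 40-100 nucleotide range recited earlier for nick separation in general.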

化学修饰的核酸和核酸末端特征Chemically modified nucleic acids and nucleic acid termini characteristics

本文所述的核酸(例如模板核酸，例如模板RNA；或编码基因修饰多肽的核酸(例如，mRNA)；或gRNA)可以包含未修饰或经修饰的核碱基。天然存在的RNA从四种基本核糖核苷酸合成：ATP、CTP、UTP和GTP，但可以含有转录后修饰的核苷酸。此外，已经在RNA中鉴定了大约一百种不同的核苷修饰(Rozenski，J，Crain，P，和McCloskey，J.(1999).The RNAModification Database：1999 update.[RNA修饰数据库：1999年更新]Nucl Acids Res[核酸研究]27：196-197)。RNA还可包含自然界中不存在的完全合成核苷酸。The nucleic acids described herein (e.g., template nucleic acids, such as template RNA; or nucleic acids encoding gene-modified polypeptides (e.g., mRNA); or gRNA) may comprise unmodified or modified nucleobases. Naturally occurring RNA is synthesized from four basic ribonucleotides: ATP, CTP, UTP, and GTP, but may contain nucleotides that are modified post-transcriptionally. In addition, approximately one hundred different nucleoside modifications have been identified in RNA (Rozenski, J, Crain, P, and McCloskey, J. (1999). The RNA Modification Database: 1999 update. [RNA Modification Database: 1999 Update] Nucl Acids Res [Nucleic Acids Research] 27: 196-197). RNA may also comprise completely synthetic nucleotides that do not occur in nature.

在一些实施例中，化学修饰是在以下中提供的化学修饰：WO/2016/183482、美国专利公开号20090286852、国际申请号WO/2012/019168、WO/2012/045075、WO/2012/135805、WO/2012/158736、WO/2013/039857、WO/2013/039861、WO/2013/052523、WO/2013/090648、WO/2013/096709、WO/2013/101690、WO/2013/106496、WO/2013/130161、WO/2013/151669、WO/2013/151736、WO/2013/151672、WO/2013/151664、WO/2013/151665、WO/2013/151668、WO/2013/151671、WO/2013/151667、WO/2013/151670、WO/2013/151666、WO/2013/151663、WO/2014/028429、WO/2014/081507、WO/2014/093924、WO/2014/093574、WO/2014/113089、WO/2014/144711、WO/2014/144767、WO/2014/144039、WO/2014/152540、WO/2014/152030、WO/2014/152031、WO/2014/152027、WO/2014/152211、WO/2014/158795、WO/2014/159813、WO/2014/164253、WO/2015/006747、WO/2015/034928、WO/2015/034925、WO/2015/038892、WO/2015/048744、WO/2015/051214、WO/2015/051173、WO/2015/051169、WO/2015/058069、WO/2015/085318、WO/2015/089511、WO/2015/105926、WO/2015/164674、WO/2015/196130、WO/2015/196128、WO/2015/196118、WO/2016/011226、WO/2016/011222、WO/2016/011306、WO/2016/014846、WO/2016/022914、WO/2016/036902、WO/2016/077125、或WO/2016/077123，其中每个通过援引以其全文并入本文。应当理解，将化学修饰的核苷酸掺入多核苷酸可以导致将修饰掺入核碱基、主链或二者，这取决于该修饰在核苷酸中的位置。在一些实施例中，主链修饰是EP 2813570中提供的修饰，将其通过援引以其全文并入本文。在一些实施例中，经修饰的帽是美国专利公开号20050287539(其通过援引以其全文并入本文)中提供的帽。In some embodiments, the chemical modification is a chemical modification provided in WO/2016/183482, U.S. Patent Publication No. 20090286852, or International Application Nos. WO/2012/019168, WO/2012/045075, WO/2012/135805, WO/2012/158736, WO/2013/039857, WO/2013/039861, WO/2013/052523, WO/2013/090648, WO/2013/096709, WO/2013/101690, WO/2013/106496, WO/2013/130161, WO/2013/151669, WO/2013/151736, WO/2013/151672, WO/2013/151664, WO/2013/151665, WO/2013/151668, WO/2013/151671, WO/2013/151667, WO/2013/151670, WO/2013/151666, WO/2013/151663, WO/2014/028429, WO/2014/081507, WO/2014/093924, WO/2014/093574, WO/2014/113089, WO/2014/144711, WO/2014/144767, WO/2014/144039, WO/2014/152540, WO/2014/152030, WO/2014/152031, WO/2014/152027, WO/2014/152211, WO/2014/158795, WO/2014/159813, WO/2014/164253, WO/2015/006747, WO/2015/034928, WO/2015/034925, WO/2015/038892, WO/2015/048744, WO/2015/051214, WO/2015/051173, WO/2015/051169, WO/2015/058069, WO/2015/085318, WO/2015/089511, WO/2015/105926, WO/2015/164674, WO/2015/196130, WO/2015/196128, WO/2015/196118, WO/2016/011226, WO/2016/011222, WO/2016/011306, WO/2016/014846, WO/2016/022914, WO/2016/036902, WO/2016/077125, or WO/2016/077123, each of which is incorporated herein by reference in its entirety. It should be understood that the incorporation of chemically modified nucleotides into a polynucleotide can result in the incorporation of modifications into the nucleobase, the backbone, or both, depending on the position of the modification in the nucleotide. In some embodiments, the backbone modification is a modification provided in EP 2813570, which is incorporated herein by reference in its entirety. In some embodiments, the modified cap is a cap provided in U.S. Patent Publication No. 20050287539 (which is incorporated herein by reference in its entirety).

在一些实施例中，化学修饰的核酸(例如，RNA，例如，mRNA)包含一种或多种ARCA：抗反向帽类似物(m27.3′-OGP3G)、GP3G(未甲基化帽类似物)、m7GP3G(单甲基化帽类似物)、m32.2.7GP3G(三甲基化帽类似物)、m5CTP(5′-甲基-胞苷三磷酸)、m6ATP(N6-甲基-腺苷-5′-三磷酸)、s2UTP(2-硫代-尿苷三磷酸)和Ψ(假尿苷三磷酸)。In some embodiments, the chemically modified nucleic acid (e.g., RNA, e.g., mRNA) comprises one or more of the following: ARCA (anti-reverse cap analog; m27.3′-OGP3G), GP3G (unmethylated cap analog), m7GP3G (monomethylated cap analog), m32.2.7GP3G (trimethylated cap analog), m5CTP (5′-methyl-cytidine triphosphate), m6ATP (N6-methyl-adenosine-5′-triphosphate), s2UTP (2-thio-uridine triphosphate), and Ψ (pseudouridine triphosphate).

在一些实施例中，化学修饰的核酸包含5′帽，例如：7-甲基鸟苷帽(例如，O-Me-m7G帽)；超甲基化帽类似物；NAD+来源的帽类似物(例如，如Kiledjian，Trends in CellBiology[细胞生物学趋势]28，454-464(2018)中所述)；或经修饰的，例如生物素化的帽类似物(例如，Bednarek等人，Phil Trans R Soc B[伦敦皇家学会哲学汇刊b辑-生物科学]373，20180167(2018)中所述)。In some embodiments, the chemically modified nucleic acid comprises a 5′ cap, such as: a 7-methylguanosine cap (e.g., an O-Me-m7G cap); a hypermethylated cap analog; an NAD+-derived cap analog (e.g., as described in Kiledjian, Trends in Cell Biology 28, 454-464 (2018)); or a modified, such as a biotinylated, cap analog (e.g., as described in Bednarek et al., Phil Trans R Soc B 373, 20180167 (2018)).

在一些实施例中，化学修饰的核酸包含选自以下中的一种或多种的3′特征：聚A尾；16个核苷酸长的茎环结构，其两侧为未配对的5个核苷酸(例如，Mannironi等人，Nucleic Acid Research[核酸研究]17，9113-9126(1989)中所述)；三螺旋结构(例如，Brown等人，PNAS[美国国家科学院院刊]109，19202-19207(2012)所述)；tRNA、Y RNA或穹窿RNA结构(例如，如Labno等人，Biochemica et Biophysica Acta[生物化学和生物物理学报]1863，3125-3147(2016)所述)；掺入一个或多个脱氧核糖核苷酸三磷酸(dNTP)、2′O-甲基化NTP或硫代磷酸酯-NTP；单核苷酸化学修饰(例如，将3′末端核糖氧化为反应性醛，然后缀合醛反应性修饰的核苷酸)；或化学连接到另一个核酸分子。In some embodiments, the chemically modified nucleic acid comprises a 3′ feature selected from one or more of the following: a poly A tail; a 16-nucleotide long stem-loop structure flanked by 5 unpaired nucleotides (e.g., as described in Mannironi et al., Nucleic Acid Research 17, 9113-9126 (1989)); a triple helix structure (e.g., as described in Brown et al., PNAS 109, 19202-19207 (2012)); a tRNA, Y RNA, or vault RNA structure (e.g., as described in Labno et al., Biochemica et Biophysica Acta [Biochimica et Biophysica Sinica] 1863, 3125-3147 (2016)); incorporation of one or more deoxyribonucleotide triphosphates (dNTPs), 2′O-methylated NTPs, or phosphorothioate-NTPs; chemical modification of single nucleotides (e.g., oxidation of the 3′ terminal ribose to a reactive aldehyde followed by conjugation of an aldehyde-reactive modified nucleotide); or chemical linkage to another nucleic acid molecule.

在一些实施例中，核酸(例如，模板核酸)包含一个或多个经修饰的核苷酸，例如选自二氢尿苷、肌苷、7-甲基鸟苷、5-甲基胞苷(5mC)、5′磷酸核糖胸核苷、2′-O-甲基核糖胸核苷、2′-O-乙基核糖胸核苷、2′-氟核糖胸核苷、C-5丙炔基-脱氧胞苷(pdC)、C-5丙炔基-脱氧尿苷(pdU)、C-5丙炔基胞苷(pC)、C-5丙炔基尿苷(pU)、5-甲基胞苷、5-甲基尿苷、5-甲基脱氧胞苷、5-甲基脱氧尿苷甲氧基、2，6-二氨基嘌呤、5′-二甲氧基三苯甲基-N4-乙基-2′-脱氧胞苷、C-5丙炔基-f-胞苷(pfC)、C-5丙炔基-f-尿苷(pfU)、5-甲基f-胞苷、5-甲基f-尿苷、C-5丙炔基-m-胞苷(pmC)、C-5丙炔基-f-尿苷(pmU)、5-甲基m-胞苷、5-甲基m-尿苷、LNA(锁核酸)、MGB(小沟结合剂)假尿苷(Ψ)、1-N-甲基假尿苷(1-Me-Ψ)、或5-甲氧基尿苷(5-MO-U)。In some embodiments, the nucleic acid (e.g., template nucleic acid) comprises one or more modified nucleotides, for example, selected from dihydrouridine, inosine, 7-methylguanosine, 5-methylcytidine (5mC), 5′-phosphoribosylthymidine, 2′-O-methylribothymidine, 2′-O-ethylribothymidine, 2′-fluororibothymidine, C-5 propynyl-deoxycytidine (pdC), C-5 propynyl-deoxyuridine (pdU), C-5 propynylcytidine (pC), C-5 propynyluridine (pU), 5-methylcytidine, 5-methyluridine, 5-methyldeoxycytidine, 5-methyldeoxyuridine methoxy, 2,6-diaminopurine, 5′-dimethoxytrityl-N4-ethyl-2′-deoxycytidine, C-5 propynyl-f-cytidine (pfC), C-5 propynyl-f-uridine (pfU), 5-methyl f-cytidine, 5-methyl f-uridine, C-5 propynyl-m-cytidine (pmC), C-5 propynyl-f-uridine (pmU), 5-methyl m-cytidine, 5-methyl m-uridine, LNA (locked nucleic acid), MGB (minor groove binder) pseudouridine (Ψ), 1-N-methyl pseudouridine (1-Me-Ψ), or 5-methoxyuridine (5-MO-U).

在一些实施例中，核酸包含主链修饰，例如对主链中的糖或磷酸基团的修饰。在一些实施例中，核酸包含核碱基修饰。In some embodiments, the nucleic acid comprises a backbone modification, such as a modification to a sugar or phosphate group in the backbone. In some embodiments, the nucleic acid comprises a nucleobase modification.

在一些实施例中，核酸包含表13的一个或多个化学修饰的核苷酸、表14的一个或多个化学主链修饰、表15的一个或多个化学修饰的帽。例如，在一些实施例中，核酸包含两个或更多个(例如，3、4、5、6、7、8、9或10或更多个)不同类型的化学修饰。例如，核酸可以包含例如如本文所述的例如在表13中的两个或更多个(例如，3、4、5、6、7、8、9或10或更多个)不同类型的修饰的核碱基。可替代地或组合地，核酸可以包含例如如本文所述的例如在表14中的两个或更多个(例如，3、4、5、6、7、8、9或10或更多个)不同类型的主链修饰。可替代地或组合地，核酸可包含一个或多个经修饰的帽，例如，如本文所述，例如表15中所述。例如，在一些实施例中，核酸包含一种或多种类型的经修饰的核碱基和一种或多种类型的主链修饰；一种或多种类型的经修饰的核碱基和一个或多个经修饰的帽；一种或多种类型的经修饰的帽和一种或多种类型的主链修饰；或一种或多种类型的经修饰的核碱基、一种或多种类型的主链修饰和一种或多种类型的经修饰的帽。In some embodiments, the nucleic acid comprises one or more chemically modified nucleotides of Table 13, one or more chemical backbone modifications of Table 14, and one or more chemically modified caps of Table 15. For example, in some embodiments, the nucleic acid comprises two or more (e.g., 3, 4, 5, 6, 7, 8, 9, or 10 or more) different types of chemical modifications. For example, the nucleic acid can comprise two or more (e.g., 3, 4, 5, 6, 7, 8, 9, or 10 or more) different types of modified nucleobases, e.g., as described herein, e.g., in Table 13. Alternatively or in combination, the nucleic acid can comprise two or more (e.g., 3, 4, 5, 6, 7, 8, 9, or 10 or more) different types of backbone modifications, e.g., as described herein, e.g., in Table 14. Alternatively or in combination, the nucleic acid can comprise one or more modified caps, e.g., as described herein, e.g., as described in Table 15. For example, in some embodiments, the nucleic acid comprises one or more types of modified nucleobases and one or more types of backbone modifications; one or more types of modified nucleobases and one or more modified caps; one or more types of modified caps and one or more types of backbone modifications; or one or more types of modified nucleobases, one or more types of backbone modifications, and one or more types of modified caps.

在一些实施例中，核酸包含一个或多个(例如，2、3、4、5、6、7、8、9、10、20、30、40、50、60、70、80、90、100、150、200、250、300、350、400、450、500、600、700、800、900、1000、或更多个)经修饰的核碱基。在一些实施例中，核酸的所有核碱基都被修饰。在一些实施例中，在主链中的一个或多个(例如，2、3、4、5、6、7、8、9、10、20、30、40、50、60、70、80、90、100、150、200、250、300、350、400、450、500、600、700、800、900、1000或更多个)位置修饰核酸。在一些实施例中，核酸的所有主链位置都被修饰。In some embodiments, the nucleic acid comprises one or more (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400, 450, 500, 600, 700, 800, 900, 1000, or more) modified nucleobases. In some embodiments, all nucleobases of the nucleic acid are modified. In some embodiments, the nucleic acid is modified at one or more (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400, 450, 500, 600, 700, 800, 900, 1000, or more) positions in the backbone. In some embodiments, all backbone positions of the nucleic acid are modified.

表13.修饰的核苷酸Table 13. Modified nucleotides

表14.主链修饰Table 14. Backbone modifications

表15.经修饰的帽Table 15. Modified caps

构成基因修饰系统的模板的核苷酸可以是天然碱基或经修饰的碱基，或其组合。例如，模板可以包含假尿苷、二氢尿苷、肌苷、7-甲基鸟苷或其他经修饰的碱基。在一些实施例中，模板可以包含锁核酸核苷酸。在一些实施例中，模板中使用的经修饰的碱基不抑制模板的逆转录。在一些实施例中，模板中使用的经修饰的碱基可以提高逆转录，例如特异性或保真度。The nucleotides constituting the template of the gene modification system can be natural bases or modified bases, or a combination thereof. For example, the template can include pseudouridine, dihydrouridine, inosine, 7-methylguanosine or other modified bases. In some embodiments, the template can include locked nucleic acid nucleotides. In some embodiments, the modified bases used in the template do not inhibit reverse transcription of the template. In some embodiments, the modified bases used in the template can improve reverse transcription, such as specificity or fidelity.

在一些实施例中，系统的RNA组分(例如，模板RNA或gRNA)包含一个或多个核苷酸修饰。在一些实施例中，与未修饰或末端修饰的指导物相比，gRNA的修饰模式可显著影响体内活性(例如，如以下所示：来自Finn等人Cell Rep[细胞报道]22(9)：2227-2235(2018)的图1D；其通过援引以其全文并入本文)。不希望受理论束缚，该过程可能至少部分归因于修饰赋予的RNA稳定性。这种修饰的非限制性实例可以包括2′-O-甲基(2′-O-Me)、2′-O-(2-甲氧基乙基)(2′-O-MOE)、2′-氟(2′-F)、核苷酸之间的硫代磷酸酯(PS)键、G-C取代以及核苷酸及其等价物之间的反向无碱基连接。In some embodiments, an RNA component of the system (e.g., a template RNA or a gRNA) comprises one or more nucleotide modifications. In some embodiments, the modification pattern of a gRNA can significantly affect in vivo activity compared to unmodified or end-modified guides (e.g., as shown in Figure 1D of Finn et al., Cell Rep 22(9): 2227-2235 (2018), which is incorporated herein by reference in its entirety). Without wishing to be bound by theory, this may be attributable at least in part to the RNA stability conferred by the modifications. Non-limiting examples of such modifications include 2′-O-methyl (2′-O-Me), 2′-O-(2-methoxyethyl) (2′-O-MOE), 2′-fluoro (2′-F), phosphorothioate (PS) bonds between nucleotides, G-C substitutions, and inverted abasic linkages between nucleotides, and equivalents thereof.

在一些实施例中，模板RNA(例如，在其结合靶位点的部分)或指导RNA包含5′末端区域。在一些实施例中，模板RNA或指导RNA不包含5′末端区域。在一些实施例中，5′末端区域包含gRNA间隔子区，例如，如Briner AE等人，Molecular Cell[分子细胞]56：333-339(2014)中关于sgRNA所描述的(通过援引以其全文并入本文；适用于本文，例如，对于所有指导RNA)。在一些实施例中，5′末端区域包含5′端修饰。在一些实施例中，具有或不具有间隔子区的5′末端区域可以与crRNA、trRNA、sgRNA和/或dgRNA相关联。在一些情况下，gRNA间隔子区可以包含指导区、指导结构域或靶向结构域。In some embodiments, the template RNA (e.g., at the portion where it binds the target site) or guide RNA comprises a 5' terminal region. In some embodiments, the template RNA or guide RNA does not comprise a 5' terminal region. In some embodiments, the 5' terminal region comprises a gRNA spacer region, for example, as described in Briner AE et al., Molecular Cell [Molecular Cell] 56: 333-339 (2014) for sgRNA (incorporated herein by reference in its entirety; applicable herein, for example, for all guide RNAs). In some embodiments, the 5' terminal region comprises a 5' end modification. In some embodiments, a 5' terminal region with or without a spacer region may be associated with crRNA, trRNA, sgRNA, and/or dgRNA. In some cases, the gRNA spacer region may comprise a guide region, a guide domain, or a targeting domain.

In some embodiments, a template RNA described herein (e.g., at the portion thereof that binds a target site) or a guide RNA comprises any of the sequences shown in Table 4 of WO 2018107028 A1 (incorporated herein by reference in its entirety). In some embodiments, where a sequence shows a guide region and/or a spacer region, the composition may or may not comprise that region. In some embodiments, the guide RNA comprises one or more of the modifications of any of the sequences shown in Table 4 of WO 2018107028 A1 (e.g., as identified by SEQ ID NO therein). In embodiments, the nucleotides may be the same or different, and/or the modification pattern may be the same as or similar to the modification pattern of a guide sequence shown in Table 4 of WO 2018107028 A1. In some embodiments, a modification pattern includes the relative position and identity of the modifications of the gRNA or of a region of the gRNA (e.g., the 5′ terminal region, lower stem region, bulge region, upper stem region, nexus region, hairpin 1 region, hairpin 2 region, or 3′ terminal region). In some embodiments, the modification pattern comprises at least 50%, 55%, 60%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% of the modifications of any one of the sequences shown in the sequence column of Table 4 of WO 2018107028 A1, and/or of the modifications over one or more regions of that sequence. In some embodiments, the modification pattern is at least 50%, 55%, 60%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the modification pattern of any one of the sequences shown in the sequence column of Table 4 of WO 2018107028 A1. In some embodiments, the modification pattern is at least 50%, 55%, 60%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% identical over one or more regions of a sequence shown in Table 4 of WO 2018107028 A1 (e.g., over the 5′ terminal region, lower stem region, bulge region, upper stem region, nexus region, hairpin 1 region, hairpin 2 region, and/or 3′ terminal region). In some embodiments, the modification pattern is at least 50%, 55%, 60%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the modification pattern of the sequence over the 5′ terminal region. In some embodiments, the modification pattern is at least 50%, 55%, 60%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% identical over the lower stem. In some embodiments, the modification pattern is at least 50%, 55%, 60%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% identical over the bulge. In some embodiments, the modification pattern is at least 50%, 55%, 60%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% identical over the upper stem. In some embodiments, the modification pattern is at least 50%, 55%, 60%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% identical over the nexus. In some embodiments, the modification pattern is at least 50%, 55%, 60%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% identical over hairpin 1. In some embodiments, the modification pattern is at least 50%, 55%, 60%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% identical over hairpin 2. In some embodiments, the modification pattern is at least 50%, 55%, 60%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% identical over the 3′ terminus. In some embodiments, the modification pattern differs from the modification pattern of a sequence of Table 4 of WO 2018107028 A1, or of a region of such a sequence (e.g., the 5′ terminus, lower stem, bulge, upper stem, nexus, hairpin 1, hairpin 2, or 3′ terminus), e.g., at 0, 1, 2, 3, 4, 5, 6, or more nucleotides. In some embodiments, the gRNA comprises modifications that differ from the modifications of a sequence of Table 4 of WO 2018107028 A1, e.g., at 0, 1, 2, 3, 4, 5, 6, or more nucleotides. In some embodiments, the gRNA comprises modifications that differ from the modifications of a region (e.g., the 5′ terminus, lower stem, bulge, upper stem, nexus, hairpin 1, hairpin 2, or 3′ terminus) of a sequence of Table 4 of WO 2018107028 A1, e.g., at 0, 1, 2, 3, 4, 5, 6, or more nucleotides.
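The percent-identity language above can be illustrated with a short sketch. The disclosure does not specify how identity between two modification patterns is calculated, so the position-by-position comparison below, the one-letter modification codes, and the region coordinates are all assumptions made for illustration only.

```python
from typing import Dict, Tuple

# Hypothetical one-letter encoding, one symbol per nucleotide position:
# "N" = unmodified, "m" = 2'-O-Me, "f" = 2'-F, "e" = 2'-O-MOE.
# Phosphorothioate linkages would need a separate track and are ignored here.


def pattern_identity(a: str, b: str) -> float:
    """Percent of positions at which two equal-length patterns carry the same code."""
    if len(a) != len(b):
        raise ValueError("patterns must have the same length")
    if not a:
        raise ValueError("patterns must not be empty")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def region_identity(
    a: str, b: str, regions: Dict[str, Tuple[int, int]]
) -> Dict[str, float]:
    """Percent identity per named region; regions are half-open (start, end) slices."""
    return {
        name: pattern_identity(a[start:end], b[start:end])
        for name, (start, end) in regions.items()
    }


if __name__ == "__main__":
    reference = "mmmNNNNNNN"
    candidate = "mmmNNNNNNf"
    # Region boundaries are invented for the example, not taken from any reference.
    regions = {"5_prime_terminal": (0, 3), "remainder": (3, 10)}
    print(pattern_identity(reference, candidate))
    print(region_identity(reference, candidate, regions))
```

Under this assumed definition, a candidate that differs from the reference at one of ten positions is 90% identical overall, while a region with no differences is 100% identical.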

In some embodiments, the template RNA (e.g., at the portion thereof that binds a target site) or the gRNA comprises 2′-O-methyl (2′-O-Me) modified nucleotides. In some embodiments, the gRNA comprises 2′-O-(2-methoxyethyl) (2′-O-MOE) modified nucleotides. In some embodiments, the gRNA comprises 2′-fluoro (2′-F) modified nucleotides. In some embodiments, the gRNA comprises phosphorothioate (PS) linkages between nucleotides. In some embodiments, the gRNA comprises a 5′ end modification, a 3′ end modification, or both 5′ and 3′ end modifications. In some embodiments, the 5′ end modification comprises a phosphorothioate (PS) linkage between nucleotides. In some embodiments, the 5′ end modification comprises 2′-O-methyl (2′-O-Me), 2′-O-(2-methoxyethyl) (2′-O-MOE), and/or 2′-fluoro (2′-F) modified nucleotides. In some embodiments, the 5′ end modification comprises at least one phosphorothioate (PS) linkage and one or more of 2′-O-methyl (2′-O-Me), 2′-O-(2-methoxyethyl) (2′-O-MOE), and/or 2′-fluoro (2′-F) modified nucleotides. An end modification may comprise phosphorothioate (PS), 2′-O-methyl (2′-O-Me), 2′-O-(2-methoxyethyl) (2′-O-MOE), and/or 2′-fluoro (2′-F) modifications. Equivalent end modifications are also encompassed by the embodiments described herein. In some embodiments, the template RNA or gRNA comprises an end modification in combination with a modification of one or more regions of the template RNA or gRNA. Additional exemplary modifications and methods for protecting RNA (e.g., gRNA), and formulas thereof, are described in WO 2018126176 A1 (incorporated herein by reference in its entirety).

In some embodiments, a template RNA described herein comprises three phosphorothioate linkages at the 5′ end and three phosphorothioate linkages at the 3′ end. In some embodiments, a template RNA described herein comprises three 2′-O-methyl ribonucleotides at the 5′ end and three 2′-O-methyl ribonucleotides at the 3′ end. In some embodiments, the three 5′-most nucleotides of the template RNA are 2′-O-methyl ribonucleotides, the three 5′-most internucleotide linkages of the template RNA are phosphorothioate linkages, the three 3′-most nucleotides of the template RNA are 2′-O-methyl ribonucleotides, and the three 3′-most internucleotide linkages of the template RNA are phosphorothioate linkages. In some embodiments, the template RNA comprises alternating blocks of ribonucleotides and 2′-O-methyl ribonucleotides, e.g., blocks between 12 and 28 nucleotides in length. In some embodiments, the central portion of the template RNA comprises the alternating blocks, and the 5′ and 3′ ends each comprise three 2′-O-methyl ribonucleotides and three phosphorothioate linkages.
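The end-protection pattern described above (three 2′-O-methyl ribonucleotides and three phosphorothioate linkages at each end) can be written out for a concrete sequence as a minimal sketch. The function name `end_protect` and the shorthand it emits (`m` before a 2′-O-methyl nucleotide, `r` before an unmodified ribonucleotide, `*` for a phosphorothioate linkage) are illustrative assumptions, not notation defined in this disclosure, and the example sequence is arbitrary.

```python
def end_protect(seq: str, n: int = 3) -> str:
    """Annotate an RNA sequence with n 2'-O-Me nucleotides and n PS linkages per end.

    Output shorthand (an assumption for this sketch):
      mX = 2'-O-methyl ribonucleotide, rX = unmodified ribonucleotide,
      *  = phosphorothioate linkage following the preceding nucleotide.
    """
    seq = seq.upper()
    if len(seq) < 2 * n + 1:
        raise ValueError("sequence too short for non-overlapping end modifications")
    last = len(seq) - 1  # index of the 3'-most nucleotide
    parts = []
    for i, base in enumerate(seq):
        # The n 5'-most and n 3'-most nucleotides are 2'-O-methylated.
        is_end_nucleotide = i < n or i > last - n
        parts.append(("m" if is_end_nucleotide else "r") + base)
        if i < last:
            # Linkage i joins nucleotide i to nucleotide i + 1; there are `last` linkages,
            # so the n 3'-most linkages have indices last - n through last - 1.
            is_end_linkage = i < n or i >= last - n
            if is_end_linkage:
                parts.append("*")
    return "".join(parts)


if __name__ == "__main__":
    print(end_protect("ACGUACGUAC"))
```

For the ten-nucleotide example, the sketch marks positions 1 to 3 and 8 to 10 as 2′-O-methyl and places phosphorothioate linkages after positions 1, 2, 3, 7, 8, and 9. The central alternating blocks mentioned above are not modeled.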

In some embodiments, a structure-guided and systematic approach is used to introduce modifications (e.g., 2′-OMe-RNA, 2′-F-RNA, and PS modifications) into a template RNA or guide RNA, e.g., as described in Mir et al. Nat Commun 9:2641 (2018) (incorporated herein by reference in its entirety). In some embodiments, incorporation of 2′-F-RNA increases the thermal stability and nuclease stability of RNA:RNA or RNA:DNA duplexes, e.g., while minimally interfering with C3′-endo sugar puckering. In some embodiments, 2′-F may be better tolerated than 2′-OMe at positions where the 2′-OH is important for RNA:DNA duplex stability. In some embodiments, a crRNA comprises one or more modifications that do not reduce Cas9 activity, e.g., C10, C20, or C21 (fully modified), e.g., as described in Supplementary Table 1 of Mir et al. Nat Commun 9:2641 (2018) (incorporated herein by reference in its entirety). In some embodiments, a tracrRNA comprises one or more modifications that do not reduce Cas9 activity, e.g., T2, T6, T7, or T8 (fully modified) as described in Supplementary Table 1 of Mir et al. Nat Commun 9:2641 (2018). In some embodiments, a crRNA comprising one or more modifications (e.g., as described herein) may be paired with a tracrRNA comprising one or more modifications, e.g., C20 and T2. In some embodiments, a gRNA comprises a chimera, e.g., of a crRNA and a tracrRNA (e.g., Jinek et al. Science 337(6096):816-821 (2012)). In embodiments, modifications from the crRNA and tracrRNA are mapped onto the single-guide chimera, e.g., to produce a modified gRNA with enhanced stability.

In some embodiments, a gRNA molecule may be modified by the addition or removal of naturally occurring structural components, e.g., hairpins. In some embodiments, a gRNA may lack one or more 3′ hairpin elements, e.g., as described in WO 2018106727 (incorporated herein by reference in its entirety). In some embodiments, a gRNA may comprise an added hairpin structure, e.g., a hairpin structure added in the spacer region, which was shown in Kocak et al. Nat Biotechnol 37(6):657-666 (2019) to increase the specificity of a CRISPR-Cas system. Examples of additional modifications, including shortened gRNAs and specific modifications that improve in vivo activity, can be found in US 20190316121 (incorporated herein by reference in its entirety).

In some embodiments, a structure-guided and systematic approach (e.g., as described in Mir et al. Nat Commun 9:2641 (2018); incorporated herein by reference in its entirety) is used to find modifications for the template RNA. In embodiments, the modifications are identified with the guide region of the template RNA included or excluded. In some embodiments, a structure of the polypeptide bound to the template RNA is used to determine the non-protein-contacting nucleotides of the RNA, which may then be selected for modification, e.g., with a lower risk of disrupting binding of the RNA to the polypeptide. Secondary structures in a template RNA can also be predicted in silico by software tools, e.g., the RNAstructure tool available at rna.urmc.rochester.edu/RNAstructureWeb (Bellaousov et al. Nucleic Acids Res 41:W471-W474 (2013); incorporated herein by reference in its entirety), e.g., to determine secondary structures, e.g., hairpins, stems, and/or bulges, for use in selecting modifications.
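As a rough illustration of how a predicted secondary structure might be turned into candidate positions, the sketch below parses a structure written in dot-bracket notation, a common plain-text representation in which matched parentheses denote base pairs and dots denote unpaired nucleotides. The example structure is invented, the helper names are hypothetical, and an unpaired position is only a candidate here: whether a nucleotide contacts the polypeptide would still have to be judged from structural data.

```python
from typing import Dict, List, Tuple


def pair_table(structure: str) -> Dict[int, int]:
    """Map each paired position to its partner (0-based) for a dot-bracket string."""
    stack: List[int] = []
    pairs: Dict[int, int] = {}
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            j = stack.pop()
            pairs[i] = j
            pairs[j] = i
        elif symbol != ".":
            raise ValueError(f"unsupported symbol {symbol!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs


def unpaired_positions(structure: str) -> List[int]:
    """Positions drawn as unpaired (loops, bulges, single-stranded ends)."""
    pair_table(structure)  # validates the string
    return [i for i, symbol in enumerate(structure) if symbol == "."]


def hairpin_loops(structure: str) -> List[Tuple[int, int]]:
    """Inclusive (start, end) ranges of loops closed by a single base pair."""
    pairs = pair_table(structure)
    loops = []
    for i, j in pairs.items():
        # A hairpin loop is a run of only unpaired positions enclosed by pair (i, j).
        if i < j and j > i + 1 and all(s == "." for s in structure[i + 1 : j]):
            loops.append((i + 1, j - 1))
    return sorted(loops)


if __name__ == "__main__":
    example = "((..((...))..)).."  # invented structure, not a real template RNA
    print(unpaired_positions(example))
    print(hairpin_loops(example))
```

This simple parser does not handle pseudoknot symbols such as square brackets.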

Production of compositions and systems

As will be appreciated by those skilled in the art, methods for designing and constructing nucleic acid constructs and proteins or polypeptides (e.g., the systems, constructs, and polypeptides described herein) are routine in the art. Generally, recombinant methods can be used. See, generally, Smales and James (eds.), Therapeutic Proteins: Methods and Protocols (Methods in Molecular Biology), Humana Press (2005); and Crommelin, Sindelar, and Meibohm (eds.), Pharmaceutical Biotechnology: Fundamentals and Applications, Springer (2013). Methods for designing, preparing, evaluating, purifying, and manipulating nucleic acid compositions are described in Green and Sambrook (eds.), Molecular Cloning: A Laboratory Manual (Fourth Edition), Cold Spring Harbor Laboratory Press (2012).

The present disclosure provides, in part, nucleic acids (e.g., vectors) encoding a gene modifying polypeptide described herein, a template nucleic acid described herein, or both. In some embodiments, the vector comprises a selectable marker, e.g., an antibiotic resistance marker. In some embodiments, the antibiotic resistance marker is a kanamycin resistance marker. In some embodiments, the antibiotic resistance marker does not confer resistance to β-lactam antibiotics. In some embodiments, the vector does not comprise an ampicillin resistance marker. In some embodiments, the vector comprises a kanamycin resistance marker and does not comprise an ampicillin resistance marker. In some embodiments, the vector encoding the gene modifying polypeptide is integrated into the target cell genome (e.g., after administration to a target cell, tissue, organ, or subject). In some embodiments, the vector encoding the gene modifying polypeptide is not integrated into the target cell genome (e.g., after administration to a target cell, tissue, organ, or subject). In some embodiments, the vector encoding the template nucleic acid (e.g., template RNA) is not integrated into the target cell genome (e.g., after administration to a target cell, tissue, organ, or subject). In some embodiments, if the vector is integrated into a target site in the target cell genome, the selectable marker is not integrated into the genome. In some embodiments, if the vector is integrated into a target site in the target cell genome, genes or sequences involved in vector maintenance (e.g., plasmid maintenance genes) are not integrated into the genome. In some embodiments, if the vector is integrated into a target site in the target cell genome, transfer regulatory sequences (e.g., inverted terminal repeats, e.g., from an AAV) are not integrated into the genome. In some embodiments, administration of a vector (e.g., a vector encoding a gene modifying polypeptide described herein, a template nucleic acid described herein, or both) to a target cell, tissue, organ, or subject results in integration of a portion of the vector into one or more target sites in one or more genomes of the target cell, tissue, organ, or subject. In some embodiments, fewer than 99%, 95%, 90%, 80%, 70%, 60%, 50%, 40%, 30%, 20%, 10%, 5%, 4%, 3%, 2%, or 1% of the target sites comprising integrated material (e.g., no target sites) comprise a selectable marker (e.g., an antibiotic resistance gene) from the vector, a transfer regulatory sequence (e.g., an inverted terminal repeat, e.g., from an AAV), or both.

Exemplary methods for producing the therapeutic pharmaceutical proteins or polypeptides described herein involve expression in mammalian cells, although recombinant proteins can also be produced in insect cells, yeast, bacteria, or other cells under the control of appropriate promoters. Mammalian expression vectors may comprise non-transcribed elements, such as an origin of replication, a suitable promoter, and other 5′ or 3′ flanking non-transcribed sequences; and 5′ or 3′ non-translated sequences, such as necessary ribosome binding sites, a polyadenylation site, splice donor and acceptor sites, and termination sequences. DNA sequences derived from the SV40 viral genome, e.g., the SV40 origin, early promoter, and splice and polyadenylation sites, may be used to provide other genetic elements required for expression of a heterologous DNA sequence. Appropriate cloning and expression vectors for use with bacterial, fungal, yeast, and mammalian cell hosts are described in Green and Sambrook, Molecular Cloning: A Laboratory Manual (Fourth Edition), Cold Spring Harbor Laboratory Press (2012).

Various mammalian cell culture systems can be used to express and manufacture recombinant proteins. Examples of mammalian expression systems include the CHO, COS, HEK293, HeLa, and BHK cell lines. Processes of host cell culture for the production of protein therapeutics are described in Zhou and Kantardjieff (eds.), Mammalian Cell Cultures for Biologics Manufacturing (Advances in Biochemical Engineering/Biotechnology), Springer (2014). The compositions described herein may include a vector, e.g., a viral vector, e.g., a lentiviral vector, encoding a recombinant protein. In some embodiments, a vector, e.g., a viral vector, may comprise a nucleic acid encoding a recombinant protein.

Purification of protein therapeutics is described in Franks, Protein Biotechnology: Isolation, Characterization, and Stabilization, Humana Press (2013); and in Cutler, Protein Purification Protocols (Methods in Molecular Biology), Humana Press (2010).

The present disclosure also provides compositions and methods for producing template nucleic acid molecules (e.g., template RNAs) having specificity for a gene modifying polypeptide and/or a genomic target site. In one aspect, the method comprises producing RNA segments including an upstream homology segment, a heterologous object sequence segment, a gene modifying polypeptide binding motif, and a gRNA segment.

Therapeutic applications

In some embodiments, a gene modification system as described herein can be used to modify a cell (e.g., an animal cell, plant cell, or fungal cell). In some embodiments, a gene modification system as described herein can be used to modify a mammalian cell (e.g., a human cell). In some embodiments, a gene modification system as described herein can be used to modify a cell from a livestock animal (e.g., a cow, horse, sheep, goat, pig, llama, alpaca, camel, yak, chicken, duck, goose, or ostrich). In some embodiments, a gene modification system as described herein can be used as a laboratory tool or research tool, or in a laboratory method or research method, e.g., to modify an animal cell, e.g., a mammalian cell (e.g., a human cell), a plant cell, or a fungal cell.

By integrating coding genes into an RNA sequence template, the gene modification system can address therapeutic needs, e.g., by providing expression of a therapeutic transgene in individuals with loss-of-function mutations, by replacing gain-of-function mutations with normal transgenes, by providing regulatory sequences to eliminate expression of gain-of-function mutations, and/or by controlling the expression of operably linked genes, transgenes, and systems thereof. In certain embodiments, the RNA sequence template encodes a promoter region specific to the therapeutic needs of the host cell, e.g., a tissue-specific promoter or enhancer. In still other embodiments, a promoter can be operably linked to a coding sequence.

Accordingly, provided herein are methods for treating sickle cell disease (SCD) (e.g., sickle cell anemia) in a subject in need thereof. In some embodiments, the treatment results in amelioration of one or more symptoms associated with SCD.

In some embodiments, a system herein is used to treat a subject having an E6 (e.g., E6V) mutation.

In some embodiments, treatment with a system disclosed herein corrects the E6V mutation in about 60%-70% (e.g., about 60%-65% or about 65%-70%) of cells. In some embodiments, treatment with a system disclosed herein corrects the E6V mutation in about 60%-70% (e.g., about 60%-65% or about 65%-70%) of DNA isolated from the treated cells.
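The percentage of corrected DNA can be illustrated with simple arithmetic. The sketch below assumes that the codon at the affected glutamate position has already been extracted from each sequencing read. It treats the glutamate codons GAG and GAA as corrected and the sickle codon GTG (the well-known GAG to GTG change underlying E6V) as uncorrected, and it uses all reads as the denominator. The function name, the read counts, and the choice of denominator are assumptions for illustration; an actual analysis would involve read alignment and handling of insertions and deletions.

```python
from collections import Counter
from typing import Iterable

GLU_CODONS = {"GAG", "GAA"}  # glutamate: counted as corrected in this sketch
SICKLE_CODON = "GTG"         # valine codon produced by the sickle (E6V) mutation


def correction_percent(codons: Iterable[str]) -> float:
    """Percent of reads whose codon at the E6 position encodes glutamate."""
    counts = Counter()
    for codon in codons:
        codon = codon.upper()
        if codon in GLU_CODONS:
            counts["corrected"] += 1
        elif codon == SICKLE_CODON:
            counts["uncorrected"] += 1
        else:
            counts["other"] += 1  # e.g., unintended edits; kept in the denominator
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads supplied")
    return 100.0 * counts["corrected"] / total


if __name__ == "__main__":
    # Invented read counts: 13 corrected, 6 uncorrected, 1 other.
    reads = ["GAG"] * 13 + ["GTG"] * 6 + ["GCG"]
    pct = correction_percent(reads)
    print(f"{pct:.1f}% corrected; within 60%-70%: {60.0 <= pct <= 70.0}")
```

With these invented counts, 13 of 20 reads are corrected, which falls within the 60%-70% range recited above.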

In some embodiments, treatment with a gene modification system described herein results in one or more of the following:

(a) a reduction in the number of sickled cells;

(b) reduced production of abnormal forms of β-globin (e.g., hemoglobin S);

(c) a reduction in pain and/or organ damage associated with sickle cell-related vaso-occlusion; and/or

(d) an increase in normal blood flow,

each as compared to an SCD subject who has not been treated with a gene modification system described herein.

Administration and delivery

The compositions and systems described herein can be used in vitro or in vivo. In some embodiments, the system or components of the system are delivered to a cell (e.g., a mammalian cell, e.g., a human cell), e.g., in vitro or in vivo. In some embodiments, the cell is a eukaryotic cell, e.g., a cell of a multicellular organism, e.g., an animal, e.g., a mammal (e.g., a human, pig, or cow), a bird (e.g., poultry, e.g., a chicken, turkey, or duck), or a fish. In some embodiments, the cell is a non-human animal cell (e.g., of a laboratory animal, livestock animal, or companion animal). In some embodiments, the cell is a stem cell (e.g., a hematopoietic stem cell), a fibroblast, or a T cell. In some embodiments, the cell is an immune cell, e.g., a T cell (e.g., a Treg, CD4, CD8, γδ, or memory T cell), a B cell (e.g., a memory B cell or plasma cell), or an NK cell. In some embodiments, the cell is a non-dividing cell, e.g., a non-dividing fibroblast or a non-dividing T cell. In some embodiments, the cell is an HSC and p53 is not upregulated or is upregulated by less than 10%, 5%, 2%, or 1%, e.g., as determined according to the method described in Example 30 of PCT/US2019/048607. The skilled artisan will appreciate that the components of the gene modification system can be delivered in the form of polypeptides, nucleic acids (e.g., DNA, RNA), and combinations thereof.

In one embodiment, the system and/or components of the system are delivered as nucleic acids. For example, the gene modifying polypeptide can be delivered in the form of a DNA or RNA encoding the polypeptide, and the template RNA can be delivered in the form of RNA or of its complementary DNA to be transcribed into RNA. In some embodiments, the system or components of the system are delivered on 1, 2, 3, 4, or more different nucleic acid molecules. In some embodiments, the system or components of the system are delivered as a combination of DNA and RNA. In some embodiments, the system or components of the system are delivered as a combination of DNA and protein. In some embodiments, the system or components of the system are delivered as a combination of RNA and protein. In some embodiments, the gene modifying polypeptide is delivered as a protein.

In some embodiments, the system or components of the system are delivered to a cell, e.g., a mammalian cell or a human cell, using a vector. The vector may be, e.g., a plasmid or a virus. In some embodiments, the delivery is in vivo, in vitro, ex vivo, or in situ. In some embodiments, the virus is an adeno-associated virus (AAV), a lentivirus, or an adenovirus. In some embodiments, the system or components of the system are delivered to a cell with a virus-like particle or a virosome. In some embodiments, the delivery uses more than one virus, virus-like particle, or virosome.

In one embodiment, the compositions and systems described herein can be formulated in liposomes or other similar vesicles. Liposomes are spherical vesicle structures composed of a unilamellar or multilamellar lipid bilayer surrounding internal aqueous compartments and a relatively impermeable outer lipophilic phospholipid bilayer. Liposomes may be anionic, neutral, or cationic. Liposomes are biocompatible and nontoxic, can deliver both hydrophilic and lipophilic drug molecules, protect their cargo from degradation by plasma enzymes, and transport their load across biological membranes and the blood-brain barrier (BBB) (for a review, see, e.g., Spuch and Navarro, Journal of Drug Delivery, vol. 2011, Article ID 469679, 12 pages, 2011. doi: 10.1155/2011/469679).

Vesicles can be made from several different types of lipids; however, phospholipids are most commonly used to generate liposomes as drug carriers. Methods for preparing multilamellar vesicle lipids are known in the art (see, e.g., U.S. Patent No. 6,693,086, the teachings of which relating to multilamellar vesicle lipid preparation are incorporated herein by reference). Although vesicle formation can be spontaneous when a lipid film is mixed with an aqueous solution, it can also be expedited by applying force in the form of shaking using a homogenizer, sonicator, or extrusion apparatus (for a review, see, e.g., Spuch and Navarro, Journal of Drug Delivery, vol. 2011, Article ID 469679, 12 pages, 2011. doi: 10.1155/2011/469679). Extruded lipids can be prepared by extrusion through filters of decreasing size, as described in Templeton et al., Nature Biotech, 15:647-652, 1997, the teachings of which relating to extruded lipid preparation are incorporated herein by reference.

多种纳米颗粒可用于递送，例如脂质体、脂质纳米颗粒、阳离子脂质纳米颗粒、可电离脂质纳米颗粒、聚合物纳米颗粒、金纳米颗粒、树枝状大分子、环糊精纳米颗粒、胶束或上述的组合。A variety of nanoparticles can be used for delivery, such as liposomes, lipid nanoparticles, cationic lipid nanoparticles, ionizable lipid nanoparticles, polymer nanoparticles, gold nanoparticles, dendrimers, cyclodextrin nanoparticles, micelles, or combinations thereof.

脂质纳米颗粒是为本文所述的药物组合物提供生物相容性且可生物降解的递送系统的载剂的实例。纳米结构化的脂质载剂(NLC)是经修饰的固体脂质纳米颗粒(SLN)，这些经修饰的固体脂质纳米颗粒保留了SLN的特征、改善了药物稳定性和负载能力、并且防止了药物泄漏。聚合物纳米颗粒(PNP)是药物递送的重要组成部分。这些纳米颗粒可以有效地将药物递送引导至特定靶标并且改善药物稳定性和受控的药物释放。也可以使用脂质聚合物纳米颗粒(PLN)，即一种组合了脂质体和聚合物的载剂。这些纳米颗粒具有PNP和脂质体的互补优势。PLN由核-壳结构构成；聚合物核提供了稳定的结构，并且磷脂壳提供了良好的生物相容性。这样，这两种组分增加了药物包封效率、促进了表面修饰、并且防止了水溶性药物的泄漏。对于综述，参见例如，Li等人2017，Nanomaterials[纳米材料]7，122；doi：10.3390/nano7060122。Lipid nanoparticles are an example of a carrier that provides a biocompatible and biodegradable delivery system for the pharmaceutical compositions described herein. Nanostructured lipid carriers (NLCs) are modified solid lipid nanoparticles (SLNs) that retain the characteristics of SLNs, improve drug stability and loading capacity, and prevent drug leakage. Polymer nanoparticles (PNPs) are an important component of drug delivery. These nanoparticles can effectively direct drug delivery to specific targets and improve drug stability and controlled drug release. Lipid-polymer nanoparticles (PLNs), a type of carrier that combines liposomes and polymers, can also be used. These nanoparticles have the complementary advantages of PNPs and liposomes. A PLN has a core-shell structure; the polymer core provides a stable structure, and the phospholipid shell provides good biocompatibility. In this way, the two components increase drug encapsulation efficiency, facilitate surface modification, and prevent leakage of water-soluble drugs. For a review, see, e.g., Li et al. 2017, Nanomaterials 7, 122; doi: 10.3390/nano7060122.

外泌体也可用作本文所述的组合物和系统的药物递送媒介物。对于综述，参见Ha等人2016年7月.Acta Pharmaceutica Sinica B[药学学报]第6卷第4期，第287-296页；https：//doi.org/10.1016/j.apsb.2016.02.001。Exosomes can also be used as drug delivery vehicles for the compositions and systems described herein. For a review, see Ha et al., July 2016. Acta Pharmaceutica Sinica B, Vol. 6, No. 4, pp. 287-296; https://doi.org/10.1016/j.apsb.2016.02.001.

融合体与靶细胞相互作用并融合，并因此可用作多种分子的递送媒介物。它们通常由封闭管腔或腔的两亲性脂质双层和与两亲性脂质双层相互作用的融合剂组成。融合剂组分已被证明是可工程化的，以便为融合和载荷递送赋予靶细胞特异性，从而允许创建具有可编程细胞特异性的递送媒介物(参见例如专利申请WO 2020014209，其涉及融合体设计、制备和使用的教导通过援引并入本文)。Fusosomes interact and fuse with target cells, and can therefore be used as delivery vehicles for a variety of molecules. They are generally composed of an amphipathic lipid bilayer enclosing a lumen or cavity and a fusogen that interacts with the amphipathic lipid bilayer. The fusogen component has been shown to be engineerable so as to confer target cell specificity on fusion and payload delivery, thereby allowing the creation of delivery vehicles with programmable cell specificity (see, e.g., patent application WO 2020014209, the teachings of which relating to fusosome design, preparation, and use are incorporated herein by reference).

在一些实施例中，基因修饰系统的一种或多种蛋白质组分可以与模板核酸(例如，模板RNA)预先关联。例如，在一些实施例中，基因修饰多肽可以首先与模板核酸(例如，模板RNA)组合以形成核糖核蛋白(RNP)复合物。在一些实施例中，可通过例如转染、核转染、病毒、囊泡、LNP、外泌体、融合体将RNP递送至细胞。In some embodiments, one or more protein components of the gene modification system can be pre-associated with a template nucleic acid (e.g., a template RNA). For example, in some embodiments, a gene modification polypeptide can first be combined with a template nucleic acid (e.g., a template RNA) to form a ribonucleoprotein (RNP) complex. In some embodiments, the RNP can be delivered to cells by, for example, transfection, nucleofection, a virus, a vesicle, an LNP, an exosome, or a fusosome.

可以将基因修饰系统引入细胞、组织和多细胞生物中。在一些实施例中，系统或系统的组分经由机械手段或物理手段递送至细胞。Genetic modification systems can be introduced into cells, tissues, and multicellular organisms. In some embodiments, the system or components of the system are delivered to the cell via mechanical or physical means.

以下文献中描述了蛋白治疗剂的配制品：Meyer(编辑)，Therapeutic ProteinDrug Products：Practical Approaches to formulation in the Laboratory，Manufacturing，and the Clinic[治疗性蛋白药物产品：实验室、制造和临床中配制品的实践方法]，Woodhead Publishing Series[伍德海德出版系列](2012)。The formulation of protein therapeutics is described in Meyer (ed.), Therapeutic Protein Drug Products: Practical Approaches to formulation in the Laboratory, Manufacturing, and the Clinic, Woodhead Publishing Series (2012).

组织特异性活性/施用Tissue-specific activity/administration

在一些实施例中，本文所述的系统可以利用一个或多个特征(例如，启动子或微小RNA结合位点)来限制脱靶细胞或组织中的活性。In some embodiments, the systems described herein can utilize one or more features (e.g., a promoter or a microRNA binding site) to limit activity in off-target cells or tissues.

在一些实施例中，本文所述的核酸(例如，模板RNA或编码模板RNA的DNA)包含启动子序列，例如组织特异性启动子序列。在一些实施例中，组织特异性启动子用于增加基因修饰系统的靶细胞特异性。例如，可以基于启动子在靶细胞类型中有活性但在非靶细胞类型中无活性(或在较低水平上有活性)来选择启动子。因此，即使启动子整合到非靶细胞的基因组中，它也不会驱动整合基因的表达(或仅驱动低水平表达)。如本文所述，在模板RNA中具有组织特异性启动子序列的系统也可与微小RNA结合位点(例如在模板RNA或编码基因修饰蛋白的核酸中，例如，如本文所述)组合使用。在模板RNA中具有组织特异性启动子序列的系统也可与由组织特异性启动子驱动的编码基因修饰多肽的DNA组合使用，例如，以在靶细胞中获得比非靶细胞中更高水平的基因修饰蛋白。在一些实施例中，例如，对于肝脏适应症，组织特异性启动子选自WO 2020014209(通过援引并入本文)的表3。In some embodiments, nucleic acid described herein (e.g., template RNA or DNA encoding template RNA) comprises a promoter sequence, such as a tissue-specific promoter sequence. In some embodiments, tissue-specific promoters are used to increase the target cell specificity of the gene modification system. For example, a promoter can be selected based on the fact that the promoter is active in the target cell type but inactive (or active at a lower level) in the non-target cell type. Therefore, even if the promoter is integrated into the genome of the non-target cell, it will not drive the expression of the integrated gene (or only drive low-level expression). As described herein, a system with a tissue-specific promoter sequence in the template RNA can also be used in combination with a microRNA binding site (e.g., in a template RNA or a nucleic acid encoding a gene modification protein, for example, as described herein). A system with a tissue-specific promoter sequence in the template RNA can also be used in combination with a DNA encoding a gene modification polypeptide driven by a tissue-specific promoter, for example, to obtain a higher level of gene modification protein in the target cell than in the non-target cell. In some embodiments, for example, for liver indications, a tissue-specific promoter is selected from Table 3 of WO 2020014209 (incorporated herein by reference).

在一些实施例中，本文所述的核酸(例如，模板RNA或编码模板RNA的DNA)包含微小RNA结合位点。在一些实施例中，微小RNA结合位点用于增加基因修饰系统的靶细胞特异性。例如，可以基于在非靶细胞类型中存在但在靶细胞类型中不存在(或相对于非靶细胞而言以降低的水平存在)的miRNA的识别来选择微小RNA结合位点。因此，当模板RNA存在于非靶细胞中时，它将与miRNA结合，而当模板RNA存在于靶细胞中时，它将不会与miRNA结合(或结合，但相对于非靶细胞而言以降低的水平结合)。尽管不希望受到理论的束缚，但miRNA与模板RNA的结合可以干扰其活性，例如，可以干扰异源对象序列插入基因组。因此，该系统将比其编辑非靶细胞的基因组更有效地编辑靶细胞的基因组，例如，异源对象序列将比插入非靶细胞的基因组更有效地插入靶细胞的基因组，或者插入或缺失在靶细胞中比在非靶细胞中更有效地产生。在模板RNA(或编码它的DNA)中具有微小RNA结合位点的系统也可以与编码基因修饰多肽的核酸组合使用，其中基因修饰多肽的表达受第二微小RNA结合位点的调节，例如如本文所述。在一些实施例中，例如，对于肝适应症，miRNA选自WO 2020014209(通过援引并入本文)的表4。In some embodiments, a nucleic acid described herein (e.g., a template RNA or a DNA encoding a template RNA) comprises a microRNA binding site. In some embodiments, the microRNA binding site is used to increase the target cell specificity of the gene modification system. For example, the microRNA binding site can be selected on the basis that it is recognized by a miRNA that is present in a non-target cell type but absent from the target cell type (or present at a reduced level relative to the non-target cell). Thus, when the template RNA is present in a non-target cell, it will be bound by the miRNA, and when the template RNA is present in a target cell, it will not be bound by the miRNA (or will be bound, but at a reduced level relative to the non-target cell). While not wishing to be bound by theory, binding of the miRNA to the template RNA can interfere with its activity, e.g., can interfere with insertion of the heterologous object sequence into the genome. Accordingly, the system will edit the genome of target cells more efficiently than it edits the genome of non-target cells, e.g., the heterologous object sequence will be inserted into the genome of target cells more efficiently than into the genome of non-target cells, or an insertion or deletion will be produced more efficiently in target cells than in non-target cells. A system having a microRNA binding site in the template RNA (or the DNA encoding it) can also be used in combination with a nucleic acid encoding a gene modification polypeptide, wherein expression of the gene modification polypeptide is regulated by a second microRNA binding site, e.g., as described herein. In some embodiments, e.g., for liver indications, the miRNA is selected from Table 4 of WO 2020014209 (incorporated herein by reference).
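The selection logic for a de-targeting miRNA described above can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the disclosure: the function name `select_detargeting_mirnas`, the placeholder miRNA names, the expression values, and the two thresholds (`min_non_target`, `max_ratio`) are hypothetical. The sketch only shows the comparison itself: keep a miRNA when it is abundant in every non-target cell type and absent or much lower in the target cell type.

```python
from typing import Dict, List


def select_detargeting_mirnas(
    expression: Dict[str, Dict[str, float]],
    target: str,
    non_targets: List[str],
    min_non_target: float = 100.0,  # hypothetical abundance cutoff
    max_ratio: float = 0.1,         # hypothetical target/non-target ratio cutoff
) -> List[str]:
    """Return miRNAs abundant in all non-target cell types and scarce in the target.

    `expression` maps miRNA name -> {cell type -> expression level}.
    Missing cell types are treated as a level of 0.
    """
    if not non_targets:
        raise ValueError("at least one non-target cell type is required")
    selected = []
    for mirna, levels in expression.items():
        target_level = levels.get(target, 0.0)
        # Use the weakest non-target expression so every non-target type is covered.
        lowest_non_target = min(levels.get(cell, 0.0) for cell in non_targets)
        if lowest_non_target < min_non_target:
            continue  # not abundant enough in non-target cells to suppress activity
        if target_level <= max_ratio * lowest_non_target:
            selected.append(mirna)
    return sorted(selected)


# Made-up expression levels for three placeholder miRNAs.
EXAMPLE_LEVELS = {
    "miR-A": {"target_cell": 2.0, "off_target_cell": 900.0},
    "miR-B": {"target_cell": 500.0, "off_target_cell": 800.0},
    "miR-C": {"target_cell": 1.0, "off_target_cell": 5.0},
}

print(select_detargeting_mirnas(EXAMPLE_LEVELS, "target_cell", ["off_target_cell"]))
```

In this toy data set only `miR-A` passes both filters: `miR-B` is also abundant in the target cell, and `miR-C` is too scarce in the non-target cell to be useful.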

在一些实施例中，模板RNA包含微小RNA序列、siRNA序列、指导RNA序列、或piwiRNA序列。In some embodiments, the template RNA comprises a microRNA sequence, a siRNA sequence, a guide RNA sequence, or a piwiRNA sequence.

启动子Promoter

在一些实施例中，一种或多种启动子或增强子元件可操作地连接至编码基因修饰蛋白的核酸或模板核酸，例如，其控制异源对象序列的表达。在某些实施例中，该一个或多个启动子或增强子元件包含细胞类型或组织特异性元件。在一些实施例中，启动子或增强子是相同的或源自天然地控制异源对象序列表达的启动子或增强子。例如，鸟氨酸转氨甲酰酶启动子和增强子可用于在本发明提供的系统或方法中控制鸟氨酸转氨甲酰酶基因的表达以便纠正鸟氨酸转氨甲酰酶缺陷。在一些实施例中，启动子是表16或17中的启动子或其功能片段或变体。In some embodiments, one or more promoter or enhancer elements are operably linked to a nucleic acid encoding a gene modification protein or to a template nucleic acid, e.g., to control expression of the heterologous object sequence. In certain embodiments, the one or more promoter or enhancer elements comprise cell type-specific or tissue-specific elements. In some embodiments, the promoter or enhancer is the same as, or derived from, the promoter or enhancer that naturally controls expression of the heterologous object sequence. For example, the ornithine transcarbamylase promoter and enhancer can be used to control expression of the ornithine transcarbamylase gene in a system or method provided by the invention in order to correct ornithine transcarbamylase deficiency. In some embodiments, the promoter is a promoter in Table 16 or 17, or a functional fragment or variant thereof.

例如，可在统一资源定位器(例如，www.invivogen.com/tissue-specific-promoters)中找到可商购的示例性组织特异性启动子。在一些实施例中，启动子是天然启动子或最小启动子，例如由来自给定基因的5’区域的单个片段组成。在一些实施例中，天然启动子包括核心启动子及其天然5’UTR。在一些实施例中，5’UTR包含内含子。在其他实施例中，这些包括复合型启动子，这些复合型启动子组合了起点不同的启动子元件，或由远端增强子与起点相同的最小启动子组装而产生。For example, commercially available exemplary tissue-specific promoters can be found at a uniform resource locator (e.g., www.invivogen.com/tissue-specific-promoters). In some embodiments, the promoter is a native promoter or a minimal promoter, e.g., one consisting of a single fragment from the 5' region of a given gene. In some embodiments, the native promoter comprises a core promoter and its native 5' UTR. In some embodiments, the 5' UTR comprises an intron. In other embodiments, these include composite promoters, which combine promoter elements of different origins or are produced by assembling a distal enhancer with a minimal promoter of the same origin.

示例性细胞或组织特异性启动子在下表中提供，并且编码它们的示例性核酸序列是本领域已知的并且可以使用多种资源容易地获得，例如NCBI数据库，包括RefSeq，以及真核启动子数据库(//epd.epfl.ch//index.php)。Exemplary cell or tissue specific promoters are provided in the table below, and exemplary nucleic acid sequences encoding them are known in the art and can be readily obtained using a variety of resources, such as the NCBI database, including RefSeq, and the Eukaryotic Promoter Database (//epd.epfl.ch//index.php).

表16.示例性细胞或组织特异性启动子Table 16. Exemplary cell or tissue specific promoters

表17.另外的示例性细胞或组织特异性启动子Table 17. Additional exemplary cell or tissue specific promoters

取决于所利用的宿主/载体系统，可以在表达载体中使用许多合适的转录和翻译控制元件中的任一种，包括组成型和诱导型启动子、转录增强子元件、转录终止子等(参见例如，Bitter等人(1987)Methods in Enzymology[酶学方法]，153：516-544；其通过援引以其全文并入本文)。Depending on the host/vector system utilized, any of a number of suitable transcription and translation control elements may be used in the expression vector, including constitutive and inducible promoters, transcription enhancer elements, transcription terminators, and the like (see, e.g., Bitter et al. (1987) Methods in Enzymology, 153:516-544; which is incorporated herein by reference in its entirety).

在一些实施例中，编码基因修饰蛋白或模板核酸的核酸与控制元件(例如，转录控制元件，例如启动子)可操作地连接。在一些实施例中，转录控制元件可以在以下中起作用：真核细胞例如哺乳动物细胞；或原核细胞(例如细菌或古细菌细胞)。在一些实施例中，编码多肽的核苷酸序列与例如允许该编码多肽的核苷酸序列在原核和真核细胞中表达的多个控制元件可操作地连接。In some embodiments, the nucleic acid encoding the gene modification protein or template nucleic acid is operably linked to a control element (e.g., a transcriptional control element, such as a promoter). In some embodiments, the transcriptional control element may function in: a eukaryotic cell such as a mammalian cell; or a prokaryotic cell (e.g., a bacterial or archaeal cell). In some embodiments, the nucleotide sequence encoding the polypeptide is operably linked to a plurality of control elements that allow, for example, the nucleotide sequence encoding the polypeptide to be expressed in prokaryotic and eukaryotic cells.

出于说明目的，空间上受限的启动子的实例包括但不限于神经元特异性启动子、脂肪细胞特异性启动子、心肌细胞特异性启动子、平滑肌特异性启动子、光感受器特异性启动子等。神经元特异性空间上受限的启动子包括但不限于神经元特异性烯醇化酶(NSE)启动子(参见例如EMBL HSENO2、X51956)；芳香族氨基酸脱羧酶(AADC)启动子、神经丝启动子(参见例如，GenBank HUMNFL，L04147)；突触蛋白启动子(参见例如，GenBank HUMSYNIB，M55301)；thy-1启动子(参见例如，Chen等人(1987)Cell[细胞]51：7-19；以及Llewellyn，等人(2010)Nat.Med.[自然·医学]16(10)：1161-1166)；5-羟色胺受体启动子(参见例如，GenBank S62283)；酪氨酸羟化酶启动子(TH)(参见例如，Oh等人(2009)Gene Ther[基因疗法]16：437；Sasaoka等人(1992)Mol.Brain Res.[分子脑研究]16：274；Boundy等人(1998)J.Neurosci.[神经科学杂志]18：9989；以及Kaneda等人(1991)Neuron[神经元]6：583-594)；GnRH启动子(参见例如，Radovick等人(1991)Proc.Natl.Acad.Sci.USA[美国国家科学院院刊]88：3402-3406)；L7启动子(参见例如，Oberdick等人(1990)Science[科学]248：223-226)；DNMT启动子(参见例如，Bartge等人(1988)Proc.Natl.Acad.Sci.USA[美国国家科学院院刊]85：3648-3652)；脑啡肽启动子(参见例如，Comb等人(1988)EMBO J.[欧洲分子生物学学会杂志]17：3793-3805)；髓鞘碱性蛋白(MBP)启动子；Ca2+-钙调蛋白依赖性蛋白激酶II-α(CamKIIα)启动子(参见例如，Mayford等人(1996)Proc.Natl.Acad.Sci.USA[美国国家科学院院刊]93：13250；以及Casanova等人(2001)Genesis[遗传]31：37)；CMV增强子/血小板源性生长因子-β启动子(参见例如，Liu等人(2004)Gene Therapy[基因疗法]11：52-60)；等。For purposes of illustration, examples of spatially restricted promoters include, but are not limited to, neuron-specific promoters, adipocyte-specific promoters, cardiomyocyte-specific promoters, smooth muscle-specific promoters, photoreceptor-specific promoters, and the like. Neuron-specific spatially restricted promoters include, but are not limited to, the neuron-specific enolase (NSE) promoter (see, e.g., EMBL HSENO2, X51956); the aromatic amino acid decarboxylase (AADC) promoter; the neurofilament promoter (see, e.g., GenBank HUMNFL, L04147); the synapsin promoter (see, e.g., GenBank HUMSYNIB, M55301); the thy-1 promoter (see, e.g., Chen et al. (1987) Cell 51:7-19; and Llewellyn, et al. (2010) Nat. Med. 16(10):1161-1166); the serotonin receptor promoter (see, e.g., GenBank S62283); the tyrosine hydroxylase (TH) promoter (see, e.g., Oh et al. (2009) Gene Ther 16:437; Sasaoka et al. (1992) Mol. Brain Res. 16:274; Boundy et al. (1998) J. Neurosci. 18:9989; and Kaneda et al. (1991) Neuron 6:583-594); the GnRH promoter (see, e.g., Radovick et al. (1991) Proc. Natl. Acad. Sci. USA 88:3402-3406); the L7 promoter (see, e.g., Oberdick et al. (1990) Science 248:223-226); the DNMT promoter (see, e.g., Bartge et al. (1988) Proc. Natl. Acad. Sci. USA 85:3648-3652); the enkephalin promoter (see, e.g., Comb et al. (1988) EMBO J. 17:3793-3805); the myelin basic protein (MBP) promoter; the Ca2+-calmodulin-dependent protein kinase II-α (CamKIIα) promoter (see, e.g., Mayford et al. (1996) Proc. Natl. Acad. Sci. USA 93:13250; and Casanova et al. (2001) Genesis 31:37); the CMV enhancer/platelet-derived growth factor-β promoter (see, e.g., Liu et al. (2004) Gene Therapy 11:52-60); etc.

脂肪细胞特异性的空间上受限的启动子包括但不限于：aP2基因启动子/增强子，例如，人aP2基因的-5.4kb至+21bp区域(参见例如，Tozzo等人(1997)Endocrinol[内分泌学].138：1604；Ross等人(1990)Proc.Natl.Acad.Sci.USA[美国国家科学院院刊]87：9590；以及Pavjani等人(2005)Nat.Med.[自然·医学]11：797)；葡萄糖转运蛋白-4(GLUT4)启动子(参见例如，Knight等人(2003)Proc.Natl.Acad.Sci.USA[美国国家科学院院刊]100：14725)；脂肪酸转位酶(FAT/CD36)启动子(参见例如Kuriki等人(2002)Biol.Pharm.Bull.[生物和医药学报]25：1476；以及Sato等人(2002)J.Biol.Chem.[生物化学杂志]277：15703)；硬脂酰基-辅酶A去饱和酶-1(SCD1)启动子(Tabor等人(1999)J.Biol.Chem.[生物化学杂志]274：20603)；瘦素启动子(参见例如，Mason等人(1998)Endocrinol.[内分泌学]139：1013；以及Chen等人(1999)Biochem.Biophys.Res.Comm.[生物化学与生物物理研究通讯]262：187)；脂连蛋白启动子(参见例如，Kita等人(2005)Biochem.Biophys.Res.Comm.[生物化学与生物物理研究通讯]331：484；以及Chakrabarti(2010)Endocrinol.[内分泌学]151：2408)；降脂蛋白启动子(参见例如，Platt等人(1989)Proc.Natl.Acad.Sci.USA[美国国家科学院院刊]86：7490)；抗胰岛素蛋白启动子(参见例如，Seo等人(2003)Molec.Endocrinol.[分子内分泌学]17：1522)；等。Adipocyte-specific spatially restricted promoters include, but are not limited to: the aP2 gene promoter/enhancer, e.g., the -5.4 kb to +21 bp region of the human aP2 gene (see, e.g., Tozzo et al. (1997) Endocrinol. 138:1604; Ross et al. (1990) Proc. Natl. Acad. Sci. USA 87:9590; and Pavjani et al. (2005) Nat. Med. 11:797); the glucose transporter-4 (GLUT4) promoter (see, e.g., Knight et al. (2003) Proc. Natl. Acad. Sci. USA 100:14725); the fatty acid translocase (FAT/CD36) promoter (see, e.g., Kuriki et al. (2002) Biol. Pharm. Bull. 25:1476; and Sato et al. (2002) J. Biol. Chem. 277:15703); the stearoyl-CoA desaturase-1 (SCD1) promoter (Tabor et al. (1999) J. Biol. Chem. 274:20603); the leptin promoter (see, e.g., Mason et al. (1998) Endocrinol. 139:1013; and Chen et al. (1999) Biochem. Biophys. Res. Comm. 262:187); the adiponectin promoter (see, e.g., Kita et al. (2005) Biochem. Biophys. Res. Comm. 331:484; and Chakrabarti (2010) Endocrinol. 151:2408); the adipsin promoter (see, e.g., Platt et al. (1989) Proc. Natl. Acad. Sci. USA 86:7490); the resistin promoter (see, e.g., Seo et al. (2003) Molec. Endocrinol. 17:1522); etc.

心肌细胞特异性的空间上受限的启动子包括但不限于源自以下基因的控制序列：肌球蛋白轻链-2、α-肌球蛋白重链、AE3、心肌肌钙蛋白C、心肌肌动蛋白等。Franz等人(1997)Cardiovasc.Res.[心血管研究]35：560-566；Robbins等人(1995)Ann.N.Y.Acad.Sci.[纽约科学院年鉴]752：492-505；Linn等人(1995)Circ.Res.[循环研究]76：584-591；Parmacek等人(1994)Mol.Cell.Biol.[分子细胞生物学]14：1870-1885；Hunter等人(1993)Hypertension[高血压]22：608-617；以及Sartorelli等人(1992)Proc.Natl.Acad.Sci.USA[美国国家科学院院刊]89：4047-4051。Cardiomyocyte-specific, spatially restricted promoters include, but are not limited to, control sequences derived from genes such as myosin light chain-2, α-myosin heavy chain, AE3, cardiac troponin C, cardiac actin, and the like. Franz et al. (1997) Cardiovasc. Res. 35:560-566; Robbins et al. (1995) Ann. N.Y. Acad. Sci. 752:492-505; Linn et al. (1995) Circ. Res. 76:584-591; Parmacek et al. (1994) Mol. Cell. Biol. 14:1870-1885; Hunter et al. (1993) Hypertension 22:608-617; and Sartorelli et al. (1992) Proc. Natl. Acad. Sci. USA 89:4047-4051.

平滑肌特异性空间受限启动子包括但不限于SM22α启动子(参见，例如，Akyürek等人(2000)Mol.Med.[分子医学]6：983；和美国专利号7,169,874)；平滑肌细胞分化特异性抗原(smoothelin)启动子(参见例如，WO 2001/018048)；α-平滑肌肌动蛋白启动子；等。例如，已显示SM22α启动子的0.4kb区域(其中包含两个CArG元件)介导血管平滑肌细胞特异性表达(参见，例如，Kim，等人(1997)Mol.Cell.Biol.[分子细胞生物学]17，2266-2278；Li，等人，(1996)J.Cell Biol.[细胞生物学杂志]132，849-859；和Moessler，等人(1996)Development[发育]122，2415-2425)。Smooth muscle-specific spatially restricted promoters include, but are not limited to, the SM22α promoter (see, e.g., Akyürek et al. (2000) Mol. Med. 6:983; and U.S. Pat. No. 7,169,874); the smooth muscle cell differentiation-specific antigen (smoothelin) promoter (see, e.g., WO 2001/018048); the α-smooth muscle actin promoter; etc. For example, a 0.4 kb region of the SM22α promoter, which contains two CArG elements, has been shown to mediate vascular smooth muscle cell-specific expression (see, e.g., Kim, et al. (1997) Mol. Cell. Biol. 17, 2266-2278; Li, et al. (1996) J. Cell Biol. 132, 849-859; and Moessler, et al. (1996) Development 122, 2415-2425).

光感受器特异性的空间上受限的启动子包括但不限于视紫红质启动子；视紫红质激酶启动子(Young等人(2003)Ophthalmol.Vis.Sci.[眼科和视觉科学]44：4076)；β磷酸二酯酶基因启动子(Nicoud等人(2007)J.Gene Med.[基因医学杂志]9：1015)；视网膜色素变性基因启动子(Nicoud等人(2007)同上)；光感受器间类视黄醇结合蛋白(IRBP)基因增强子(Nicoud等人(2007)同上)；IRBP基因启动子(Yokoyama等人(1992)Exp Eye Res.[实验眼科研究杂志]55：225)；等。Photoreceptor-specific, spatially restricted promoters include, but are not limited to, the rhodopsin promoter; the rhodopsin kinase promoter (Young et al. (2003) Ophthalmol. Vis. Sci. 44:4076); the beta phosphodiesterase gene promoter (Nicoud et al. (2007) J. Gene Med. 9:1015); the retinitis pigmentosa gene promoter (Nicoud et al. (2007) supra); the interphotoreceptor retinoid binding protein (IRBP) gene enhancer (Nicoud et al. (2007) supra); the IRBP gene promoter (Yokoyama et al. (1992) Exp Eye Res. 55:225); etc.

在一些实施例中，基因修饰系统，例如编码基因修饰多肽的DNA、编码模板RNA的DNA或编码异源对象序列的DNA或RNA，被设计成使得一个或多个元件可操作地连接到组织特异性启动子，例如在T细胞中有活性的启动子。在另外的实施例中，T细胞活性启动子在其他细胞类型例如B细胞、NK细胞中是无活性的。在一些实施例中，T细胞活性启动子源自编码T细胞受体组分(例如TRAC、TRBC、TRGC、TRDC)的基因的启动子。在一些实施例中，T细胞活性启动子源自编码T细胞特异性分化蛋白簇(例如CD3，例如CD3D、CD3E、CD3G、CD3Z)的组分的基因的启动子。在一些实施例中，通过比较跨细胞类型的公开可用的基因表达数据并从在T细胞中具有增强的表达的基因中选择启动子来发现基因修饰系统中的T细胞特异性启动子。在一些实施例中，可以根据所期望的表达宽度选择启动子，例如仅在T细胞中具有活性的启动子、仅在NK细胞中具有活性的启动子、在T细胞和NK细胞中都具有活性的启动子。In some embodiments, a gene modification system, e.g., a DNA encoding a gene modification polypeptide, a DNA encoding a template RNA, or a DNA or RNA encoding a heterologous object sequence, is designed such that one or more elements are operably linked to a tissue-specific promoter, e.g., a promoter active in T cells. In further embodiments, the T cell-active promoter is inactive in other cell types, e.g., B cells or NK cells. In some embodiments, the T cell-active promoter is derived from the promoter of a gene encoding a T cell receptor component (e.g., TRAC, TRBC, TRGC, TRDC). In some embodiments, the T cell-active promoter is derived from the promoter of a gene encoding a component of a T cell-specific cluster of differentiation protein (e.g., CD3, e.g., CD3D, CD3E, CD3G, CD3Z). In some embodiments, a T cell-specific promoter for the gene modification system is identified by comparing publicly available gene expression data across cell types and selecting a promoter from a gene with enriched expression in T cells. In some embodiments, a promoter can be selected according to the desired breadth of expression, e.g., a promoter active only in T cells, a promoter active only in NK cells, or a promoter active in both T cells and NK cells.
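As a rough illustration of selecting a promoter by comparing expression data across cell types, the following Python sketch ranks genes by how strongly they are expressed in the cell types where activity is wanted relative to the cell types where it is not. It is a sketch under stated assumptions only: the function `rank_promoter_candidates`, the gene labels `GENE_1` to `GENE_3`, the TPM-style values, the `min_tpm` cutoff, and the pseudocount are all hypothetical and do not come from the disclosure or from any real data set.

```python
from typing import Dict, List, Tuple


def rank_promoter_candidates(
    tpm: Dict[str, Dict[str, float]],
    active_in: List[str],
    inactive_in: List[str],
    min_tpm: float = 10.0,     # hypothetical minimum expression where activity is wanted
    pseudocount: float = 1.0,  # avoids division by zero
) -> List[Tuple[str, float]]:
    """Rank genes whose promoters may suit the desired breadth of expression.

    `tpm` maps gene -> {cell type -> expression}. A gene scores highly when its
    weakest expression among `active_in` far exceeds its strongest expression
    among `inactive_in`.
    """
    if not active_in:
        raise ValueError("at least one cell type must be listed in active_in")
    ranked = []
    for gene, levels in tpm.items():
        lowest_on = min(levels.get(cell, 0.0) for cell in active_in)
        highest_off = max((levels.get(cell, 0.0) for cell in inactive_in), default=0.0)
        if lowest_on < min_tpm:
            continue  # too weakly expressed in a cell type where activity is wanted
        enrichment = (lowest_on + pseudocount) / (highest_off + pseudocount)
        ranked.append((gene, enrichment))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


# Made-up expression values for three placeholder genes.
EXAMPLE_TPM = {
    "GENE_1": {"T": 200.0, "NK": 5.0, "B": 1.0},    # T cell restricted
    "GENE_2": {"T": 150.0, "NK": 140.0, "B": 2.0},  # shared by T and NK cells
    "GENE_3": {"T": 3.0, "NK": 1.0, "B": 1.0},      # weakly expressed everywhere
}

t_only = rank_promoter_candidates(EXAMPLE_TPM, ["T"], ["NK", "B"])
t_and_nk = rank_promoter_candidates(EXAMPLE_TPM, ["T", "NK"], ["B"])
print(t_only[0][0], t_and_nk[0][0])
```

Changing the `active_in` and `inactive_in` lists is how the sketch expresses the desired breadth of expression: the same table yields a different top candidate for T cells only than for T cells and NK cells together.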

本领域已知的细胞特异性启动子可用于引导基因修饰蛋白的表达，例如，如本文所述。非限制性示例性哺乳动物细胞特异性启动子已被表征并用于以细胞特异性方式表达Cre重组酶的小鼠。某些非限制性示例性哺乳动物细胞特异性启动子列于US 9845481的表1中，该文献通过援引并入本文。Cell-specific promoters known in the art can be used to direct expression of a gene modification protein, e.g., as described herein. Non-limiting exemplary mammalian cell-specific promoters have been characterized and used in mice that express Cre recombinase in a cell-specific manner. Certain non-limiting exemplary mammalian cell-specific promoters are listed in Table 1 of US 9845481, which is incorporated herein by reference.

在一些实施例中，如本文所述的载体包含表达盒。典型地，表达盒包含本发明的与启动子序列可操作地连接的核酸分子。例如，当启动子能够影响编码序列的表达时，则该启动子与该编码序列可操作地连接(例如，该编码序列在该启动子的转录控制之下)。编码序列能以有义或反义取向与调节序列可操作地连接。在某些实施例中，启动子是异源启动子。在某些实施例中，表达盒可以包含另外的元件，例如，内含子、增强子、聚腺苷酸化位点、土拨鼠反应元件(WRE)和/或已知影响编码序列表达水平的其他元件。启动子典型地控制编码序列或功能性RNA的表达。在某些实施例中，启动子序列包含近端和更远端上游元件，并可以进一步包含增强子元件。增强子典型地可以刺激启动子的活性，且可以是启动子的固有元件或插入以增强启动子的水平或组织特异性的异源元件。在某些实施例中，启动子整体源自天然基因。在某些实施例中，启动子由源自不同天然存在的启动子的不同元件构成。在某些实施例中，启动子包含合成的核苷酸序列。本领域技术人员将理解，不同的启动子将引导基因在不同的组织或细胞类型中、或在不同的发育阶段、或应答于不同的环境条件或应答于药物或转录辅助因子的存在或不存在的表达。无处不在的、细胞类型特异性的、组织特异性的、发育阶段特异性的和条件性的启动子，例如，药物反应性启动子(例如，四环素反应性启动子)为本领域技术人员所熟知。示例性启动子包括但不限于：磷酸甘油酸激酶(PKG)启动子、CAG(CMV增强子、鸡β肌动蛋白启动子(CBA)和兔β珠蛋白内含子的复合物)、NSE(神经元特异性烯醇化酶)、突触蛋白或NeuN启动子、SV40早期启动子、小鼠乳腺肿瘤病毒LTR启动子；腺病毒主要晚期启动子(Ad MLP)、单纯疱疹病毒(HSV)启动子、巨细胞病毒(CMV)启动子例如CMV立即早期启动子区(CMVIE)、SFFV启动子、劳斯肉瘤病毒(RSV)启动子、合成启动子、杂合启动子等。其他启动子可以是人类来源的或来自其他物种(包括来自小鼠)。常见的启动子包括例如：人巨细胞病毒(CMV)立即早期基因启动子、SV40早期启动子、劳斯肉瘤病毒长末端重复序列、[β]-肌动蛋白、大鼠胰岛素启动子、磷酸甘油酸激酶启动子、人α-1抗胰蛋白酶(hAAT)启动子、甲状腺素转运蛋白启动子、TBG启动子和其他肝特异性启动子、结蛋白启动子和类似的肌肉特异性启动子、EF1-α启动子、具有多组织特异性的杂合启动子、对神经元特异的启动子(如突触蛋白)和甘油醛-3-磷酸脱氢酶启动子，所有这些都是本领域技术人员熟知且容易获得的启动子，可用于获得目的编码序列的高水平表达。另外，源自非病毒基因(如鼠金属硫蛋白基因)的序列也将在本文找到用途。此类启动子序列可商购自例如Stratagene公司(Stratagene)(加利福尼亚州圣地亚哥(San Diego，CA))。另外的示例性启动子序列描述于例如WO 2018213786 A1(其通过援引以其全文并入本文)中。In some embodiments, the vector as described herein comprises an expression cassette. Typically, the expression cassette comprises a nucleic acid molecule operably connected to a promoter sequence of the present invention. For example, when a promoter can affect the expression of a coding sequence, the promoter is operably connected to the coding sequence (for example, the coding sequence is under the transcriptional control of the promoter). The coding sequence can be operably connected to a regulatory sequence in a sense or antisense orientation. In certain embodiments, the promoter is a heterologous promoter. 
In certain embodiments, the expression cassette may comprise additional elements, for example, introns, enhancers, polyadenylation sites, woodchuck response elements (WREs) and/or other elements known to affect the expression level of the coding sequence. The promoter typically controls the expression of a coding sequence or a functional RNA. In certain embodiments, the promoter sequence comprises proximal and more distal upstream elements, and may further comprise an enhancer element. An enhancer typically can stimulate the activity of a promoter, and may be an intrinsic element of a promoter or a heterologous element inserted to enhance the level or tissue specificity of a promoter. In certain embodiments, the promoter as a whole is derived from a natural gene. In certain embodiments, the promoter is composed of different elements derived from different naturally occurring promoters. In certain embodiments, the promoter comprises a synthetic nucleotide sequence. It will be appreciated by those skilled in the art that different promoters will direct the expression of genes in different tissues or cell types, or at different developmental stages, or in response to different environmental conditions or in response to the presence or absence of drugs or transcriptional cofactors. Ubiquitous, cell type-specific, tissue-specific, developmental stage-specific and conditional promoters, for example, drug-responsive promoters (e.g., tetracycline-responsive promoters) are well known to those skilled in the art. 
Exemplary promoters include, but are not limited to: the phosphoglycerate kinase (PGK) promoter, CAG (a composite of the CMV enhancer, the chicken beta-actin (CBA) promoter, and the rabbit beta-globin intron), the NSE (neuron-specific enolase), synapsin, or NeuN promoter, the SV40 early promoter, the mouse mammary tumor virus LTR promoter; the adenovirus major late promoter (Ad MLP), a herpes simplex virus (HSV) promoter, a cytomegalovirus (CMV) promoter such as the CMV immediate early promoter region (CMVIE), the SFFV promoter, the Rous sarcoma virus (RSV) promoter, synthetic promoters, hybrid promoters, and the like. Other promoters may be of human origin or from other species, including mice. Common promoters include, for example, the human cytomegalovirus (CMV) immediate early gene promoter, the SV40 early promoter, the Rous sarcoma virus long terminal repeat, [β]-actin, the rat insulin promoter, the phosphoglycerate kinase promoter, the human α-1 antitrypsin (hAAT) promoter, the transthyretin promoter, the TBG promoter and other liver-specific promoters, the desmin promoter and similar muscle-specific promoters, the EF1-α promoter, hybrid promoters with multi-tissue specificity, neuron-specific promoters (such as synapsin), and the glyceraldehyde-3-phosphate dehydrogenase promoter, all of which are well known and readily available to those skilled in the art and can be used to obtain high-level expression of a coding sequence of interest. In addition, sequences derived from non-viral genes (such as the murine metallothionein gene) will also find use herein. Such promoter sequences are commercially available, for example, from Stratagene (San Diego, CA). Additional exemplary promoter sequences are described, for example, in WO 2018213786 A1 (which is incorporated herein by reference in its entirety).

在一些实施例中，载脂蛋白E增强子(ApoE)或其功能片段用于例如促使在肝中的表达。在一些实施例中，使用两个拷贝的ApoE增强子或其功能片段。在一些实施例中，ApoE增强子或其功能片段与启动子(例如，人α-1抗胰蛋白酶(hAAT)启动子)组合使用。In some embodiments, the apolipoprotein E enhancer (ApoE) or a functional fragment thereof is used, for example, to promote expression in the liver. In some embodiments, two copies of the ApoE enhancer or a functional fragment thereof are used. In some embodiments, the ApoE enhancer or a functional fragment thereof is used in combination with a promoter (e.g., human alpha-1 antitrypsin (hAAT) promoter).

在一些实施例中，调节序列赋予组织特异性基因表达能力。在一些情况下，组织特异性调节序列结合以组织特异性方式诱导转录的组织特异性转录因子。各种组织特异性调节序列(例如，启动子、增强子等)是本领域已知的。示例性组织特异性调节序列包括但不限于以下组织特异性启动子：肝特异性甲状腺素结合球蛋白(TBG)启动子、胰岛素启动子、胰高血糖素启动子、生长抑素启动子、胰多肽(PPY)启动子、突触蛋白-1(Syn)启动子、肌酸激酶(MCK)启动子、哺乳动物结蛋白(DES)启动子、α-肌球蛋白重链(a-MHC)启动子或心肌肌钙蛋白T(cTnT)启动子。其他示例性启动子包括：β-肌动蛋白启动子、乙型肝炎病毒核心启动子，Sandig等人，GeneTher.[基因疗法]，3：1002-9(1996)；甲胎蛋白(AFP)启动子，Arbuthnot等人，Hum.Gene Ther.[人类基因疗法]，7：1503-14(1996))，骨钙素启动子(Stein等人，Mol.Biol.Rep.[分子生物学报告]，24：185-96(1997))；骨唾液蛋白启动子(Chen等人，J.Bone Miner.Res.[骨与矿物质研究杂志]11：654-64(1996))，CD2启动子(Hansal等人，J.Immunol.[免疫学杂志]，161：1063-8(1998)；免疫球蛋白重链启动子；T细胞受体α链启动子，神经元例如神经元特异性烯醇化酶(NSE)启动子(Andersen等人Cell.Mol.Neurobiol.[细胞和分子神经生物学]，13：503-15(1993))，神经丝轻链基因启动子(Piccioli等人，Proc.Natl.Acad.Sci.USA[美国国家科学院院刊]，88：5611-5(1991))，和神经元特异性vgf基因启动子(Piccioli等人，Neuron[神经元]，15：373-84(1995))，以及其他。另外的示例性启动子序列描述于例如美国专利号10300146(其通过援引以其全文并入本文)中。在一些实施例中，组织特异性调节元件(例如，组织特异性启动子)选自已知与在给定组织中高度表达的基因可操作地连接的一种，例如，如通过RNA-seq或蛋白质表达数据、或其组合所测量的。用于通过表达分析组织特异性的方法教授于Fagerberg等人MolCell Proteomics[分子与细胞蛋白质组学]13(2)：397-406(2014)中，该文献通过援引以其全文并入本文。In some embodiments, the regulatory sequence confers tissue-specific gene expression capability. In some cases, the tissue-specific regulatory sequence binds a tissue-specific transcription factor that induces transcription in a tissue-specific manner. Various tissue-specific regulatory sequences (e.g., promoters, enhancers, etc.) are known in the art. Exemplary tissue-specific regulatory sequences include, but are not limited to, the following tissue-specific promoters: the liver-specific thyroxine binding globulin (TBG) promoter, the insulin promoter, the glucagon promoter, the somatostatin promoter, the pancreatic polypeptide (PPY) promoter, the synapsin-1 (Syn) promoter, the creatine kinase (MCK) promoter, the mammalian desmin (DES) promoter, the α-myosin heavy chain (a-MHC) promoter, or the cardiac troponin T (cTnT) promoter. Other exemplary promoters include: the β-actin promoter; the hepatitis B virus core promoter (Sandig et al., Gene Ther., 3:1002-9 (1996)); the alpha-fetoprotein (AFP) promoter (Arbuthnot et al., Hum. Gene Ther., 7:1503-14 (1996)); the osteocalcin promoter (Stein et al., Mol. Biol. Rep., 24:185-96 (1997)); the bone sialoprotein promoter (Chen et al., J. Bone Miner. Res., 11:654-64 (1996)); the CD2 promoter (Hansal et al., J. Immunol., 161:1063-8 (1998)); the immunoglobulin heavy chain promoter; the T cell receptor α-chain promoter; neuronal promoters such as the neuron-specific enolase (NSE) promoter (Andersen et al., Cell. Mol. Neurobiol., 13:503-15 (1993)), the neurofilament light-chain gene promoter (Piccioli et al., Proc. Natl. Acad. Sci. USA, 88:5611-5 (1991)), and the neuron-specific vgf gene promoter (Piccioli et al., Neuron, 15:373-84 (1995)); among others. Additional exemplary promoter sequences are described, for example, in U.S. Patent No. 10300146 (which is incorporated herein by reference in its entirety). In some embodiments, the tissue-specific regulatory element (e.g., a tissue-specific promoter) is selected from one known to be operably linked to a gene that is highly expressed in a given tissue, e.g., as measured by RNA-seq or protein expression data, or a combination thereof. Methods for analyzing tissue specificity by expression are taught in Fagerberg et al., Mol Cell Proteomics 13(2):397-406 (2014), which is incorporated herein by reference in its entirety.

In some embodiments, a vector described herein is a polycistronic expression construct. Polycistronic expression constructs include, for example, constructs carrying a first expression cassette, e.g., comprising a first promoter and a first encoding nucleic acid sequence, and a second expression cassette, e.g., comprising a second promoter and a second encoding nucleic acid sequence. In some cases, such polycistronic expression constructs are particularly useful for delivering non-translated gene products (e.g., hairpin RNAs) together with polypeptides (e.g., gene modifying polypeptides and gene modifying templates). In some embodiments, polycistronic expression constructs may exhibit reduced expression levels of one or more of the included transgenes, for example, because of promoter interference or the presence of incompatible nucleic acid elements in close proximity. If a polycistronic expression construct is part of a viral vector, the presence of a self-complementary nucleic acid sequence may, in some cases, interfere with the formation of structures required for viral propagation or packaging.

在一些实施例中，序列编码含发夹的RNA。在一些实施例中，发夹RNA是指导RNA、模板RNA、shRNA、或微小RNA。在一些实施例中，第一启动子是RNA聚合酶I启动子。在一些实施例中，第一启动子是RNA聚合酶II启动子。在一些实施例中，第二启动子是RNA聚合酶III启动子。在一些实施例中，第二启动子是U6或H1启动子。In some embodiments, the sequence encodes a hairpin-containing RNA. In some embodiments, the hairpin RNA is a guide RNA, a template RNA, a shRNA, or a microRNA. In some embodiments, the first promoter is an RNA polymerase I promoter. In some embodiments, the first promoter is an RNA polymerase II promoter. In some embodiments, the second promoter is an RNA polymerase III promoter. In some embodiments, the second promoter is a U6 or H1 promoter.

Without wishing to be bound by theory, polycistronic expression constructs may not achieve optimal expression levels compared to expression systems containing only one cistron. One of the suggested causes of the reduced expression levels achieved with polycistronic expression constructs comprising two or more promoter elements is the phenomenon of promoter interference (see, e.g., Curtin J A, Dane A P, Swanson A, Alexander I E, Ginn S L. Bidirectional promoter interference between two widely used internal heterologous promoters in a late-generation lentiviral construct. Gene Ther. 2008 Mar; 15(5): 384-90; and Martin-Duque P, Jezzard S, Kaftansis L, Vassaux G. Direct comparison of the insulating properties of two genetic elements in an adenoviral vector containing two different expression cassettes. Hum Gene Ther. 2004 Oct; 15(10): 995-1002; both references are incorporated herein by reference for their disclosure of the promoter interference phenomenon). In some embodiments, the problem of promoter interference may be overcome, e.g., by producing a polycistronic expression construct comprising only one promoter that drives transcription of multiple encoding nucleic acid sequences separated by internal ribosome entry sites, or by separating cistrons comprising their own promoters with transcriptional insulator elements. In some embodiments, single-promoter driven expression of multiple cistrons may result in uneven expression levels of the cistrons. In some embodiments, promoters cannot be efficiently isolated, and isolation elements may be incompatible with some gene transfer vectors (e.g., some retroviral vectors).
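The two layout strategies described above for avoiding promoter interference can be sketched as ordered lists of construct elements. This is only an illustrative model: the `Element` type, the function names, and the element labels are hypothetical and do not come from the disclosure.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Element:
    kind: str  # "promoter", "cds", "ires", or "insulator"
    name: str  # free-text label (hypothetical)


def single_promoter_layout(promoter: str, cds_names: List[str]) -> List[Element]:
    """One promoter; coding sequences are separated by IRES elements."""
    layout = [Element("promoter", promoter)]
    for i, name in enumerate(cds_names):
        if i > 0:
            layout.append(Element("ires", f"IRES{i}"))
        layout.append(Element("cds", name))
    return layout


def insulated_layout(cassettes: List[Tuple[str, str]]) -> List[Element]:
    """Each cistron keeps its own promoter; insulators separate the cassettes."""
    layout: List[Element] = []
    for i, (promoter, cds) in enumerate(cassettes):
        if i > 0:
            layout.append(Element("insulator", f"insulator{i}"))
        layout.append(Element("promoter", promoter))
        layout.append(Element("cds", cds))
    return layout


def count(layout: List[Element], kind: str) -> int:
    """Number of elements of a given kind in a layout."""
    return sum(1 for e in layout if e.kind == kind)
```

In this sketch the first layout always contains exactly one promoter, which is the property that sidesteps promoter interference, while the second keeps one promoter per cistron and relies on the insulators.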

MicroRNA

MicroRNAs (miRNAs) and other small interfering nucleic acids generally regulate gene expression via cleavage/degradation of a target RNA transcript or translational repression of a target messenger RNA (mRNA). In some cases, miRNAs are natively expressed, typically as final 19-25 nucleotide non-translated RNA products. miRNAs generally exhibit their activity through sequence-specific interactions with the 3′ untranslated regions (UTRs) of target mRNAs. These endogenously expressed miRNAs may form hairpin precursors that are subsequently processed into miRNA duplexes and further into mature single-stranded miRNA molecules. The mature miRNA generally guides a multiprotein complex, miRISC, which identifies the 3′ UTR regions of target mRNAs based on their complementarity to the mature miRNA. Useful transgene products may include, for example, miRNAs or miRNA binding sites that regulate the expression of a linked polypeptide. A non-limiting list of miRNA genes is provided in US 10300146, 22:25-25:48 (which is incorporated herein by reference); the products of these genes and their homologs are useful as transgenes or as targets for small interfering nucleic acids (e.g., miRNA sponges, antisense oligonucleotides), e.g., in methods such as those listed therein. In some embodiments, one or more binding sites for one or more of the foregoing miRNAs are incorporated into a transgene (e.g., a transgene delivered by a rAAV vector), e.g., to inhibit expression of the transgene in one or more tissues of an animal harboring the transgene. In some embodiments, a binding site may be selected to control the expression of a transgene in a tissue-specific manner. For example, binding sites for the liver-specific miR-122 may be incorporated into a transgene to inhibit expression of that transgene in the liver. Additional exemplary miRNA sequences are described, for example, in U.S. Patent No. 10,300,146 (which is incorporated herein by reference in its entirety).
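As a minimal sketch of how a miRNA binding site might be placed in a transgene transcript, the snippet below builds a fully complementary site (the reverse complement of a mature miRNA) and appends copies of it to a 3′ UTR. The miRNA sequence, the spacer, and the copy number used in the usage note are hypothetical placeholders, not the actual miR-122 sequence and not values taken from the disclosure.

```python
_RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement_rna(seq: str) -> str:
    """Return the reverse complement of an RNA sequence (A/C/G/U only)."""
    return "".join(_RNA_COMPLEMENT[base] for base in reversed(seq.upper()))


def add_binding_sites(three_prime_utr: str, mirna: str,
                      copies: int = 3, spacer: str = "AAUU") -> str:
    """Append `copies` perfectly complementary miRNA sites to a 3' UTR.

    Each site is preceded by `spacer` (a hypothetical linker) so that
    adjacent sites do not abut one another.
    """
    site = reverse_complement_rna(mirna)
    return three_prime_utr.upper() + "".join(spacer + site for _ in range(copies))
```

For example, `add_binding_sites(utr, "UACGGAUCCGAUUAGCCAUGCA", copies=2)` would append two copies of the site for that placeholder miRNA. In a tissue where the cognate miRNA is abundant, a transcript carrying such sites would be expected to be repressed, which is the detargeting idea described above.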

A miR inhibitor or miRNA inhibitor is generally an agent that blocks miRNA expression and/or processing. Examples of such agents include, but are not limited to, microRNA antagonists, microRNA-specific antisense molecules, microRNA sponges, and microRNA oligonucleotides (double-stranded, hairpin, short oligonucleotides) that inhibit miRNA interaction with a Drosha complex. MicroRNA inhibitors, e.g., miRNA sponges, can be expressed in cells from transgenes (e.g., as described in Ebert, M. S., Nature Methods, Epub August 12, 2007; incorporated herein by reference in its entirety). In some embodiments, microRNA sponges or other miR inhibitors are used with AAVs. MicroRNA sponges generally inhibit miRNAs specifically through a complementary heptameric seed sequence. In some embodiments, an entire family of miRNAs can be silenced using a single sponge sequence. Other methods for silencing miRNA function (derepression of miRNA targets) in cells will be apparent to one of ordinary skill in the art.
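The reasoning that a single sponge can silence a whole miRNA family rests on family members sharing one seed. The sketch below assumes the common convention that the heptameric seed is nucleotides 2-8 of the mature miRNA (the text above does not specify positions), and the sequences in the usage note are hypothetical placeholders rather than real miRNAs.

```python
from typing import Iterable, Optional

_RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def seed(mirna: str) -> str:
    """Heptamer seed, assumed here to be nucleotides 2-8 (1-based)."""
    if len(mirna) < 8:
        raise ValueError("miRNA too short to contain a seed at positions 2-8")
    return mirna.upper()[1:8]


def family_seed(mirnas: Iterable[str]) -> Optional[str]:
    """Return the shared seed if every member has the same one, else None."""
    seeds = {seed(m) for m in mirnas}
    return seeds.pop() if len(seeds) == 1 else None


def sponge_seed_match(shared_seed: str) -> str:
    """Sequence complementary to the seed, as it would appear in a sponge site."""
    return "".join(_RNA_COMPLEMENT[b] for b in reversed(shared_seed.upper()))
```

With the placeholder family `["UACGGAUCCGAUUAGCCAUGCA", "AACGGAUCGUUAGCAUCGGAUC"]`, both members share the seed `ACGGAUC`, so one seed-matched sponge site would address both under this model.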

在一些实施例中，本文所述的基因修饰系统、模板RNA或多肽被施用至靶组织(例如第一组织)或在靶组织(例如第一组织)中具有活性(例如，在其中更具活性)。在一些实施例中，基因修饰系统、模板RNA或多肽未施用于非靶组织或在非靶组织中活性较低(例如，在其中不具有活性)。在一些实施例中，本文所述的基因修饰系统、模板RNA或多肽可用于修饰靶组织(例如第一组织)中的DNA(例如，并且不修饰非靶组织中的DNA)。In some embodiments, the gene modification systems, template RNAs, or polypeptides described herein are administered to a target tissue (e.g., a first tissue) or are active in a target tissue (e.g., a first tissue) (e.g., more active therein). In some embodiments, the gene modification systems, template RNAs, or polypeptides are not administered to non-target tissues or are less active in non-target tissues (e.g., not active therein). In some embodiments, the gene modification systems, template RNAs, or polypeptides described herein can be used to modify DNA in a target tissue (e.g., a first tissue) (e.g., and not modify DNA in non-target tissues).

在一些实施例中，基因修饰系统包含(a)本文所述的多肽或编码其的核酸，(b)本文所述的模板核酸(例如，模板RNA)，和(c)对靶组织特异性的一个或多个第一组织特异性表达控制序列，其中对靶组织特异性的一个或多个第一组织特异性表达控制序列与(a)、(b)、或(a)和(b)可操作地关联，其中，当与(a)关联时，(a)包含编码多肽的核酸。In some embodiments, the gene modification system comprises (a) a polypeptide described herein or a nucleic acid encoding the same, (b) a template nucleic acid described herein (e.g., a template RNA), and (c) one or more first tissue-specific expression control sequences specific for a target tissue, wherein the one or more first tissue-specific expression control sequences specific for a target tissue are operably associated with (a), (b), or (a) and (b), wherein, when associated with (a), (a) comprises a nucleic acid encoding a polypeptide.
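The constraint in the paragraph above (control sequences associated with (a), (b), or both, and (a) being a nucleic acid whenever it is so associated) can be expressed as a small validation routine. This is a hypothetical model for illustration only; the class, field names, and allowed values are assumptions and not part of the disclosure.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class SystemSpec:
    polypeptide_form: str            # component (a): "protein" or "nucleic_acid"
    template_type: str               # component (b): "RNA" or "DNA"
    control_targets: FrozenSet[str]  # components under tissue-specific control


def validate(spec: SystemSpec) -> None:
    """Raise ValueError if the spec violates the stated association rule."""
    if spec.polypeptide_form not in {"protein", "nucleic_acid"}:
        raise ValueError("unknown form for component (a)")
    if spec.template_type not in {"RNA", "DNA"}:
        raise ValueError("unknown template type for component (b)")
    if not spec.control_targets or not spec.control_targets <= {"a", "b"}:
        raise ValueError("control sequences must be associated with (a), (b), or both")
    # A control sequence can only act on (a) when (a) is supplied as a nucleic acid.
    if "a" in spec.control_targets and spec.polypeptide_form != "nucleic_acid":
        raise ValueError("(a) must comprise a nucleic acid when control sequences act on it")
```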

在一些实施例中，(b)中的核酸包含RNA。In some embodiments, the nucleic acid in (b) comprises RNA.

在一些实施例中，(b)中的核酸包含DNA。In some embodiments, the nucleic acid in (b) comprises DNA.

在一些实施例中，(b)中的核酸：(i)是单链区段或包含单链区段，例如，是单链DNA或包含单链区段和一个或多个双链区段；(ii)具有反向末端重复序列；或(iii)(i)和(ii)两者。In some embodiments, the nucleic acid in (b): (i) is a single-stranded segment or comprises a single-stranded segment, for example, is a single-stranded DNA or comprises a single-stranded segment and one or more double-stranded segments; (ii) has an inverted terminal repeat sequence; or (iii) both (i) and (ii).

在一些实施例中，(b)中的核酸是双链区段或包含双链区段。In some embodiments, the nucleic acid in (b) is or comprises a double-stranded segment.

在一些实施例中，(a)包含编码多肽的核酸。In some embodiments, (a) comprises a nucleic acid encoding a polypeptide.

在一些实施例中，(a)中的核酸包含RNA。In some embodiments, the nucleic acid in (a) comprises RNA.

在一些实施例中，(a)中的核酸包含DNA。In some embodiments, the nucleic acid in (a) comprises DNA.

在一些实施例中，(a)中的核酸：(i)是单链区段或包含单链区段，例如，是单链DNA或包含单链区段和一个或多个双链区段；(ii)具有反向末端重复序列；或(iii)(i)和(ii)两者。In some embodiments, the nucleic acid in (a): (i) is a single-stranded segment or comprises a single-stranded segment, for example, is a single-stranded DNA or comprises a single-stranded segment and one or more double-stranded segments; (ii) has an inverted terminal repeat sequence; or (iii) both (i) and (ii).

在一些实施例中，(a)中的核酸是双链区段或包含双链区段。In some embodiments, the nucleic acid in (a) is or comprises a double-stranded segment.

在一些实施例中，(a)、(b)、或(a)和(b)中的核酸是线性的。In some embodiments, the nucleic acid in (a), (b), or (a) and (b) is linear.

在一些实施例中，(a)、(b)、或(a)和(b)中的核酸是环状的，例如质粒或小环。In some embodiments, the nucleic acid in (a), (b), or (a) and (b) is circular, such as a plasmid or a minicircle.

在一些实施例中，异源对象序列与第一启动子可操作地关联。In some embodiments, the heterologous subject sequence is operably associated with a first promoter.

在一些实施例中，一个或多个第一组织特异性表达控制序列包含组织特异性启动子。In some embodiments, the one or more first tissue-specific expression control sequences comprise a tissue-specific promoter.

在一些实施例中，组织特异性启动子包含与以下可操作地关联的第一启动子：(i)异源对象序列，(ii)编码逆转录病毒RT的核酸，或(iii)(i)和(ii)。In some embodiments, the tissue-specific promoter comprises a first promoter operably associated with: (i) a heterologous subject sequence, (ii) a nucleic acid encoding a retroviral RT, or (iii) both (i) and (ii).

在一些实施例中，一个或多个第一组织特异性表达控制序列包含与以下可操作地关联的组织特异性微小RNA识别序列：(i)异源对象序列，(ii)编码逆转录病毒RT结构域的核酸，或(iii)(i)和(ii)。In some embodiments, the one or more first tissue-specific expression control sequences comprise a tissue-specific microRNA recognition sequence operably associated with: (i) a heterologous subject sequence, (ii) a nucleic acid encoding a retroviral RT domain, or (iii) (i) and (ii).

在一些实施例中，系统包含组织特异性启动子，并且该系统进一步包含一种或多种组织特异性微小RNA识别序列，其中：(i)组织特异性启动子与以下可操作地关联：(I)异源对象序列，(II)编码逆转录病毒RT结构域的核酸，或(III)(I)和(II)；和/或(ii)一种或多种组织特异性微小RNA识别序列与以下可操作地关联：(I)异源对象序列，(II)编码逆转录病毒RT的核酸，或(III)(I)和(II)。In some embodiments, the system comprises a tissue-specific promoter, and the system further comprises one or more tissue-specific microRNA recognition sequences, wherein: (i) the tissue-specific promoter is operably associated with: (I) a heterologous subject sequence, (II) a nucleic acid encoding a retroviral RT domain, or (III) (I) and (II); and/or (ii) the one or more tissue-specific microRNA recognition sequences are operably associated with: (I) a heterologous subject sequence, (II) a nucleic acid encoding a retroviral RT, or (III) (I) and (II).

在一些实施例中，其中(a)包含编码多肽的核酸，该核酸包含与编码多肽的核酸可操作地关联的启动子。In some embodiments, where (a) comprises a nucleic acid encoding a polypeptide, the nucleic acid comprises a promoter operably associated with the nucleic acid encoding the polypeptide.

在一些实施例中，编码多肽的核酸包含对靶组织具有特异性的、与多肽编码序列可操作地关联的一个或多个第二组织特异性表达控制序列。In some embodiments, the nucleic acid encoding the polypeptide comprises one or more second tissue-specific expression control sequences specific for the target tissue operably associated with the polypeptide encoding sequence.

在一些实施例中，一个或多个第二组织特异性表达控制序列包含组织特异性启动子。In some embodiments, the one or more second tissue-specific expression control sequences comprise a tissue-specific promoter.

在一些实施例中，组织特异性启动子是与编码多肽的核酸可操作地关联的启动子。In some embodiments, a tissue-specific promoter is a promoter operably associated with a nucleic acid encoding a polypeptide.

在一些实施例中，一个或多个第二组织特异性表达控制序列包含组织特异性微小RNA识别序列。In some embodiments, the one or more second tissue-specific expression control sequences comprise a tissue-specific microRNA recognition sequence.

在一些实施例中，与编码多肽的核酸可操作地关联的启动子是组织特异性启动子，该系统进一步包含一个或多个组织特异性微小RNA识别序列。In some embodiments, the promoter operably associated with the nucleic acid encoding the polypeptide is a tissue-specific promoter, and the system further comprises one or more tissue-specific microRNA recognition sequences.

在一些实施例中，本发明提供的系统的核酸组分序列(例如，编码多肽或包含异源对象序列)的侧翼是修饰蛋白质表达水平的非翻译区(UTR)。各种5′和3’UTR会影响蛋白质表达。例如，在一些实施例中，编码序列之前可以是修饰RNA稳定性或蛋白质翻译的5′UTR。在一些实施例中，序列之后可以是修饰RNA稳定性或翻译的3′UTR。在一些实施例中，序列之前可以是5′UTR，然后是修饰RNA稳定性或翻译的3′UTR。在一些实施例中，5′和/或3′UTR可选自补体因子3(C3)(CACTCCTCCCCATCCTCTCCCTCTGTCCCTCTGTCCCTCTGACCCTGCACTGTCCCAGCACC；SEQ ID NO：11,004)或血清类黏蛋白1(ORM1)(CAGGACACAGCCTTGGATCAGGACAGAGACTTGGGGGCCATCCTGCCCCTCCAACCCGACATGTGTACCTCAGCTTTTTCCCTCACTTGCATCAATAAAGCTTCTGTGTTTGGAACAGCTAA；SEQ ID NO：11,005)的5′和3′UTR(Asrani等人RNA Biology[RNA生物学]2018)。在某些实施例中，5′UTR是来自C3的5′UTR并且3′UTR是来自ORM1的3′UTR。在某些实施例中，用于蛋白质表达(例如基因修饰多肽或异源对象序列的mRNA(或编码RNA的DNA))的5′UTR和3′UTR包含优化的表达序列。在一些实施例中，5′UTR包含GGGAAAUAAGAGAGAAAAGAAGAGUAAGAAGAAAUAUAAGAGCCACC(SEQ ID NO：11,006)和/或3′UTR包含UGAUAAUAGGCUGGAGCCUCGGUGGCCAUGCUUCUUGCCCCUUGGGCCUCCCCCCAGCCCCUCCUCCCCUUCCUGCACCCGUACCCCCGUGGUCUUUGAAUAAAGUCUGA(SEQ ID NO：11,007)，例如，如Richner等人Cell[细胞]168(6)：P1114-1125(2017)所述，其序列通过援引并入本文。在一些实施例中，可以选择5′和/或3′UTR来增强蛋白质表达。在一些实施例中，可以选择5′和/或3′UTR来修饰蛋白质表达，从而最大程度地减少过度生产抑制。在一些实施例中，UTR在编码序列周围，例如在编码序列之外以及在其他实施例中靠近编码序列。在一些实施例中，UTR中包含另外的调节元件(例如，miRNA结合位点、顺式调节位点)。In some embodiments, the nucleic acid component sequences of the systems provided herein (e.g., encoding polypeptides or comprising heterologous subject sequences) are flanked by untranslated regions (UTRs) that modify protein expression levels. Various 5' and 3' UTRs can affect protein expression. For example, in some embodiments, the coding sequence may be preceded by a 5' UTR that modifies RNA stability or protein translation. In some embodiments, the sequence may be followed by a 3' UTR that modifies RNA stability or translation. In some embodiments, the sequence may be preceded by a 5' UTR followed by a 3' UTR that modifies RNA stability or translation. 
In some embodiments, the 5′ and/or 3′ UTR may be selected from the 5′ and 3′ UTRs of complement factor 3 (C3) (CACTCCTCCCCATCCTCTCCCTCTGTCCCTCTGTCCCTCTGACCCTGCACTGTCCCAGCACC; SEQ ID NO: 11,004) or orosomucoid 1 (ORM1) (CAGGACACAGCCTTGGATCAGGACAGAGACTTGGGGGCCATCCTGCCCCTCCAACCCGACATGTGTACCTCAGCTTTTTCCCTCACTTGCATCAATAAAGCTTCTGTGTTTGGAACAGCTAA; SEQ ID NO: 11,005) (Asrani et al. RNA Biology 2018). In certain embodiments, the 5′ UTR is the 5′ UTR from C3 and the 3′ UTR is the 3′ UTR from ORM1. In certain embodiments, the 5′ UTR and 3′ UTR for protein expression, e.g., for an mRNA (or DNA encoding the RNA) of a gene modifying polypeptide or heterologous subject sequence, comprise optimized expression sequences. In some embodiments, the 5′ UTR comprises GGGAAAUAAGAGAGAAAAGAAGAGUAAGAAGAAAUAUAAGAGCCACC (SEQ ID NO: 11,006) and/or the 3′ UTR comprises UGAUAAUAGGCUGGAGCCUCGGUGGCCAUGCUUCUUGCCCCUUGGGCCUCCCCCCAGCCCCUCCUCCCCUUCCUGCACCCGUACCCCCGUGGUCUUUGAAUAAAGUCUGA (SEQ ID NO: 11,007), e.g., as described in Richner et al. Cell 168(6): P1114-1125 (2017), the sequences of which are incorporated herein by reference. In some embodiments, the 5′ and/or 3′ UTR may be selected to enhance protein expression. In some embodiments, the 5′ and/or 3′ UTR may be selected to modify protein expression such that overproduction inhibition is minimized. In some embodiments, the UTRs flank the coding sequence, e.g., lie outside the coding sequence, and in other embodiments are proximal to the coding sequence. In some embodiments, additional regulatory elements (e.g., miRNA binding sites, cis-regulatory sites) are included in the UTRs.
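A minimal sketch of flanking an open reading frame with the 5′ and 3′ UTRs given above (SEQ ID NOs: 11,006 and 11,007) follows. The function name and the sanity checks (AUG start, whole codons, terminal stop codon) are assumptions added for illustration, and the ORF in the usage note is a toy placeholder.

```python
FIVE_PRIME_UTR = "GGGAAAUAAGAGAGAAAAGAAGAGUAAGAAGAAAUAUAAGAGCCACC"  # SEQ ID NO: 11,006
THREE_PRIME_UTR = "UGAUAAUAGGCUGGAGCCUCGGUGGCCAUGCUUCUUGCCCCUUGGGCCUCCCCCCAGCCCCUCCUCCCCUUCCUGCACCCGUACCCCCGUGGUCUUUGAAUAAAGUCUGA"  # SEQ ID NO: 11,007

STOP_CODONS = {"UAA", "UAG", "UGA"}


def flank_orf(orf: str,
              five_prime_utr: str = FIVE_PRIME_UTR,
              three_prime_utr: str = THREE_PRIME_UTR) -> str:
    """Return 5' UTR + ORF + 3' UTR after basic ORF sanity checks."""
    orf = orf.upper()
    if not orf.startswith("AUG"):
        raise ValueError("ORF should begin with an AUG start codon")
    if len(orf) % 3 != 0:
        raise ValueError("ORF length should be a multiple of three")
    if orf[-3:] not in STOP_CODONS:
        raise ValueError("ORF should end with a stop codon")
    return five_prime_utr + orf + three_prime_utr
```

Calling `flank_orf("AUGGCCUAA")` returns the toy ORF embedded between the two UTRs; swapping in the C3 and ORM1 UTRs would only require passing different arguments.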

In some embodiments, an open reading frame of a gene modification system, e.g., an ORF of an mRNA (or DNA encoding an mRNA) encoding a gene modification polypeptide or one or more ORFs of an mRNA (or DNA encoding an mRNA) of a heterologous subject sequence, is flanked by a 5′ and/or 3′ untranslated region (UTR) that enhances its expression. In some embodiments, the 5′ UTR of an mRNA component of the system (or of a transcript produced from a DNA component) comprises the sequence 5′-GGGAAAUAAGAGAGAAAAGAAGAGUAAGAAGAAAUAUAAGAGCCACC-3′ (SEQ ID NO: 11,008). In some embodiments, the 3′ UTR of an mRNA component of the system (or of a transcript produced from a DNA component) comprises the sequence 5′-UGAUAAUAGGCUGGAGCCUCGGUGGCCAUGCUUCUUGCCCCUUGGGCCUCCCCCCAGCCCCUCCUCCCCUUCCUGCACCCGUACCCCCGUGGUCUUUGAAUAAAGUCUGA-3′ (SEQ ID NO: 11,009). This combination of 5′ UTR and 3′ UTR has been shown by Richner et al. Cell 168(6): P1114-1125 (2017), the teachings and sequences of which are incorporated herein by reference, to result in desirable expression of an operably linked ORF. In some embodiments, a system described herein comprises a DNA encoding a transcript, wherein the DNA comprises the corresponding 5′ UTR and 3′ UTR sequences, with T substituted for U in the sequences listed above. In some embodiments, a DNA vector used to produce an RNA component of the system further comprises a promoter upstream of the 5′ UTR for initiating in vitro transcription, e.g., a T7, T3, or SP6 promoter. The 5′ UTR above begins with GGG, which is a suitable start for optimizing transcription using T7 RNA polymerase. For tuning transcription levels and altering the transcription start site nucleotides to fit alternative 5′ UTRs, the teachings of Davidson et al. Pac Symp Biocomput 433-443 (2010) describe T7 promoter variants that fulfill both of these traits, and methods of discovering them.
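The U-to-T substitution and the GGG start described above can be sketched as a small helper that builds the sense strand of an in vitro transcription template. The promoter constant below is the widely used minimal T7 promoter up to, but not including, the first transcribed G; it is an assumption added for illustration and is not a sequence recited in this passage, and the function names are hypothetical.

```python
# Assumption: minimal T7 promoter upstream of the +1 G (not from the source text).
T7_PROMOTER_UPSTREAM = "TAATACGACTCACTATA"


def rna_to_dna(rna: str) -> str:
    """Write an RNA sequence as its DNA equivalent (T substituted for U)."""
    return rna.upper().replace("U", "T")


def ivt_template(five_prime_utr_rna: str, orf_rna: str, three_prime_utr_rna: str) -> str:
    """Sense-strand DNA: T7 promoter followed by 5' UTR, ORF, and 3' UTR.

    Requires the 5' UTR to begin with GGG, the start the text describes
    as suitable for T7 RNA polymerase.
    """
    if not five_prime_utr_rna.upper().startswith("GGG"):
        raise ValueError("5' UTR does not begin with GGG; a T7 promoter variant may be needed")
    transcript = five_prime_utr_rna + orf_rna + three_prime_utr_rna
    return T7_PROMOTER_UPSTREAM + rna_to_dna(transcript)
```

A 5′ UTR that does not start with GGG is rejected here rather than silently accepted, since the text points to T7 promoter variants for that case.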

Viral vectors and components thereof

Viruses are a useful source of delivery vehicles for the systems described herein, in addition to being a source of relevant enzymes or domains as described herein, e.g., as sources of polymerases and polymerase functions used herein (e.g., DNA-dependent DNA polymerase, RNA-dependent RNA polymerase, RNA-dependent DNA polymerase, DNA-dependent RNA polymerase, reverse transcriptase). Some enzymes, e.g., reverse transcriptases, may have multiple activities, e.g., be capable of both RNA-dependent DNA polymerization and DNA-dependent DNA polymerization, e.g., first and second strand synthesis. In some embodiments, the virus used as a gene modification delivery system or a source of components thereof may be selected from a group as described by Baltimore, Bacteriol Rev 35(3): 235-241 (1971).

在一些实施例中，病毒选自组I病毒，例如，该病毒是DNA病毒并将dsDNA包装成病毒体。在一些实施例中，组I病毒选自例如腺病毒、疱疹病毒、痘病毒。In some embodiments, the virus is selected from group I viruses, for example, the virus is a DNA virus and packages dsDNA into virions. In some embodiments, the group I virus is selected from, for example, adenovirus, herpes virus, poxvirus.

在一些实施例中，病毒选自组II病毒，例如，该病毒是DNA病毒并将ssDNA包装成病毒体。在一些实施例中，组II病毒选自例如细小病毒。在一些实施例中，细小病毒是依赖性细小病毒，例如腺相关病毒(AAV)。In some embodiments, the virus is selected from a group II virus, e.g., the virus is a DNA virus and packages ssDNA into virions. In some embodiments, the group II virus is selected from, e.g., a parvovirus. In some embodiments, the parvovirus is a dependent parvovirus, e.g., an adeno-associated virus (AAV).

在一些实施例中，病毒选自组III病毒，例如，该病毒是RNA病毒并将dsRNA包装成病毒体。在一些实施例中，组III病毒选自例如呼肠孤病毒。在一些实施例中，包含在此类病毒体中的dsRNA的一条或两条链是编码分子，在转导至宿主细胞后能够直接用作mRNA，例如，在转导至宿主细胞后可以直接翻译成蛋白质而不需要任何干预性核酸复制或聚合步骤。In some embodiments, the virus is selected from a group III virus, e.g., the virus is an RNA virus and packages the dsRNA into a virion. In some embodiments, the group III virus is selected from, e.g., a reovirus. In some embodiments, one or both strands of the dsRNA contained in such a virion are coding molecules that can be used directly as mRNA after transduction into a host cell, e.g., can be directly translated into a protein after transduction into a host cell without any intervening nucleic acid replication or polymerization steps.

在一些实施例中，病毒选自组IV病毒，例如，该病毒是RNA病毒并将ssRNA(+)包装成病毒体。在一些实施例中，组IV病毒选自例如冠状病毒、小RNA病毒、披膜病毒。在一些实施例中，包含在此类病毒体中的ssRNA(+)是编码分子，在转导至宿主细胞后能够直接用作mRNA，例如，在转导至宿主细胞后可以直接翻译成蛋白质而不需要任何干预性核酸复制或聚合步骤。In some embodiments, the virus is selected from a group IV virus, e.g., the virus is an RNA virus and packages the ssRNA(+) into a virion. In some embodiments, the group IV virus is selected from, e.g., a coronavirus, a picornavirus, a togavirus. In some embodiments, the ssRNA(+) contained in such a virion is a coding molecule that can be directly used as an mRNA after transduction into a host cell, e.g., can be directly translated into a protein after transduction into a host cell without any intervening nucleic acid replication or polymerization steps.

在一些实施例中，病毒选自组V病毒，例如，该病毒是RNA病毒并将ssRNA(-)包装成病毒体。在一些实施例中，组V病毒选自例如正黏病毒、弹状病毒。在一些实施例中，具有ssRNA(-)基因组的RNA病毒还在病毒体内携带酶，该酶被转导至具有病毒基因组的宿主细胞，例如RNA依赖性RNA聚合酶，能够将ssRNA(-)拷贝到可以由宿主直接翻译的ssRNA(+)。In some embodiments, the virus is selected from group V viruses, for example, the virus is an RNA virus and packages ssRNA (-) into virions. In some embodiments, group V viruses are selected from, for example, orthomyxoviruses, rhabdoviruses. In some embodiments, RNA viruses with ssRNA (-) genomes also carry enzymes in the virion that are transduced into host cells with viral genomes, such as RNA-dependent RNA polymerases that can copy ssRNA (-) into ssRNA (+) that can be directly translated by the host.

In some embodiments, the virus is selected from a group VI virus, e.g., the virus is a retrovirus and packages ssRNA(+) into virions. In some embodiments, the group VI virus is selected from, e.g., a retrovirus. In some embodiments, the retrovirus is a lentivirus, e.g., HIV-1, HIV-2, SIV, or BIV. In some embodiments, the retrovirus is a spumavirus, e.g., a foamy virus, e.g., HFV, SFV, or BFV. In some embodiments, the ssRNA(+) contained in such virions is a coding molecule that can serve directly as mRNA upon transduction into a host cell, e.g., can be directly translated into protein upon transduction into a host cell without any intervening nucleic acid replication or polymerization step. In some embodiments, the ssRNA(+) is first reverse transcribed and copied to generate a dsDNA genome intermediate from which mRNA can be transcribed in the host cell. In some embodiments, an RNA virus with an ssRNA(+) genome also carries an enzyme inside the virion that is transduced into host cells with the viral genome, e.g., an RNA-dependent DNA polymerase, capable of copying the ssRNA(+) to dsDNA that can be transcribed to mRNA and translated by the host. In some embodiments, the reverse transcriptase from a group VI retrovirus is incorporated as the reverse transcriptase domain of a gene modifying polypeptide.

In some embodiments, the virus is selected from a group VII virus, e.g., the virus is a retrovirus and packages dsRNA into virions. In some embodiments, the group VII virus is selected from, e.g., a hepadnavirus. In some embodiments, one or both strands of the dsRNA contained in such virions are coding molecules that can serve directly as mRNA upon transduction into a host cell, e.g., can be directly translated into protein upon transduction into a host cell without any intervening nucleic acid replication or polymerization step. In some embodiments, one or both strands of the dsRNA contained in such virions are first reverse transcribed and copied to generate a dsDNA genome intermediate from which mRNA can be transcribed in the host cell. In some embodiments, an RNA virus with a dsRNA genome also carries an enzyme inside the virion that is transduced into host cells with the viral genome, e.g., an RNA-dependent DNA polymerase, capable of copying the dsRNA to dsDNA that can be transcribed to mRNA and translated by the host. In some embodiments, the reverse transcriptase from a group VII retrovirus is incorporated as the reverse transcriptase domain of a gene modifying polypeptide.

In some embodiments, virions used to deliver nucleic acid in this invention may also carry enzymes involved in the gene modification process. For example, a retroviral virion may contain a reverse transcriptase domain that is delivered into a host cell along with the nucleic acid. In some embodiments, an RNA template may be associated with a gene modifying polypeptide within a virion, such that both are co-delivered to a target cell upon transduction of the nucleic acid from the viral particle. In some embodiments, the nucleic acid in a virion may comprise DNA, e.g., linear ssDNA, linear dsDNA, circular ssDNA, circular dsDNA, minicircle DNA, dbDNA, or ceDNA. In some embodiments, the nucleic acid in a virion may comprise RNA, e.g., linear ssRNA, linear dsRNA, circular ssRNA, or circular dsRNA. In some embodiments, a viral genome may circularize upon transduction into a host cell, e.g., a linear ssRNA molecule may undergo covalent linkage to form a circular ssRNA, or a linear dsRNA molecule may undergo covalent linkage to form a circular dsRNA or one or more circular ssRNAs. In some embodiments, a viral genome may replicate by rolling circle replication in a host cell. In some embodiments, a viral genome may comprise a single nucleic acid molecule, e.g., comprise a non-segmented genome. In some embodiments, a viral genome may comprise two or more nucleic acid molecules, e.g., comprise a segmented genome. In some embodiments, a nucleic acid in a virion may be associated with one or more proteins. In some embodiments, one or more proteins in a virion may be delivered to a host cell upon transduction. In some embodiments, a natural virus may be adapted for nucleic acid delivery by the addition of virion packaging signals to the target nucleic acid, wherein a host cell is used to package the target nucleic acid containing the packaging signals.

In some embodiments, a virion used as a delivery vehicle may comprise a commensal human virus. In some embodiments, a virion used as a delivery vehicle may comprise an anellovirus, the use of which is described in WO 2018232017 A1, which is incorporated herein by reference in its entirety.

AAV administration

在一些实施例中，腺相关病毒(AAV)与本文所述的系统、模板核酸和/或多肽联合使用。在一些实施例中，AAV用于递送、施用或包装本文所述的系统、模板核酸和/或多肽。在一些实施例中，AAV是重组AAV(rAAV)。In some embodiments, adeno-associated virus (AAV) is used in combination with the systems, template nucleic acids, and/or polypeptides described herein. In some embodiments, AAV is used to deliver, administer, or package the systems, template nucleic acids, and/or polypeptides described herein. In some embodiments, AAV is a recombinant AAV (rAAV).

In some embodiments, a system comprises (a) a polypeptide described herein or a nucleic acid encoding the same, (b) a template nucleic acid described herein (e.g., a template RNA), and (c) one or more first tissue-specific expression control sequences specific to a target tissue, wherein the one or more first tissue-specific expression control sequences are operably associated with (a), (b), or (a) and (b), and wherein, when associated with (a), (a) comprises a nucleic acid encoding the polypeptide.

In some embodiments, a system described herein further comprises a first recombinant adeno-associated virus (rAAV) capsid protein, wherein at least one of (a) or (b) is associated with the first rAAV capsid protein, and wherein at least one of (a) or (b) is flanked by AAV inverted terminal repeats (ITRs).

In some embodiments, (a) and (b) are associated with the first rAAV capsid protein.

In some embodiments, (a) and (b) are on a single nucleic acid.

In some embodiments, the system further comprises a second rAAV capsid protein, wherein at least one of (a) or (b) is associated with the second rAAV capsid protein, and wherein the at least one of (a) or (b) associated with the second rAAV capsid protein is different from the at least one of (a) or (b) associated with the first rAAV capsid protein.

In some embodiments, the at least one of (a) or (b) associated with the first or second rAAV capsid protein is dispersed in the interior of the first or second rAAV capsid protein, wherein the first or second rAAV capsid protein is in the form of an AAV capsid particle.

In some embodiments, the system further comprises a nanoparticle, wherein the nanoparticle is associated with at least one of (a) or (b).

In some embodiments, (a) and (b) are respectively associated with: a) a first rAAV capsid protein and a second rAAV capsid protein; b) a nanoparticle and a first rAAV capsid protein; c) a first rAAV capsid protein; d) a first adenovirus capsid protein; e) a first nanoparticle and a second nanoparticle; or f) a first nanoparticle.

Viral vectors can be used to deliver all or part of a system provided by the present invention, e.g., for use in the methods provided by the present invention. Systems derived from different viruses have been used to deliver polypeptides or nucleic acids, for example: integrase-deficient lentivirus, adenovirus, adeno-associated virus (AAV), herpes simplex virus, and baculovirus (reviewed in Hodge et al. Hum Gene Ther 2017; Narayanavari et al. Crit Rev Biochem Mol Biol 2017; Boehme et al. Curr Gene Ther 2015).

Adenoviruses are common viruses that have been used as gene delivery vehicles because of their well-defined biological properties, genetic stability, high transduction efficiency, and ease of large-scale production (see, e.g., the review in Lee et al. Genes & Diseases 2017). They have linear dsDNA genomes and come in a variety of serotypes that differ in tissue and cell tropism. To prevent replication of infectious virus in recipient cells, adenoviral genomes used for packaging are deleted of some or all endogenous viral proteins, which are provided in trans in the virus-producing cells. This renders the genomes helper-dependent, meaning that they can be replicated and packaged into viral particles only in the presence of the missing components supplied by so-called helper functions. A helper-dependent adenovirus system with all viral ORFs removed may be compatible with packaging up to about 37 kb of foreign DNA (Parks et al. J Virol 1997). In some embodiments, an adenoviral vector is used to deliver DNA corresponding to the polypeptide component or the template component of the gene modifying system, or both, contained on separate adenoviral vectors or on the same adenoviral vector. In some embodiments, the adenovirus is a helper-dependent adenovirus (HD-AdV) that is incapable of self-packaging. In some embodiments, the adenovirus is a high-capacity adenovirus (HC-AdV) from which all or most of the endogenous viral ORFs have been deleted, while the sequence components required for packaging into adenoviral particles are retained. For this type of vector, the only adenoviral sequences required for genome packaging are noncoding sequences: the inverted terminal repeats (ITRs) at both ends and the packaging signal at the 5′ end (Jager et al. Nat Protoc 2009). In some embodiments, the adenoviral genome also comprises stuffer DNA to meet the minimal genome size for optimal production and stability (see, e.g., Hausl et al. Mol Ther 2010). In some embodiments, an adenovirus is used to deliver a gene modifying system to the liver.

In some embodiments, an adenovirus, e.g., HDAd5/35++, is used to deliver a gene modifying system to HSCs. HDAd5/35++ is an adenovirus with modified serotype 35 fibers that de-target the vector from the liver (Wang et al. Blood Adv 2019). In some embodiments, the adenovirus that delivers a gene modifying system to HSCs utilizes a receptor that is specifically expressed on primitive HSCs, e.g., CD46.

Adeno-associated viruses (AAV) belong to the Parvoviridae family and more specifically constitute the Dependoparvovirus genus. The AAV genome is composed of a linear single-stranded DNA molecule that contains approximately 4.7 kilobases (kb) and consists of two major open reading frames (ORFs) encoding the non-structural Rep (replication) and structural Cap (capsid) proteins. A second ORF within the cap gene was identified that encodes the assembly-activating protein (AAP). The DNA flanking the AAV coding regions consists of two cis-acting inverted terminal repeat (ITR) sequences, approximately 145 nucleotides in length, with interrupted palindromic sequences that can fold into energetically stable hairpin structures that function as primers of DNA replication. In addition to their role in DNA replication, the ITR sequences have been shown to be involved in viral DNA integration into the cellular genome, rescue from the host genome or plasmid, and encapsidation of viral nucleic acid into mature virions (Muzyczka, (1992) Curr. Top. Micro. Immunol. 158:97-129). In some embodiments, one or more gene modifying nucleic acid components is flanked by ITRs derived from AAV for viral packaging. See, e.g., WO 2019113310.

In some embodiments, one or more components of the gene modifying system are carried via at least one AAV vector. In some embodiments, the at least one AAV vector is selected for tropism to a particular cell, tissue, or organism. In some embodiments, the AAV vector is pseudotyped, e.g., AAV2/8, wherein AAV2 describes the design of the construct but the capsid protein is replaced by that from AAV8. It is understood that any of the described vectors could be pseudotype derivatives, wherein the capsid protein used to package the AAV genome is derived from that of a different AAV serotype. Without wishing to be limited in vector choice, a list of exemplary AAV serotypes can be found in Table 18. In some embodiments, an AAV to be employed for gene modification may be evolved for novel cell or tissue tropism, as has been demonstrated in the literature (e.g., Davidsson et al. Proc Natl Acad Sci U S A 2019).

In some embodiments, the AAV delivery vector is a vector which has two AAV inverted terminal repeats (ITRs) and a nucleotide sequence of interest (e.g., a sequence coding for a gene modifying polypeptide or a DNA template, or both), each of said ITRs having an interrupted (or noncontiguous) palindromic sequence, i.e., a sequence composed of three segments: a first segment and a last segment that are identical when read 5′→3′ but hybridize when placed against each other, and a segment that is different and separates the identical segments. See, e.g., WO 2012123430.
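The three-segment structure of an interrupted palindrome can be illustrated with a minimal Python sketch. It rests on an assumed reading of the definition above: on a single strand written 5′→3′, the last segment is the reverse complement of the first, so the two arms can hybridize while the middle segment forms the unpaired spacer. The function names and the toy sequence are hypothetical and are not taken from any ITR sequence.

```python
# Illustrative sketch only; names and example sequence are hypothetical.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def is_interrupted_palindrome(seq: str, arm_len: int) -> bool:
    """Check for the pattern: arm + non-empty spacer + reverse complement of arm.

    arm_len is the length of the first and last segments; everything in
    between is treated as the separating (different) segment.
    """
    seq = seq.upper()
    if arm_len <= 0 or len(seq) <= 2 * arm_len:
        return False  # no room for a separating segment
    first, last = seq[:arm_len], seq[-arm_len:]
    return last == reverse_complement(first)


# Toy example: 9-base arm, 3-base spacer, reverse-complement arm.
toy = "GGCCTCAGT" + "AAA" + reverse_complement("GGCCTCAGT")
print(is_interrupted_palindrome(toy, 9))
```

The check is deliberately strict (perfect complementarity, fixed arm length); natural ITRs contain nested palindromes and would need a more tolerant search.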

Conventionally, AAV virions with encapsidated genomes are produced by introducing one or more plasmids encoding the rAAV or scAAV genome, Rep proteins, and Cap proteins (Grimm et al., 1998). Upon introduction of these helper plasmids in trans, the AAV genome is "rescued" (i.e., released and subsequently recovered) from the host genome and is further encapsidated to produce infectious AAV. In some embodiments, one or more gene modifying nucleic acids are packaged into AAV particles by introducing the ITR-flanked nucleic acids into a packaging cell together with the helper functions.

In some embodiments, the AAV genome is a so-called self-complementary genome (referred to as scAAV), such that the sequence located between the ITRs contains both the desired nucleic acid sequence (e.g., DNA encoding the gene modifying polypeptide or template, or both) and the reverse complement of the desired nucleic acid sequence, such that these two components can fold over and self-hybridize. In some embodiments, the self-complementary modules are separated by an intervening sequence that allows the DNA to fold back on itself, e.g., to form a stem-loop. An scAAV has the advantage of being poised for transcription upon entering the nucleus, rather than first depending on ITR priming and second-strand synthesis to form dsDNA. In some embodiments, one or more gene modifying components is designed as an scAAV, wherein the sequence between the AAV ITRs contains two reverse-complementing modules that can self-hybridize to create dsDNA.
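The layout of a self-complementary insert (desired sequence, intervening sequence, reverse complement of the desired sequence) can be sketched as follows. This is a hedged illustration of the sequence arrangement only; the function names, the payload, and the spacer are hypothetical, and ITRs are omitted.

```python
# Illustrative sketch only; names and sequences are hypothetical.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def build_self_complementary_insert(payload: str, spacer: str) -> str:
    """Concatenate payload, an intervening spacer, and the payload's reverse complement."""
    return payload.upper() + spacer.upper() + reverse_complement(payload)


def arms_can_self_hybridize(insert: str, payload_len: int) -> bool:
    """True if the last payload_len bases are the reverse complement of the first."""
    return reverse_complement(insert[:payload_len]) == insert[-payload_len:].upper()


insert = build_self_complementary_insert("ATGGCC", "TTTT")
print(insert, arms_can_self_hybridize(insert, 6))
```

One consequence of this design is that the payload occupies the insert twice, so the sequence between the ITRs is roughly double the payload length plus the spacer.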

In some embodiments, the nucleic acid (e.g., encoding a polypeptide or a template, or both) delivered to cells is closed-ended, linear duplex DNA (CELiD DNA or ceDNA). In some embodiments, ceDNA is derived from the replicative form of the AAV genome (Li et al. PLoS One 2013). In some embodiments, the nucleic acid (e.g., encoding a polypeptide or a template DNA, or both) is flanked by ITRs, e.g., AAV ITRs, wherein at least one of the ITRs comprises a terminal resolution site and a replication protein binding site (sometimes referred to as a replicative protein binding site). In some embodiments, the ITRs are derived from an adeno-associated virus, e.g., AAV1, AAV2, AAV3, AAV4, AAV5, AAV6, AAV7, AAV8, AAV9, AAV10, AAV11, AAV12, or a combination thereof. In some embodiments, the ITRs are symmetric. In some embodiments, the ITRs are asymmetric. In some embodiments, at least one Rep protein is provided to enable replication of the construct. In some embodiments, the at least one Rep protein is derived from an adeno-associated virus, e.g., AAV1, AAV2, AAV3, AAV4, AAV5, AAV6, AAV7, AAV8, AAV9, AAV10, AAV11, AAV12, or a combination thereof. In some embodiments, ceDNA is generated by providing a production cell with (i) DNA flanked by ITRs, e.g., AAV ITRs, and (ii) components required for ITR-dependent replication, e.g., the AAV proteins Rep78 and Rep52 (or nucleic acid encoding the proteins). In some embodiments, ceDNA is free of any capsid protein, e.g., is not packaged into an infectious AAV particle. In some embodiments, ceDNA is formulated into LNPs (see, e.g., WO 2019051289 A1).

In some embodiments, the ceDNA vector consists of two self-complementary sequences, e.g., asymmetric or symmetric or substantially symmetric ITRs as defined herein, flanking the expression cassette, wherein the ceDNA vector is not associated with a capsid protein. In some embodiments, the ceDNA vector comprises two self-complementary sequences found in an AAV genome, wherein at least one ITR comprises an operative Rep-binding element (RBE) (also sometimes referred to herein as "RBS") and a terminal resolution site (trs) of AAV, or a functional variant of the RBE. See, e.g., WO 2019113310.

In some embodiments, the AAV genome comprises two genes that encode four replication proteins and three capsid proteins, respectively. In some embodiments, the genes are flanked on either side by 145-bp inverted terminal repeats (ITRs). In some embodiments, the virion comprises up to three capsid proteins (Vp1, Vp2, and/or Vp3), e.g., produced in a 1:1:10 ratio. In some embodiments, the capsid proteins are produced from the same open reading frame and/or from differential splicing (Vp1) and alternative translational start sites (Vp2 and Vp3, respectively). Generally, Vp3 is the most abundant subunit in the virion and participates in receptor recognition at the cell surface, thereby defining the tropism of the virus. In some embodiments, Vp1 comprises a phospholipase domain, e.g., one that functions in viral infectivity, in the N-terminus of Vp1.

In some embodiments, the packaging capacity of a viral vector limits the size of the gene modifying system that can be packaged into the vector. For example, the packaging capacity of an AAV can be about 4.5 kb (e.g., about 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, or 6.0 kb), e.g., including one or two inverted terminal repeats (ITRs), e.g., 145-base ITRs.
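The size constraint above amounts to simple arithmetic, which the following Python sketch makes explicit. It assumes a capacity of about 4.5 kb that includes two 145-base ITRs, as in the example figures above; the function name and the promoter and terminator lengths are hypothetical placeholders, while the 4.1 kb SpCas9 coding length is the figure given elsewhere in this description.

```python
# Illustrative sketch only; function name and element sizes are hypothetical.
def fits_in_aav(component_lengths_bp, capacity_bp=4500, itr_bp=145, n_itrs=2):
    """Return (fits, total_bp) for a cassette plus its ITRs.

    component_lengths_bp: lengths of the elements between the ITRs
    (e.g., promoter, coding sequence, terminator), in base pairs.
    capacity_bp is taken to include the ITRs.
    """
    total = sum(component_lengths_bp) + n_itrs * itr_bp
    return total <= capacity_bp, total


# Hypothetical cassette: 500 bp promoter, 3300 bp coding sequence, 200 bp terminator.
print(fits_in_aav([500, 3300, 200]))   # total is 4290 bp
# Same regulatory elements with a 4100 bp coding sequence (e.g., SpCas9).
print(fits_in_aav([500, 4100, 200]))   # total is 5090 bp
```

Changing `capacity_bp` to any of the other listed values (3.0 to 6.0 kb) shows how sensitive the outcome is to the assumed limit.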

In some embodiments, recombinant AAV (rAAV) comprises cis-acting 145-bp ITRs flanking the vector transgene cassette, e.g., providing up to 4.5 kb for packaging of foreign DNA. Subsequent to infection, rAAV can, in some instances, express a fusion protein of the invention and persist without integration into the host genome by existing episomally in circular head-to-tail concatemers. rAAV can be used, for example, in vitro and in vivo. In some embodiments, AAV-mediated gene delivery requires that the length of the coding sequence of the gene is equal to or greater in size than the wild-type AAV genome.

AAV delivery of genes that exceed this size and/or the use of large physiological regulatory elements can be accomplished, for example, by dividing the one or more proteins to be delivered into two or more fragments. In some embodiments, the N-terminal fragment is fused to an intein-N sequence. In some embodiments, the C-terminal fragment is fused to an intein-C sequence. In embodiments, the fragments are packaged into two or more AAV vectors.

In some embodiments, dual AAV vectors are generated by splitting a large transgene expression cassette into two separate halves (5′ and 3′ ends, or head and tail), e.g., wherein each half of the cassette is packaged in a single AAV vector (of <5 kb). In some embodiments, the re-assembly of the full-length transgene expression cassette can then be achieved upon co-infection of the same cell by both dual AAV vectors. In some embodiments, co-infection is followed by one or more of: (1) homologous recombination (HR) between the 5′ and 3′ genomes (dual AAV overlapping vectors); (2) ITR-mediated tail-to-head concatemerization of the 5′ and 3′ genomes (dual AAV trans-splicing vectors); and/or (3) a combination of these two mechanisms (dual AAV hybrid vectors). In some embodiments, the use of dual AAV vectors in vivo results in the expression of full-length proteins. In some embodiments, the use of the dual AAV vector platform represents an efficient and viable gene transfer strategy for transgenes of greater than about 4.0, 4.1, 4.2, 4.3, 4.4, 4.5, 4.6, 4.7, 4.8, 4.9, or 5.0 kb in size. In some embodiments, AAV vectors can also be used to transduce cells with target nucleic acids, e.g., in the in vitro production of nucleic acids and peptides. In some embodiments, AAV vectors can be used for in vivo and ex vivo gene therapy procedures (see, e.g., West et al., Virology 160:38-47 (1987); U.S. Patent No. 4,797,368; WO 93/24641; Kotin, Human Gene Therapy 5:793-801 (1994); Muzyczka, J. Clin. Invest. 94:1351 (1994); each of which is incorporated herein by reference in its entirety). The construction of recombinant AAV vectors is described in a number of publications, including U.S. Patent No. 5,173,414; Tratschin et al., Mol. Cell. Biol. 5:3251-3260 (1985); Tratschin et al., Mol. Cell. Biol. 4:2072-2081 (1984); Hermonat and Muzyczka, PNAS 81:6466-6470 (1984); and Samulski et al., J. Virol. 63:3822-3828 (1989) (each of which is incorporated herein by reference in its entirety).
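The dual AAV overlapping design described above (two halves that share a region of homology and are rejoined after co-infection) can be sketched at the sequence level. This is a toy model under stated assumptions: the cassette is split near its midpoint, each half must be shorter than 5 kb, ITRs are ignored, and "reassembly" is plain string joining across the shared overlap rather than a model of homologous recombination. All names and sequences are hypothetical.

```python
# Illustrative sketch only; names, sizes, and sequences are hypothetical.
def split_cassette(cassette: str, overlap: int, max_len: int = 5000):
    """Split a cassette into 5' and 3' halves that share `overlap` bases."""
    if overlap < 0 or overlap >= len(cassette):
        raise ValueError("overlap must be non-negative and shorter than the cassette")
    mid = len(cassette) // 2
    left = overlap // 2                      # part of the overlap upstream of the midpoint
    five_prime = cassette[: mid + (overlap - left)]
    three_prime = cassette[mid - left:]
    if len(five_prime) >= max_len or len(three_prime) >= max_len:
        raise ValueError("a half is too long for a single vector")
    return five_prime, three_prime


def reassemble(five_prime: str, three_prime: str, overlap: int) -> str:
    """Join the halves across their shared overlap (stand-in for HR)."""
    if overlap and five_prime[-overlap:] != three_prime[:overlap]:
        raise ValueError("halves do not share the expected overlap")
    return five_prime + three_prime[overlap:]


cassette = "ACGT" * 2000                     # toy 8,000-base cassette
head, tail = split_cassette(cassette, overlap=400)
print(len(head), len(tail), reassemble(head, tail, 400) == cassette)
```

Setting `overlap=0` gives the layout of a trans-splicing pair, where the halves are joined end to end rather than through a homologous region.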

In some embodiments, a gene modifying polypeptide described herein (e.g., with or without one or more guide nucleic acids) can be delivered using AAV, lentivirus, adenovirus, or other plasmid or viral vector types, in particular using formulations and doses from, for example, U.S. Patent No. 8,454,972 (formulations, doses for adenovirus), U.S. Patent No. 8,404,658 (formulations, doses for AAV), and U.S. Patent No. 5,846,946 (formulations, doses for DNA plasmids), and from clinical trials and publications regarding clinical trials involving lentivirus, AAV, and adenovirus. For example, for AAV, the route of administration, formulation, and dose can be as described in U.S. Patent No. 8,454,972 and in clinical trials involving AAV. For adenovirus, the route of administration, formulation, and dose can be as described in U.S. Patent No. 8,404,658 and in clinical trials involving adenovirus. For plasmid delivery, the route of administration, formulation, and dose can be as described in U.S. Patent No. 5,846,946 and in clinical studies involving plasmids. Doses can be based on or extrapolated to an average 70 kg individual (e.g., a male adult human), and can be adjusted for patients, subjects, and mammals of different weight and species. Frequency of administration is within the purview of the medical or veterinary practitioner (e.g., physician, veterinarian) and depends on usual factors, including the age, sex, general health, and other conditions of the patient or subject and the particular condition or symptoms being addressed. In some embodiments, the viral vectors can be injected into the tissue of interest. For cell-type-specific gene modification, in some embodiments, expression of the gene modifying polypeptide and the optional guide nucleic acid can be driven by a cell-type-specific promoter.

In some embodiments, AAV allows for low toxicity, for example, because the purification method does not require ultracentrifugation of cell particles that can activate the immune response. In some embodiments, AAV allows for a low probability of causing insertional mutagenesis, for example, because it does not substantially integrate into the host genome.

In some embodiments, AAV has a packaging limit of about 4.4, 4.5, 4.6, 4.7, or 4.75 kb. In some embodiments, a gene modifying polypeptide coding sequence, a promoter, and a transcription terminator can fit into a single viral vector. In some cases, SpCas9 (4.1 kb) may be difficult to package into AAV. Therefore, in some embodiments, a gene modifying polypeptide coding sequence is used that is shorter in length than other gene modifying polypeptide coding sequences or base editors. In some embodiments, the gene modifying polypeptide coding sequence is less than about 4.5 kb, 4.4 kb, 4.3 kb, 4.2 kb, 4.1 kb, 4 kb, 3.9 kb, 3.8 kb, 3.7 kb, 3.6 kb, 3.5 kb, 3.4 kb, 3.3 kb, 3.2 kb, 3.1 kb, 3 kb, 2.9 kb, 2.8 kb, 2.7 kb, 2.6 kb, 2.5 kb, 2 kb, or 1.5 kb.

An AAV can be AAV1, AAV2, AAV5, or any combination thereof. In some embodiments, the type of AAV is selected with respect to the cells to be targeted; e.g., AAV serotypes 1, 2, or 5, or a hybrid capsid AAV1, AAV2, AAV5, or any combination thereof, can be selected for targeting brain or neuronal cells; or AAV4 can be selected for targeting cardiac tissue. In some embodiments, AAV8 is selected for delivery to the liver. Exemplary AAV serotypes as to these cells are described, e.g., in Grimm, D. et al., J. Virol. 82:5887-5911 (2008) (which is incorporated herein by reference in its entirety). In some embodiments, AAV refers to all serotypes, subtypes, and naturally occurring AAV as well as recombinant AAV. AAV may be used to refer to the virus itself or a derivative thereof. In some embodiments, AAV includes AAV1, AAV2, AAV3, AAV3B, AAV4, AAV5, AAV6, AAV6.2, AAV7, AAVrh.64R1, AAVhu.37, AAVrh.8, AAVrh.32.33, AAV8, AAV9, AAV-DJ, AAV2/8, AAVrh10, AAVLK03, AAV10, AAV11, AAV12, rh10, and hybrids thereof, avian AAV, bovine AAV, canine AAV, equine AAV, primate AAV, non-primate AAV, and ovine AAV. The genomic sequences of various serotypes of AAV, as well as the sequences of the native terminal repeats (TRs), Rep proteins, and capsid subunits, are known in the art. Such sequences may be found in the literature or in public databases such as GenBank. Additional exemplary AAV serotypes are listed in Table 18.

Table 18. Exemplary AAV serotypes.

In some embodiments, a pharmaceutical composition (e.g., comprising an AAV as described herein) has less than 10% empty capsids, less than 8% empty capsids, less than 7% empty capsids, less than 5% empty capsids, less than 3% empty capsids, or less than 1% empty capsids. In some embodiments, the pharmaceutical composition has less than about 5% empty capsids. In some embodiments, the number of empty capsids is below the limit of detection. In some embodiments, it is advantageous for the pharmaceutical composition to have low amounts of empty capsids, e.g., because empty capsids may generate an adverse response (e.g., an immune response, inflammatory response, liver response, and/or cardiac response), e.g., with little or no substantial therapeutic benefit.

In some embodiments, the residual host cell protein (rHCP) in the pharmaceutical composition is less than or equal to 100 ng/ml rHCP per 1 × 10¹³ vg/ml, e.g., less than or equal to 40 ng/ml rHCP per 1 × 10¹³ vg/ml or 1-50 ng/ml rHCP per 1 × 10¹³ vg/ml. In some embodiments, the pharmaceutical composition comprises less than 10 ng rHCP per 1.0 × 10¹³ vg, or less than 5 ng rHCP per 1.0 × 10¹³ vg, less than 4 ng rHCP per 1.0 × 10¹³ vg, or less than 3 ng rHCP per 1.0 × 10¹³ vg, or any concentration in between. In some embodiments, the residual host cell DNA (hcDNA) in the pharmaceutical composition is less than or equal to 5 × 10⁶ pg/ml hcDNA per 1 × 10¹³ vg/ml, less than or equal to 1.2 × 10⁶ pg/ml hcDNA per 1 × 10¹³ vg/ml, or 1 × 10⁵ pg/ml hcDNA per 1 × 10¹³ vg/ml. In some embodiments, the residual host cell DNA in said pharmaceutical composition is less than 5.0 × 10⁵ pg per 1 × 10¹³ vg, less than 2.0 × 10⁵ pg per 1.0 × 10¹³ vg, less than 1.1 × 10⁵ pg per 1.0 × 10¹³ vg, less than 1.0 × 10⁵ pg hcDNA per 1.0 × 10¹³ vg, less than 0.9 × 10⁵ pg hcDNA per 1.0 × 10¹³ vg, less than 0.8 × 10⁵ pg hcDNA per 1.0 × 10¹³ vg, or any concentration in between.
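Because the limits above are expressed per 1 × 10¹³ vector genomes (vg), a measured impurity concentration has to be normalized by the vector titer before it can be compared with a limit. The following Python sketch shows that normalization under the assumption that both measurements refer to the same volume (per ml); the function names and the example measurements are hypothetical.

```python
# Illustrative sketch only; names and example measurements are hypothetical.
REFERENCE_VG = 1.0e13  # normalization basis used in the limits above


def per_reference_vg(impurity_per_ml: float, titer_vg_per_ml: float) -> float:
    """Convert an impurity concentration (amount/ml) to amount per 1e13 vg."""
    if titer_vg_per_ml <= 0:
        raise ValueError("titer must be positive")
    return impurity_per_ml * (REFERENCE_VG / titer_vg_per_ml)


def within_limit(impurity_per_ml: float, titer_vg_per_ml: float, limit: float) -> bool:
    """True if the normalized impurity is at or below the limit per 1e13 vg."""
    return per_reference_vg(impurity_per_ml, titer_vg_per_ml) <= limit


# Hypothetical lot: 80 ng/ml rHCP measured at a titer of 2e13 vg/ml.
print(per_reference_vg(80.0, 2.0e13))      # ng rHCP per 1e13 vg
print(within_limit(80.0, 2.0e13, 100.0))   # compared with a 100 ng limit
```

The same helper applies to hcDNA or plasmid DNA by passing picogram values and the corresponding limit.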

In some embodiments, the residual plasmid DNA in the pharmaceutical composition is less than or equal to 1.7 × 10⁵ pg/ml per 1.0 × 10¹³ vg/ml, or 1 × 10⁵ pg/ml per 1 × 1.0 × 10¹³ vg/ml, or 1.7 × 10⁶ pg/ml per 1.0 × 10¹³ vg/ml. In some embodiments, the residual plasmid DNA in the pharmaceutical composition is less than 10.0 × 10⁵ pg per 1.0 × 10¹³ vg, less than 8.0 × 10⁵ pg per 1.0 × 10¹³ vg, or less than 6.8 × 10⁵ pg per 1.0 × 10¹³ vg. In embodiments, the pharmaceutical composition comprises less than 0.5 ng per 1.0 × 10¹³ vg, less than 0.3 ng per 1.0 × 10¹³ vg, less than 0.22 ng per 1.0 × 10¹³ vg, or less than 0.2 ng per 1.0 × 10¹³ vg, or any intermediate concentration, of bovine serum albumin (BSA). In embodiments, the benzonase in the pharmaceutical composition is less than 0.2 ng per 1.0 × 10¹³ vg, less than 0.1 ng per 1.0 × 10¹³ vg, less than 0.09 ng per 1.0 × 10¹³ vg, less than 0.08 ng per 1.0 × 10¹³ vg, or any intermediate concentration. In embodiments, the Poloxamer 188 in the pharmaceutical composition is about 10 to 150 ppm, about 15 to 100 ppm, or about 20 to 80 ppm. In embodiments, the cesium in the pharmaceutical composition is less than 50 pg/g (ppm), less than 30 pg/g (ppm), or less than 20 pg/g (ppm), or any intermediate concentration.

在实施例中，药物组合物包含少于10％、少于8％、少于7％、少于6％、少于5％、少于4％、少于3％、少于2％或介于之间的任何百分比的总杂质，例如，如通过SDS-PAGE测定。在实施例中，例如，如通过SDS-PAGE测定的总纯度为大于90％、大于92％、大于93％、大于94％、大于95％、大于96％、大于97％、大于98％、或介于之间的任何百分比。在实施例中，例如，如通过SDS-PAGE测量的，没有单一的未命名相关杂质多于5％、多于4％、多于3％或多于2％、或介于之间的任何百分比。在实施例中，药物组合物包含的填充的衣壳相对于总衣壳(例如，如通过分析型超速离心测量的峰1+峰2)的百分比为大于85％、大于86％、大于87％、大于88％、大于89％、大于90％、大于91％、大于91.9％、大于92％、大于93％，或介于之间的任何百分比。在药物组合物的实施例中，通过分析型超速离心在峰1中测量的填充的衣壳的百分比为20-80％、25-75％、30-75％、35-75％或37.4-70.3％。在药物组合物的实施例中，通过分析型超速离心在峰2中测量的填充的衣壳的百分比为20％-80％、20％-70％、22％-65％、24％-62％、或24.9％-60.1％。In embodiments, the pharmaceutical composition comprises less than 10%, less than 8%, less than 7%, less than 6%, less than 5%, less than 4%, less than 3%, less than 2% or any percentage in between of total impurities, e.g., as determined by SDS-PAGE. In embodiments, for example, the total purity as determined by SDS-PAGE is greater than 90%, greater than 92%, greater than 93%, greater than 94%, greater than 95%, greater than 96%, greater than 97%, greater than 98%, or any percentage in between. In embodiments, for example, as measured by SDS-PAGE, there is no single unnamed related impurity greater than 5%, greater than 4%, greater than 3% or greater than 2%, or any percentage in between. In embodiments, the pharmaceutical composition comprises a percentage of filled capsid relative to total capsid (e.g., peak 1 + peak 2 as measured by analytical ultracentrifugation) of greater than 85%, greater than 86%, greater than 87%, greater than 88%, greater than 89%, greater than 90%, greater than 91%, greater than 91.9%, greater than 92%, greater than 93%, or any percentage in between. In embodiments of the pharmaceutical composition, the percentage of filled capsid measured in peak 1 by analytical ultracentrifugation is 20-80%, 25-75%, 30-75%, 35-75%, or 37.4-70.3%. In embodiments of the pharmaceutical composition, the percentage of filled capsid measured in peak 2 by analytical ultracentrifugation is 20%-80%, 20%-70%, 22%-65%, 24%-62%, or 24.9%-60.1%.
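The capsid figures above can be checked mechanically. The following is a minimal sketch, assuming that "filled capsid" is the sum of AUC peak 1 and peak 2 (as the parenthetical above suggests) and using the narrowest ranges quoted in the paragraph as default limits; the function name and the result keys are hypothetical and chosen only for illustration.

```python
def assess_filled_capsids(peak1_percent, peak2_percent,
                          min_filled=91.9,
                          peak1_range=(37.4, 70.3),
                          peak2_range=(24.9, 60.1)):
    """Compare AUC peak percentages with the ranges quoted in the text.

    Assumption: filled capsid (%) = peak 1 (%) + peak 2 (%).
    """
    for value in (peak1_percent, peak2_percent):
        if not 0.0 <= value <= 100.0:
            raise ValueError("peak percentages must lie between 0 and 100")
    filled = peak1_percent + peak2_percent
    if filled > 100.0:
        raise ValueError("peak 1 + peak 2 cannot exceed 100%")
    return {
        "filled_percent": filled,
        # "greater than 91.9%" is a strict lower bound in the text
        "filled_ok": filled > min_filled,
        "peak1_ok": peak1_range[0] <= peak1_percent <= peak1_range[1],
        "peak2_ok": peak2_range[0] <= peak2_percent <= peak2_range[1],
    }


if __name__ == "__main__":
    # Illustrative values only, not measured data.
    print(assess_filled_capsids(55.0, 40.0))
```

The limits are passed as parameters so that any of the broader alternative ranges in the paragraph can be substituted.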

In one embodiment, the pharmaceutical composition comprises a genomic titer of 1.0 to 5.0 x 10¹³ vg/mL, 1.2 to 3.0 x 10¹³ vg/mL, or 1.7 to 2.3 x 10¹³ vg/mL. In one embodiment, the pharmaceutical composition exhibits a bioburden of less than 5 CFU/mL, less than 4 CFU/mL, less than 3 CFU/mL, less than 2 CFU/mL, or less than 1 CFU/mL, or any intermediate concentration. In embodiments, the amount of endotoxin, determined according to USP, e.g., USP <85> (incorporated by reference in its entirety), is less than 1.0 EU/mL, less than 0.8 EU/mL, or less than 0.75 EU/mL. In embodiments, the osmolality of the pharmaceutical composition, determined according to USP, e.g., USP <785> (incorporated by reference in its entirety), is 350 to 450 mOsm/kg, 370 to 440 mOsm/kg, or 390 to 430 mOsm/kg. In embodiments, the pharmaceutical composition contains fewer than 1200 particles greater than 25 μm per container, fewer than 1000 particles greater than 25 μm per container, fewer than 500 particles greater than 25 μm per container, or any intermediate value. In embodiments, the pharmaceutical composition contains fewer than 10,000 particles greater than 10 μm per container, fewer than 8000 particles greater than 10 μm per container, or fewer than 600 particles greater than 10 μm per container.

In one embodiment, the pharmaceutical composition has a genomic titer of 0.5 to 5.0 x 10¹³ vg/mL, 1.0 to 4.0 x 10¹³ vg/mL, 1.5 to 3.0 x 10¹³ vg/mL, or 1.7 to 2.3 x 10¹³ vg/mL. In one embodiment, the pharmaceutical composition described herein comprises one or more of the following: less than about 0.09 ng benzonase per 1.0 x 10¹³ vg; less than about 30 pg/g (ppm) cesium; about 20 to 80 ppm poloxamer 188; less than about 0.22 ng BSA per 1.0 x 10¹³ vg; less than about 6.8 x 10⁵ pg residual plasmid DNA per 1.0 x 10¹³ vg; less than about 1.1 x 10⁵ pg residual hcDNA per 1.0 x 10¹³ vg; less than about 4 ng rHCP per 1.0 x 10¹³ vg; pH 7.7 to 8.3; about 390 to 430 mOsm/kg; fewer than about 600 particles of size >25 μm per container; fewer than about 6000 particles of size >10 μm per container; a genomic titer of about 1.7 x 10¹³ to 2.3 x 10¹³ vg/mL; an infectious titer of about 3.9 x 10⁸ to 8.4 x 10¹⁰ IU per 1.0 x 10¹³ vg; total protein of about 100-300 pg per 1.0 x 10¹³ vg; mean survival of >24 days in A7SMA mice at a viral vector dose of about 7.5 x 10¹³ vg/kg; about 70% to 130% relative potency based on an in vitro cell-based assay; and/or less than about 5% empty capsids. In various embodiments, the pharmaceutical compositions described herein comprise any of the viral particles discussed herein and retain a potency within ±20%, ±15%, ±10%, or ±5% of a reference standard. In some embodiments, potency is measured using a suitable in vitro cell-based assay or in vivo animal model.
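Several of the limits above are expressed per 1.0 x 10¹³ vg, whereas assays typically report an amount per mL. The sketch below shows one way to normalize such readings by the genomic titer and compare them with the upper limits listed in the paragraph. It assumes that both the residual and the titer are measured on the same lot and in the same volume basis; the dictionary keys, function names, and example readings are hypothetical.

```python
REFERENCE_VG = 1.0e13  # normalization basis used in the text (1.0 x 10^13 vg)

# Upper limits per 1.0e13 vg, taken from the "one or more of the following" list.
UPPER_LIMITS = {
    "benzonase_ng": 0.09,
    "bsa_ng": 0.22,
    "plasmid_dna_pg": 6.8e5,
    "hcdna_pg": 1.1e5,
    "rhcp_ng": 4.0,
}


def per_reference_dose(amount_per_ml, titer_vg_per_ml):
    """Convert an amount per mL into an amount per 1.0e13 vg."""
    if titer_vg_per_ml <= 0:
        raise ValueError("titer must be positive")
    return amount_per_ml * REFERENCE_VG / titer_vg_per_ml


def check_residuals(measured_per_ml, titer_vg_per_ml):
    """Return {analyte: (value per 1.0e13 vg, True if below the limit)}."""
    results = {}
    for analyte, limit in UPPER_LIMITS.items():
        value = per_reference_dose(measured_per_ml[analyte], titer_vg_per_ml)
        results[analyte] = (value, value < limit)
    return results


if __name__ == "__main__":
    # Hypothetical readings (per mL) for a lot at 2.0e13 vg/mL.
    readings = {
        "benzonase_ng": 0.1,
        "bsa_ng": 0.5,
        "plasmid_dna_pg": 1.0e6,
        "hcdna_pg": 1.0e5,
        "rhcp_ng": 6.0,
    }
    for name, (value, ok) in check_residuals(readings, 2.0e13).items():
        print(f"{name}: {value:.3g} per 1e13 vg -> {'pass' if ok else 'fail'}")
```

Because every limit in the list is qualified by "about", a real acceptance criterion would need an explicit tolerance; the strict comparison here is a simplification.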

WO 2019094253中传授了制备、表征和给予AAV颗粒的另外的方法，该文献通过援引以其全文并入本文。Additional methods for preparing, characterizing, and administering AAV particles are taught in WO 2019094253, which is incorporated herein by reference in its entirety.

可与本发明一致使用的其他rAAV构建体包括Wang等人2019中描述的那些，可在以下网址获得：//doi.org/10.1038/s41573-019-0012-9，包括其表1，将该文献通过援引以其全文并入。Other rAAV constructs that may be used in accordance with the present invention include those described in Wang et al. 2019, available at: //doi.org/10.1038/s41573-019-0012-9, including Table 1 thereof, which is incorporated by reference in its entirety.

脂质纳米颗粒Lipid Nanoparticles

The methods and systems provided herein may employ any suitable carrier or delivery modality, including, in certain embodiments, lipid nanoparticles (LNPs). In some embodiments, lipid nanoparticles comprise one or more ionic lipids, such as non-cationic lipids (e.g., neutral, anionic, or zwitterionic lipids); one or more conjugated lipids (such as PEG-conjugated lipids or lipids conjugated to polymers described in Table 5 of WO 2019217941, which is incorporated herein by reference in its entirety); one or more sterols (e.g., cholesterol); and, optionally, one or more targeting molecules (e.g., conjugated receptors, receptor ligands, antibodies); or combinations of the foregoing.

可用于形成纳米颗粒(例如，脂质纳米颗粒)的脂质包括例如WO 2019217941(通过援引并入)的表4中描述的那些-例如，含脂质的纳米颗粒可包含WO 2019217941的表4中的一种或多种脂质。脂质纳米颗粒可以包括另外的要素，如聚合物，如通过援引并入的WO2019217941的表5中描述的聚合物。Lipids that can be used to form nanoparticles (e.g., lipid nanoparticles) include, for example, those described in Table 4 of WO 2019217941 (incorporated by reference) - for example, the lipid-containing nanoparticles may comprise one or more lipids in Table 4 of WO 2019217941. The lipid nanoparticles may include additional elements, such as polymers, such as those described in Table 5 of WO 2019217941, incorporated by reference.

In some embodiments, the conjugated lipid, when present, may include one or more of: PEG-diacylglycerol (DAG) (such as 1-(monomethoxy-polyethylene glycol)-2,3-dimyristoylglycerol (PEG-DMG)), PEG-dialkoxypropyl (DAA), PEG-phospholipid, PEG-ceramide (Cer), pegylated phosphatidylethanolamine (PEG-PE), PEG succinate diacylglycerol (PEGS-DAG) (such as 4-O-(2′,3′-di(tetradecanoyloxy)propyl-1-O-(ω-methoxy(polyethoxy)ethyl)succinate (PEG-S-DMG)), PEG dialkoxypropyl carbamate, N-(carbonyl-methoxypolyethylene glycol 2000)-1,2-distearoyl-sn-glycero-3-phosphoethanolamine sodium salt, those described in Table 2 of WO 2019051289 (incorporated by reference), and combinations of the foregoing.

在一些实施例中，可掺入脂质纳米颗粒中的固醇包括胆固醇或胆固醇衍生物中的一种或多种，如通过援引并入的W02009/127060或US 2010/0130588中的那些。另外的示例性固醇包括植物固醇，包括通过援引并入本文的Eygeris等人(2020)，dx.doi.org/10.1021/acs.nanolett.0c01386中描述的那些。In some embodiments, the sterol that can be incorporated into the lipid nanoparticles includes one or more of cholesterol or cholesterol derivatives, such as those in WO2009/127060 or US 2010/0130588, which are incorporated by reference. Additional exemplary sterols include plant sterols, including those described in Eygeris et al. (2020), dx.doi.org/10.1021/acs.nanolett.0c01386, which are incorporated by reference herein.

In some embodiments, the lipid particle comprises an ionizable lipid, a non-cationic lipid, a conjugated lipid that inhibits aggregation of particles, and a sterol. The amounts of these components can be varied independently to achieve desired properties. For example, in some embodiments, the lipid nanoparticle comprises: an ionizable lipid in an amount from about 20 mol% to about 90 mol% of the total lipids (in other embodiments it may be 20%-70% (mol), 30%-60% (mol), or 40%-50% (mol), or about 50 mol% to about 90 mol%, of the total lipid present in the lipid nanoparticle); a non-cationic lipid in an amount from about 5 mol% to about 30 mol% of the total lipids; a conjugated lipid in an amount from about 0.5 mol% to about 20 mol% of the total lipids; and a sterol in an amount from about 20 mol% to about 50 mol% of the total lipids. The ratio of total lipid to nucleic acid (e.g., encoding the gene modifying polypeptide or the template nucleic acid) can be varied as desired. For example, the total lipid to nucleic acid (mass or weight) ratio can be from about 10:1 to about 30:1.
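As an illustration of how the four component ranges above constrain a formulation, the following sketch converts molar amounts to mol% and reports any component outside the broadest ranges quoted in the paragraph. The component keys, the function names, and the 50 / 10 / 1.5 / 38.5 example composition are assumptions made for illustration and are not taken from the disclosure.

```python
# Broadest mol% ranges quoted in the paragraph above ("about" is ignored here).
RANGES_MOL_PERCENT = {
    "ionizable": (20.0, 90.0),
    "non_cationic": (5.0, 30.0),
    "conjugated": (0.5, 20.0),
    "sterol": (20.0, 50.0),
}


def mol_percent(moles):
    """Convert {component: moles} into {component: mol% of total lipid}."""
    total = sum(moles.values())
    if total <= 0:
        raise ValueError("total lipid amount must be positive")
    return {name: 100.0 * amount / total for name, amount in moles.items()}


def out_of_range(percentages):
    """List the components whose mol% falls outside the quoted ranges."""
    return [
        name
        for name, (low, high) in RANGES_MOL_PERCENT.items()
        if not low <= percentages[name] <= high
    ]


def lipid_to_nucleic_acid_mass_ratio(total_lipid_mg, nucleic_acid_mg):
    """Total lipid : nucleic acid ratio by mass (text: about 10:1 to 30:1)."""
    if nucleic_acid_mg <= 0:
        raise ValueError("nucleic acid mass must be positive")
    return total_lipid_mg / nucleic_acid_mg


if __name__ == "__main__":
    # Hypothetical composition in arbitrary molar units.
    composition = {"ionizable": 50.0, "non_cationic": 10.0,
                   "conjugated": 1.5, "sterol": 38.5}
    pct = mol_percent(composition)
    print(pct, out_of_range(pct))
    print(lipid_to_nucleic_acid_mass_ratio(20.0, 1.0))
```

Because the components are varied independently, a check of this kind only confirms that each one lies within its own range; it does not imply that every combination within the ranges yields a usable particle.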

In some embodiments, the ionizable lipid may be a cationic lipid, an ionizable cationic lipid, e.g., a cationic lipid that can exist in a positively charged or neutral form depending on pH, or an amine-containing lipid that can be readily protonated. In some embodiments, the cationic lipid is a lipid capable of being positively charged, e.g., under physiological conditions. Exemplary cationic lipids include one or more amine groups that bear the positive charge. In some embodiments, the lipid particle comprises a cationic lipid formulated together with neutral lipids, ionizable amine-containing lipids, biodegradable alkyne lipids, steroids, phospholipids including polyunsaturated lipids, structural lipids (e.g., sterols), PEG, cholesterol, and polymer-conjugated lipids. In some embodiments, the cationic lipid may be an ionizable cationic lipid. An exemplary cationic lipid as disclosed herein may have an effective pKa greater than 6.0. In embodiments, a lipid nanoparticle may comprise a second cationic lipid having an effective pKa different from (e.g., greater than) that of the first cationic lipid. A lipid nanoparticle may comprise 40 mol% to 60 mol% of a cationic lipid, a neutral lipid, a steroid, a polymer-conjugated lipid, and a therapeutic agent, e.g., a nucleic acid (e.g., RNA) described herein (e.g., a template nucleic acid or a nucleic acid encoding a gene modifying polypeptide), encapsulated within or associated with the lipid nanoparticle. In some embodiments, the nucleic acid is co-formulated with the cationic lipid. The nucleic acid may be adsorbed to the surface of an LNP, e.g., an LNP comprising a cationic lipid. In some embodiments, the nucleic acid may be encapsulated in an LNP, e.g., an LNP comprising a cationic lipid. In some embodiments, the lipid nanoparticle may comprise a targeting moiety, e.g., coated with a targeting agent. In embodiments, the LNP formulation is biodegradable. In some embodiments, a lipid nanoparticle comprising one or more lipids described herein, e.g., of formula (i), (ii), (vii), and/or (ix), encapsulates at least 1%, at least 5%, at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 92%, at least 95%, at least 97%, at least 98%, or 100% of an RNA molecule, e.g., a template RNA and/or an mRNA encoding the gene modifying polypeptide.

在一些实施例中，脂质与核酸的比率(质量/质量比率；w/w比率)可以在以下范围中：约1∶1至约25∶1、约10∶1至约14∶1、约3∶1至约15∶1、约4∶1至约10∶1、约5∶1至约9∶1、或约6∶1至约9∶1。可以调节脂质和核酸的量以提供所需的N/P比，例如3、4、5、6、7、8、9、10或更高的N/P比。通常，脂质纳米颗粒配制品的总脂质含量可在约5mg/ml至约30mg/mL的范围内。In some embodiments, the ratio of lipid to nucleic acid (mass/mass ratio; w/w ratio) can be in the following range: about 1: 1 to about 25: 1, about 10: 1 to about 14: 1, about 3: 1 to about 15: 1, about 4: 1 to about 10: 1, about 5: 1 to about 9: 1, or about 6: 1 to about 9: 1. The amount of lipid and nucleic acid can be adjusted to provide a desired N/P ratio, such as 3, 4, 5, 6, 7, 8, 9, 10 or higher N/P ratios. Typically, the total lipid content of the lipid nanoparticle formulation can be in the range of about 5 mg/ml to about 30 mg/mL.
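The N/P ratio mentioned above is conventionally the molar ratio of ionizable amine nitrogens (N) in the lipid to phosphate groups (P) in the nucleic acid. The sketch below shows that arithmetic under stated assumptions: one phosphate per nucleotide, an assumed average nucleotide mass of about 330 g/mol for RNA, and a hypothetical lipid molecular weight supplied by the caller. None of these values come from the disclosure.

```python
AVG_NUCLEOTIDE_MW = 330.0  # g/mol per nucleotide; rough assumed average for RNA


def np_ratio(ionizable_lipid_mg, lipid_mw, rna_mg,
             amines_per_lipid=1, nucleotide_mw=AVG_NUCLEOTIDE_MW):
    """Molar ratio of ionizable amine nitrogens to nucleic acid phosphates."""
    if rna_mg <= 0 or lipid_mw <= 0 or nucleotide_mw <= 0:
        raise ValueError("masses and molecular weights must be positive")
    n_mmol = ionizable_lipid_mg / lipid_mw * amines_per_lipid
    p_mmol = rna_mg / nucleotide_mw  # one phosphate per nucleotide (assumed)
    return n_mmol / p_mmol


def ionizable_lipid_mass_for_np(target_np, rna_mg, lipid_mw,
                                amines_per_lipid=1,
                                nucleotide_mw=AVG_NUCLEOTIDE_MW):
    """Mass (mg) of ionizable lipid needed to reach a target N/P ratio."""
    if amines_per_lipid <= 0:
        raise ValueError("amines_per_lipid must be positive")
    p_mmol = rna_mg / nucleotide_mw
    return target_np * p_mmol * lipid_mw / amines_per_lipid


if __name__ == "__main__":
    # Hypothetical lipid with MW 660 g/mol and a single ionizable amine.
    mass_mg = ionizable_lipid_mass_for_np(6.0, rna_mg=1.0, lipid_mw=660.0)
    print(mass_mg, np_ratio(mass_mg, 660.0, 1.0))
```

The N/P ratio counts only the ionizable lipid, whereas the mass ratios quoted above refer to total lipid, so the two figures are related but not interchangeable.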

Exemplary ionizable lipids that can be used in lipid nanoparticle formulations include, but are not limited to, those listed in Table 1 of WO 2019051289, which is incorporated herein by reference. Additional exemplary lipids include, but are not limited to, one or more of the following formulae: X of US 2016/0311759; I of US 20150376115 or US 2016/0376224; I, II, or III of US 20160151284; I, IA, II, or IIA of US 20170210967; I-c of US 20150140070; A of US 2013/0178541; I of US 2013/0303587 or US 2013/0123338; I of US 2015/0141678; II, III, IV, or V of US 2015/0239926; I of US 2017/0119904; I or II of WO 2017/117528; A of US 2012/0149894; A of US 2015/0057373; A of WO 2013/116126; A of US 2013/0090372; A of US 2013/0274523; A of US 2013/0274504; A of US 2013/0053572; A of WO 2013/016058; A of WO 2012/162210; I of US 2008/042973; I, II, III, or IV of US 2012/01287670; I or II of US 2014/0200257; I, II, or III of US 2015/0203446; I or III of US 2015/0005363; I, IA, IB, IC, ID, II, IIA, IIB, IIC, IID, or III-XXIV of US 2014/0308304; US 2013/0338210; I, II, III, or IV of WO 2009/132131; A of US 2012/01011478; I or XXXV of US 2012/0027796; XIV or XVII of US 2012/0058144; of US 2013/0323269; I of US 2011/0117125; I, II, or III of US 2011/0256175; I, II, III, IV, V, VI, VII, VIII, IX, X, XI, XII of US 2012/0202871; I, II, III, IV, V, VI, VII, VIII, X, XII, XIII, XIV, XV, or XVI of US 2011/0076335; I or II of US 2006/008378; I of US 2013/0123338; I or X-A-Y-Z of US 2015/0064242; XVI, XVII, or XVIII of US 2013/0022649; I, II, or III of US 2013/0116307; I or II of US 2010/0062967; I-X of US 2013/0189351; I of US 2014/0039032; V of US 2018/0028664; I of US 2016/0317458; I of US 2013/0195920; 5, 6, or 10 of US 10,221,127; III-3 of WO 2018/081480; I-5 or I-8 of WO 2020/081938; 18 or 25 of US 9,867,888; A of US 2019/0136231; II of WO 2020/219876; 1 of US 2012/0027803; OF-02 of US 2019/0240349; 23 of US 10,086,013; cKK-E12/A6 of Miao et al. (2020); C12-200 of WO 2010/053572; 7C1 of Dahlman et al. (2017); 304-O13 or 503-O13 of Whitehead et al.; TS-P4C2 of US 9,708,628; I of WO 2020/106946.

In some embodiments, the ionizable lipid is (6Z,9Z,28Z,31Z)-heptatriaconta-6,9,28,31-tetraen-19-yl 4-(dimethylamino)butanoate (DLin-MC3-DMA or MC3), e.g., as described in Example 9 of WO 2019051289 A9 (incorporated herein by reference in its entirety). In some embodiments, the ionizable lipid is the lipid ATX-002, e.g., as described in Example 10 of WO 2019051289 A9 (incorporated herein by reference in its entirety). In some embodiments, the ionizable lipid is (13Z,16Z)-N,N-dimethyl-3-nonyldocosa-13,16-dien-1-amine (Compound 32), e.g., as described in Example 11 of WO 2019051289 A9 (incorporated herein by reference in its entirety). In some embodiments, the ionizable lipid is Compound 6 or Compound 22, e.g., as described in Example 12 of WO 2019051289 A9 (incorporated herein by reference in its entirety). In some embodiments, the ionizable lipid is heptadecan-9-yl 8-((2-hydroxyethyl)(6-oxo-6-(undecyloxy)hexyl)amino)octanoate (SM-102), e.g., as described in Example 1 of US 9,867,888 (incorporated herein by reference in its entirety). In some embodiments, the ionizable lipid is (9Z,12Z)-3-((4,4-bis(octyloxy)butanoyl)oxy)-2-((((3-(diethylamino)propoxy)carbonyl)oxy)methyl)propyl octadeca-9,12-dienoate (LP01), e.g., as synthesized in Example 13 of WO 2015/095340 (incorporated herein by reference in its entirety). In some embodiments, the ionizable lipid is di((Z)-non-2-en-1-yl) 9-((4-(dimethylamino)butanoyl)oxy)heptadecanedioate (L319), e.g., as synthesized in Example 7, 8, or 9 of US 2012/0027803 (incorporated herein by reference in its entirety). In some embodiments, the ionizable lipid is 1,1′-((2-(4-(2-((2-(bis(2-hydroxydodecyl)amino)ethyl)(2-hydroxydodecyl)amino)ethyl)piperazin-1-yl)ethyl)azanediyl)bis(dodecan-2-ol) (C12-200), e.g., as synthesized in Examples 14 and 16 of WO 2010/053572 (incorporated herein by reference in its entirety). In some embodiments, the ionizable lipid is the imidazole cholesterol ester (ICE) lipid (3S,10R,13R,17R)-10,13-dimethyl-17-((R)-6-methylheptan-2-yl)-2,3,4,7,8,9,10,11,12,13,14,15,16,17-tetradecahydro-1H-cyclopenta[a]phenanthren-3-yl 3-(1H-imidazol-4-yl)propanoate, e.g., Structure (I) from WO 2020/106946 (incorporated herein by reference in its entirety).

可用于(例如，与其他脂质组分组合)形成用于递送本文所述的组合物，例如本文所述的核酸(例如，RNA)(例如，模板核酸或编码基因修饰多肽的核酸)的脂质纳米颗粒的脂质化合物的一些非限制性实例包括：Some non-limiting examples of lipid compounds that can be used (e.g., in combination with other lipid components) to form lipid nanoparticles for delivering a composition described herein, e.g., a nucleic acid (e.g., RNA) described herein (e.g., a template nucleic acid or a nucleic acid encoding a gene-modifying polypeptide) include:

在一些实施例中，包含式(i)的LNP用于将本文所述的基因修饰组合物递送至肝和/或肝细胞。In some embodiments, LNPs comprising formula (i) are used to deliver the gene modification compositions described herein to the liver and/or hepatocytes.

在一些实施例中，包含式(ii)的LNP用于将本文所述的基因修饰组合物递送至肝和/或肝细胞。In some embodiments, LNPs comprising formula (ii) are used to deliver the gene modification compositions described herein to the liver and/or hepatocytes.

在一些实施例中，包含式(iii)的LNP用于将本文所述的基因修饰组合物递送至肝和/或肝细胞。In some embodiments, LNPs comprising formula (iii) are used to deliver the gene modification compositions described herein to the liver and/or hepatocytes.

在一些实施例中，包含式(v)的LNP用于将本文所述的基因修饰组合物递送至肝和/或肝细胞。In some embodiments, LNPs comprising formula (v) are used to deliver the gene modification compositions described herein to the liver and/or hepatocytes.

在一些实施例中，包含式(vi)的LNP用于将本文所述的基因修饰组合物递送至肝和/或肝细胞。In some embodiments, LNPs comprising formula (vi) are used to deliver the gene modification compositions described herein to the liver and/or hepatocytes.

在一些实施例中，包含式(viii)的LNP用于将本文所述的基因修饰组合物递送至肝和/或肝细胞。In some embodiments, LNPs comprising formula (viii) are used to deliver the gene modification compositions described herein to the liver and/or hepatocytes.

在一些实施例中，包含式(ix)的LNP用于将本文所述的基因修饰组合物递送至肝和/或肝细胞。In some embodiments, LNPs comprising formula (ix) are used to deliver the gene modification compositions described herein to the liver and/or hepatocytes.

wherein

X¹ is O, NR¹, or a direct bond, X² is C2-5 alkylene, X³ is C(=O) or a direct bond, R¹ is H or Me, R³ is C1-3 alkyl, R² is C1-3 alkyl, or R² taken together with the nitrogen atom to which it is attached and 1-3 carbon atoms of X² forms a 4-, 5-, or 6-membered ring, or X¹ is NR¹, and R¹ and R² taken together with the nitrogen atom to which they are attached form a 5- or 6-membered ring, or R² taken together with R³ and the nitrogen atom to which they are attached forms a 5-, 6-, or 7-membered ring, Y¹ is C2-12 alkylene, Y² is selected from

n is 0 to 3, R⁴ is C1-15 alkyl, Z¹ is C1-6 alkylene or a direct bond,

Z² is

(in either orientation) or absent, provided that if Z¹ is a direct bond, Z² is absent;

R⁵ is C5-9 alkyl or C6-10 alkoxy, R⁶ is C5-9 alkyl or C6-10 alkoxy, W is methylene or a direct bond, and R⁷ is H or Me, or a salt thereof, provided that if R³ and R² are C2 alkyl, X¹ is O, X² is linear C3 alkylene, X³ is C(=O), Y¹ is linear Ce alkylene, (Y²)n-R⁴ is

, R⁴ is linear C5 alkyl, Z¹ is C2 alkylene, Z² is absent, W is methylene, and R⁷ is H, then R⁵ and R⁶ are not Cx alkoxy.

在一些实施例中，包含式(xii)的LNP用于将本文所述的基因修饰组合物递送至肝和/或肝细胞。In some embodiments, LNPs comprising formula (xii) are used to deliver the gene modification compositions described herein to the liver and/or hepatocytes.

在一些实施例中，包含式(xi)的LNP用于将本文所述的基因修饰组合物递送至肝和/或肝细胞。In some embodiments, LNPs comprising formula (xi) are used to deliver the gene modification compositions described herein to the liver and/or hepatocytes.

wherein

在一些实施例中，LNP包含式(xiii)的化合物和式(xiv)的化合物。In some embodiments, the LNP comprises a compound of formula (xiii) and a compound of formula (xiv).

在一些实施例中，包含式(xv)的LNP用于将本文所述的基因修饰组合物递送至肝和/或肝细胞。In some embodiments, LNPs comprising formula (xv) are used to deliver the gene modification compositions described herein to the liver and/or hepatocytes.

在一些实施例中，包含式(xvi)的配制品的LNP用于将本文所述的基因修饰组合物递送至肺内皮细胞。In some embodiments, LNPs comprising the formulation of formula (xvi) are used to deliver the gene modification compositions described herein to lung endothelial cells.

wherein

在一些实施例中，用于形成用于递送本文所述组合物(例如本文所述的核酸(例如，RNA)(例如模板核酸或编码基因修饰多肽的核酸))的脂质纳米颗粒的脂质化合物通过以下反应之一制备：In some embodiments, lipid compounds used to form lipid nanoparticles for delivering a composition described herein, such as a nucleic acid (e.g., RNA) described herein (e.g., a template nucleic acid or a nucleic acid encoding a gene-modifying polypeptide) are prepared by one of the following reactions:

Exemplary non-cationic lipids include, but are not limited to, distearoyl-sn-glycero-phosphoethanolamine, distearoylphosphatidylcholine (DSPC), dioleoylphosphatidylcholine (DOPC), dipalmitoylphosphatidylcholine (DPPC), dioleoylphosphatidylglycerol (DOPG), dipalmitoylphosphatidylglycerol (DPPG), dioleoyl-phosphatidylethanolamine (DOPE), 1,2-dioleoyl-sn-glycero-3-phosphoethanolamine (DOPE), palmitoyloleoylphosphatidylcholine (POPC), palmitoyloleoylphosphatidylethanolamine (POPE), dioleoyl-phosphatidylethanolamine 4-(N-maleimidomethyl)-cyclohexane-1-carboxylate (DOPE-mal), dipalmitoylphosphatidylethanolamine (DPPE), dimyristoylphosphoethanolamine (DMPE), distearoyl-phosphatidyl-ethanolamine (DSPE), monomethyl-phosphatidylethanolamine (e.g., 16-O-monomethyl PE), dimethyl-phosphatidylethanolamine (e.g., 16-O-dimethyl PE), 18-1-trans PE, 1-stearoyl-2-oleoyl-phosphatidylethanolamine (SOPE), hydrogenated soy phosphatidylcholine (HSPC), egg phosphatidylcholine (EPC), dioleoylphosphatidylserine (DOPS), sphingomyelin (SM), dimyristoylphosphatidylcholine (DMPC), dimyristoylphosphatidylglycerol (DMPG), distearoylphosphatidylglycerol (DSPG), dierucoylphosphatidylcholine (DEPC), palmitoyloleoylphosphatidylglycerol (POPG), dielaidoyl-phosphatidylethanolamine (DEPE), lecithin, phosphatidylethanolamine, lysolecithin, lysophosphatidylethanolamine, phosphatidylserine, phosphatidylinositol, sphingomyelin, egg sphingomyelin (ESM), cephalin, cardiolipin, phosphatidic acid, cerebrosides, dicetylphosphate, lysophosphatidylcholine, dilinoleoylphosphatidylcholine, or mixtures thereof. It is understood that other diacylphosphatidylcholine and diacylphosphatidylethanolamine phospholipids can also be used. The acyl groups in these lipids are preferably acyl groups derived from fatty acids having C10-C24 carbon chains, e.g., lauroyl, myristoyl, palmitoyl, stearoyl, or oleoyl. In certain embodiments, additional exemplary lipids include, but are not limited to, those described in Kim et al. (2020) dx.doi.org/10.1021/acs.nanolett.0c01386, which is incorporated herein by reference. In some embodiments, such lipids include plant lipids (e.g., DGTS) found to improve liver transfection with mRNA. In some embodiments, the non-cationic lipid may have the following structure,

适合用于脂质纳米颗粒中的非阳离子脂质的其他实例包括但不限于非磷脂质，例如硬脂胺、十二烷基胺、十六烷基胺、乙酰基棕榈酸酯、蓖麻酸甘油酯、硬脂酸十六烷基酯、肉豆蔻酸异丙酯、两性丙烯酸聚合物、三乙醇胺-月桂基硫酸酯、烷基-芳基硫酸酯、聚乙氧基化脂肪酸酰胺、双十八烷基二甲基溴化铵、神经酰胺、鞘磷脂等。其他非阳离子脂质在WO2017/099823或美国专利公开US 2018/0028664中描述，其内容通过援引以其全文并入本文。Other examples of non-cationic lipids suitable for use in lipid nanoparticles include, but are not limited to, non-phospholipids, such as stearylamine, dodecylamine, hexadecylamine, acetyl palmitate, ricinoleic acid glyceride, hexadecyl stearate, isopropyl myristate, amphoteric acrylic polymers, triethanolamine-lauryl sulfate, alkyl-aryl sulfate, polyethoxylated fatty acid amides, dioctadecyl dimethyl ammonium bromide, ceramide, sphingomyelin, etc. Other non-cationic lipids are described in WO2017/099823 or U.S. Patent Publication US 2018/0028664, the contents of which are incorporated herein by reference in their entirety.

在一些实施例中，非阳离子脂质是油酸或通过援引以其全文并入本文的US 2018/0028664的式I、II或IV的化合物。非阳离子脂质可以占脂质纳米颗粒中存在的总脂质的例如0-30％(mol)。在一些实施例中，非阳离子脂质含量是脂质纳米颗粒中存在的总脂质的5％-20％(mol)或10％-15％(mol)。在实施例中，可电离脂质与中性脂质的摩尔比为约2∶1至约8∶1(例如，约2∶1、3∶1、4∶1、5∶1、6∶1、7∶1或8∶1)。In some embodiments, the non-cationic lipid is oleic acid or a compound of formula I, II or IV of US 2018/0028664, which is incorporated herein by reference in its entirety. The non-cationic lipid may account for, for example, 0-30% (mol) of the total lipid present in the lipid nanoparticle. In some embodiments, the non-cationic lipid content is 5%-20% (mol) or 10%-15% (mol) of the total lipid present in the lipid nanoparticle. In an embodiment, the molar ratio of ionizable lipid to neutral lipid is about 2: 1 to about 8: 1 (e.g., about 2: 1, 3: 1, 4: 1, 5: 1, 6: 1, 7: 1 or 8: 1).

在一些实施例中，脂质纳米颗粒不包含任何磷脂。In some embodiments, the lipid nanoparticles do not comprise any phospholipids.

In some aspects, the lipid nanoparticles may further comprise a component, such as a sterol, to provide membrane integrity. One exemplary sterol that can be used in the lipid nanoparticle is cholesterol and derivatives thereof. Non-limiting examples of cholesterol derivatives include polar analogues such as 5α-cholestanol, 5β-coprostanol, cholesteryl-(2′-hydroxy)-ethyl ether, cholesteryl-(4′-hydroxy)-butyl ether, and 6-ketocholestanol; non-polar analogues such as 5α-cholestane, cholestenone, 5α-cholestanone, 5β-cholestanone, and cholesteryl decanoate; and mixtures thereof. In some embodiments, the cholesterol derivative is a polar analogue, e.g., cholesteryl-(4′-hydroxy)-butyl ether. Exemplary cholesterol derivatives are described in PCT publication WO 2009/127060 and US patent publication US 2010/0130588, each of which is incorporated herein by reference in its entirety.

在一些实施例中，提供膜完整性的组分，诸如固醇，可占脂质纳米颗粒中存在的总脂质的0-50％(mol)(例如，0-10％、10％-20％、20％-30％、30％-40％或40％-50％)。在一些实施例中，这样的组分是脂质纳米颗粒的总脂质含量的20％-50％(mol)、30％-40％(mol)。In some embodiments, the component that provides membrane integrity, such as a sterol, may account for 0-50% (mol) of the total lipid present in the lipid nanoparticle (e.g., 0-10%, 10%-20%, 20%-30%, 30%-40%, or 40%-50%). In some embodiments, such a component is 20%-50% (mol) or 30%-40% (mol) of the total lipid content of the lipid nanoparticle.

在一些实施例中，脂质纳米颗粒可包含聚乙二醇(PEG)或缀合的脂质分子。通常，这些用于抑制脂质纳米颗粒的聚集和/或提供空间稳定。示例性的缀合脂质包括但不限于PEG-脂质缀合物、聚噁唑啉(POZ)-脂质缀合物、聚酰胺-脂质缀合物(如ATTA-脂质缀合物)、阳离子聚合物脂质(CPL)缀合物及其混合物。在一些实施例中，缀合脂质分子是PEG-脂质缀合物，例如(甲氧基聚乙二醇)缀合脂质。In some embodiments, the lipid nanoparticle may comprise a polyethylene glycol (PEG) or a conjugated lipid molecule. Typically, these are used to inhibit aggregation of the lipid nanoparticles and/or to provide steric stabilization. Exemplary conjugated lipids include, but are not limited to, PEG-lipid conjugates, polyoxazoline (POZ)-lipid conjugates, polyamide-lipid conjugates (such as ATTA-lipid conjugates), cationic polymer lipid (CPL) conjugates, and mixtures thereof. In some embodiments, the conjugated lipid molecule is a PEG-lipid conjugate, e.g., a (methoxypolyethylene glycol)-conjugated lipid.

示例性的PEG-脂质缀合物包括但不限于PEG-二酰基甘油(DAG)(诸如1-(单甲氧基-聚乙二醇)-2，3-二肉豆蔻酰甘油(PEG-DMG))、PEG-二烷氧基丙基(DAA)、PEG-磷脂、PEG-神经酰胺(Cer)、聚乙二醇化磷脂酰乙醇胺(PEG-PE)、1，2-二肉豆蔻酰基-sn-甘油，甲氧基聚乙二醇(DMG-PEG-2K)、PEG琥珀酸二酰基甘油(PEGS-DAG)(诸如4-O-(2′，3′-二(十四烷酰基氧基)丙基-1-O-(ω-甲氧基(聚乙氧基)乙基)丁二酸酯(PEG-S-DMG))、PEG二烷氧基丙基氨基甲酸酯、N-(羰基-甲氧基聚乙二醇2000)-1，2-二硬脂酰基-sn-甘油-3-磷酸乙醇胺钠盐或其混合物。另外的示例性PEG-脂质缀合物例如在US 5,885,613、US 6,287,591、US 2003/0077829、US 2005/0175682、US 2008/0020058、US 2011/0117125、US 2010/0130588、US 2016/0376224、US 2017/0119904和US/099823中描述，所有这些的内容通过援引以其全文并入本文。在一些实施例中，PEG-脂质是US 2018/0028664的式III、III-a-1、III-a-2、III-b-1、III-b-2或V的化合物，其内容通过援引以其全文并入本文。在一些实施例中，PEG-脂质具有US 20150376115或US 2016/0376224的式II，两者的内容通过援引以其全文并入本文。在一些实施例中，PEG-DAA缀合物可以是例如PEG-二月桂基氧基丙基、PEG-二肉豆蔻基氧基丙基、PEG-二棕榈基氧基丙基或PEG-二硬脂基氧基丙基。PEG-脂质可以是以下的一种或多种：PEG-DMG、PEG-二月桂基甘油、PEG-二棕榈酰甘油、PEG-二硬脂基甘油、PEG-二月桂基甘油脂酰胺、PEG-二肉豆蔻基甘油脂酰胺、PEG-二棕榈酰甘油脂酰胺、PEG-二硬脂基甘油脂酰胺、PEG-胆固醇(1-[8′-(胆甾-5-烯-3[β]-氧基)甲酰胺基-3′，6′-二氧杂辛基]氨基甲酰基-[ω]-甲基-聚(乙二醇))、PEG-DMB(3，4-双十四烷氧基苄基-[ω]-甲基-聚(乙二醇)醚)和1，2-二肉豆蔻酰基-sn-甘油-3-磷酸乙醇胺-N-[甲氧基(聚乙二醇)-2000]。在一些实施例中，PEG-脂质包含PEG-DMG、1，2-二肉豆蔻酰基-sn-甘油-3-磷酸乙醇胺-N-[甲氧基(聚乙二醇)-2000]。在一些实施例中，PEG-脂质包含选自以下的结构：Exemplary PEG-lipid conjugates include, but are not limited to, PEG-diacylglycerol (DAG) (such as 1-(monomethoxy-polyethylene glycol)-2,3-dimyristoylglycerol (PEG-DMG)), PEG-dialkoxypropyl (DAA), PEG-phospholipid, PEG-ceramide (Cer), PEGylated phosphatidylethanolamine (PEG-PE), 1,2-dimyristoyl-sn-glycerol, methoxypolyethylene glycol (DMG-PEG-2K), PEG succinate diacylglycerol (PEGS-DAG) (such as 4-O-(2′,3′-di(tetradecanoyloxy)propyl-1-O-(ω-methoxy(polyethoxy)ethyl)butanedioate (PEG-S-DMG)), PEG dialkoxypropylcarbamate, N-(carbonyl-methoxypolyethylene glycol 2000)-1,2-distearoyl-sn-glycero-3-phosphoethanolamine sodium salt, or a mixture thereof. Additional exemplary PEG-lipid conjugates are described, for example, in US 5,885,613, US 6,287,591, US 2003/0077829, US 2005/0175682, US 2008/0020058, US 2011/0117125, US 2010/0130588, US 2016/0376224, US 2017/0119904, and US/099823, the contents of all of which are incorporated herein by reference in their entirety. In some embodiments, the PEG-lipid is a compound of Formula III, III-a-1, III-a-2, III-b-1, III-b-2, or V of US 2018/0028664, the contents of which are incorporated herein by reference in their entirety. In some embodiments, the PEG-lipid has Formula II of US 20150376115 or US 2016/0376224, the contents of both of which are incorporated herein by reference in their entirety. In some embodiments, the PEG-DAA conjugate can be, for example, PEG-dilauryloxypropyl, PEG-dimyristyloxypropyl, PEG-dipalmityloxypropyl, or PEG-distearyloxypropyl. The PEG-lipid can be one or more of the following: PEG-DMG, PEG-dilaurylglycerol, PEG-dipalmitoylglycerol, PEG-distearylglycerol, PEG-dilaurylglycamide, PEG-dimyristylglycamide, PEG-dipalmitoylglycamide, PEG-distearylglycamide, PEG-cholesterol (1-[8′-(cholest-5-en-3[β]-oxy)carboxamido-3′,6′-dioxaoctanyl]carbamoyl-[ω]-methyl-poly(ethylene glycol)), PEG-DMB (3,4-ditetradecoxybenzyl-[ω]-methyl-poly(ethylene glycol) ether), and 1,2-dimyristoyl-sn-glycero-3-phosphoethanolamine-N-[methoxy(polyethylene glycol)-2000]. In some embodiments, the PEG-lipid comprises PEG-DMG, 1,2-dimyristoyl-sn-glycero-3-phosphoethanolamine-N-[methoxy(polyethylene glycol)-2000]. In some embodiments, the PEG-lipid comprises a structure selected from the following:

在一些实施例中，与PEG以外的分子缀合的脂质也可用于代替PEG-脂质。例如，聚噁唑啉(POZ)-脂质缀合物、聚酰胺-脂质缀合物(如ATTA-脂质缀合物)和阳离子聚合物脂质(CPL)缀合物可用于代替PEG-脂质或与PEG-脂质一起使用。In some embodiments, lipids conjugated to molecules other than PEG can also be used in place of PEG-lipids. For example, polyoxazoline (POZ)-lipid conjugates, polyamide-lipid conjugates (such as ATTA-lipid conjugates), and cationic polymer lipid (CPL) conjugates can be used in place of, or in addition to, PEG-lipids.

示例性缀合脂质，即PEG-脂质、(POZ)-脂质缀合物、ATTA-脂质缀合物和阳离子聚合物-脂质在WO 2019051289 A9和WO 2020106946 A1的表2中列出的PCT和US专利申请中描述，所有这些的内容通过援引以其全文并入本文。Exemplary conjugated lipids, i.e., PEG-lipids, (POZ)-lipid conjugates, ATTA-lipid conjugates, and cationic polymer-lipids, are described in the PCT and US patent applications listed in Table 2 of WO 2019051289 A9 and WO 2020106946 A1, the contents of all of which are incorporated herein by reference in their entirety.

在一些实施例中，LNP包含式(xix)化合物、式(xxi)化合物和式(xxv)化合物。在一些实施例中，包含式(xix)、式(xxi)和式(xxv)的配制品的LNP用于将本文所述的基因修饰组合物递送至肺或肺细胞。In some embodiments, the LNP comprises a compound of formula (xix), a compound of formula (xxi), and a compound of formula (xxv). In some embodiments, LNPs comprising formulations of formula (xix), formula (xxi), and formula (xxv) are used to deliver the gene modification compositions described herein to the lung or lung cells.

在一些实施例中，脂质纳米颗粒可包含一种或多种选自式(i)、式(ii)、式(iii)、式(vii)和式(ix)的阳离子脂质。在一些实施例中，LNP可以进一步包含一种或多种中性脂质，例如DSPC、DPPC、DMPC、DOPC、POPC、DOPE、SM，类固醇，例如胆固醇，和/或一种或多种聚合物缀合的脂质，例如聚乙二醇化脂质，例如PEG-DAG、PEG-PE、PEG-S-DAG、PEG-cer或PEG二烷氧基丙基氨基甲酸酯。In some embodiments, the lipid nanoparticle may comprise one or more cationic lipids selected from formula (i), formula (ii), formula (iii), formula (vii) and formula (ix). In some embodiments, the LNP may further comprise one or more neutral lipids, such as DSPC, DPPC, DMPC, DOPC, POPC, DOPE, SM, steroids, such as cholesterol, and/or one or more polymer-conjugated lipids, such as PEGylated lipids, such as PEG-DAG, PEG-PE, PEG-S-DAG, PEG-cer or PEG dialkoxypropylcarbamate.

在一些实施例中，PEG或缀合脂质可以占脂质纳米颗粒中存在的总脂质的0-20％(mol)。在一些实施例中，PEG或缀合脂质的含量为脂质纳米颗粒中存在的总脂质的0.5％-10％或2％-5％(mol)。可电离脂质、非阳离子脂质、固醇和PEG/缀合脂质的摩尔比可以根据需要变化。例如，脂质颗粒可包含按组合物的摩尔或总重量计30％-70％的可电离脂质，按组合物的摩尔或总重量计0-60％的胆固醇，按组合物的摩尔或总重量计0-30％的非阳离子脂质和按组合物的摩尔或总重量计1％-10％的缀合脂质。优选地，组合物包含按组合物的摩尔或总重量计30％-40％的可电离脂质，按组合物的摩尔或总重量计40％-50％的胆固醇，和按组合物的摩尔或总重量计10％-20％的非阳离子脂质。在一些其他实施例中，该组合物是按组合物的摩尔或总重量计50％-75％的可电离脂质，按组合物的摩尔或总重量计20％-40％的胆固醇和按组合物的摩尔或总重量计5％至10％的非阳离子脂质以及按组合物的摩尔或总重量计1％-10％的缀合脂质。该组合物可以含有按组合物的摩尔或总重量计60％-70％的可电离脂质，按组合物的摩尔或总重量计25％-35％的胆固醇，以及按组合物的摩尔或总重量计5％-10％的非阳离子脂质。该组合物还可含有按组合物的摩尔或总重量计高达90％的可电离脂质和按组合物的摩尔或总重量计2％至15％的非阳离子脂质。配制品也可以是脂质纳米颗粒配制品，例如包含按组合物的摩尔或总重量计8％-30％的可电离脂质，按组合物的摩尔或总重量计5％-30％的非阳离子脂质，以及按组合物的摩尔或总重量计0-20％的胆固醇；按组合物的摩尔或总重量计4％-25％的可电离脂质，按组合物的摩尔或总重量计4％-25％的非阳离子脂质，按组合物的摩尔或总重量计2％至25％的胆固醇，按组合物的摩尔或总重量计10％至35％的缀合脂质，以及按组合物的摩尔或总重量计5％的胆固醇；或按组合物的摩尔或总重量计2％-30％的可电离脂质，按组合物的摩尔或总重量计2％-30％的非阳离子脂质，按组合物的摩尔或总重量计1％至15％的胆固醇，按组合物的摩尔或总重量计2％至35％的缀合脂质，以及按组合物的摩尔或总重量计1％-20％的胆固醇；或按组合物的摩尔或总重量计甚至高达90％的可电离脂质和按组合物的摩尔或总重量计2％-10％的非阳离子脂质，或按组合物的摩尔或总重量计甚至100％的阳离子脂质。在一些实施例中，脂质颗粒配制品包含摩尔比为50∶10∶38.5∶1.5的可电离脂质、磷脂、胆固醇和聚乙二醇化脂质。在一些其他实施例中，脂质颗粒配制品包含摩尔比为60∶38.5∶1.5的可电离脂质、胆固醇和聚乙二醇化脂质。In some embodiments, PEG or conjugated lipids can account for 0-20% (mol) of the total lipids present in the lipid nanoparticles. In some embodiments, the content of PEG or conjugated lipids is 0.5%-10% or 2%-5% (mol) of the total lipids present in the lipid nanoparticles. The molar ratio of ionizable lipids, non-cationic lipids, sterols and PEG/conjugated lipids can be changed as needed. For example, lipid particles can include 30%-70% ionizable lipids by mole or total weight of the composition, 0-60% cholesterol by mole or total weight of the composition, 0-30% non-cationic lipids by mole or total weight of the composition and 1%-10% conjugated lipids by mole or total weight of the composition. 
Preferably, the composition comprises 30%-40% ionizable lipid by mole or total weight of the composition, 40%-50% cholesterol by mole or total weight of the composition, and 10%-20% non-cationic lipid by mole or total weight of the composition. In some other embodiments, the composition is 50%-75% ionizable lipid by mole or total weight of the composition, 20%-40% cholesterol by mole or total weight of the composition, 5% to 10% non-cationic lipid by mole or total weight of the composition, and 1%-10% conjugated lipid by mole or total weight of the composition. The composition may contain 60%-70% ionizable lipid by mole or total weight of the composition, 25%-35% cholesterol by mole or total weight of the composition, and 5%-10% non-cationic lipid by mole or total weight of the composition. The composition may also contain up to 90% ionizable lipid by mole or total weight of the composition and 2% to 15% non-cationic lipid by mole or total weight of the composition. The formulation may also be a lipid nanoparticle formulation, for example comprising 8%-30% ionizable lipid by mole or total weight of the composition, 5%-30% non-cationic lipid by mole or total weight of the composition, and 0-20% cholesterol by mole or total weight of the composition; 4%-25% ionizable lipid by mole or total weight of the composition, 4%-25% non-cationic lipid by mole or total weight of the composition, 2% to 25% cholesterol by mole or total weight of the composition, 10% to 35% conjugated lipid by mole or total weight of the composition, and 5% cholesterol by mole or total weight of the composition; or 2%-30% ionizable lipid by mole or total weight of the composition, 2%-30% non-cationic lipid by mole or total weight of the composition, 1% to 15% cholesterol by mole or total weight of the composition, 2% to 35% conjugated lipid by mole or total weight of the composition, and 1%-20% cholesterol by mole or total weight of the composition; or even up to 90% ionizable lipid by mole or total weight of the composition and 2%-10% non-cationic lipid by mole or total weight of the composition, or even 100% cationic lipid by mole or total weight of the composition. In some embodiments, the lipid particle formulation comprises ionizable lipid, phospholipid, cholesterol, and PEGylated lipid in a molar ratio of 50:10:38.5:1.5. In some other embodiments, the lipid particle formulation comprises ionizable lipid, cholesterol, and PEGylated lipid in a molar ratio of 60:38.5:1.5.

在一些实施例中，脂质颗粒包含可电离脂质、非阳离子脂质(例如磷脂)、固醇(例如胆固醇)和聚乙二醇化脂质，其中可电离脂质的脂质摩尔比在20至70摩尔％的范围内，目标为40-60摩尔％，非阳离子脂质的摩尔百分比在0至30摩尔％的范围内，目标为0至15摩尔％，固醇的摩尔百分比在20至70摩尔％的范围内，目标为30至50摩尔％，并且聚乙二醇化脂质的摩尔百分比在1至6摩尔％的范围内，目标为2至5摩尔％。In some embodiments, the lipid particle comprises an ionizable lipid, a non-cationic lipid (e.g., a phospholipid), a sterol (e.g., cholesterol), and a PEGylated lipid, wherein the molar percentage of the ionizable lipid is in the range of 20 to 70 mol%, with a target of 40 to 60 mol%; the molar percentage of the non-cationic lipid is in the range of 0 to 30 mol%, with a target of 0 to 15 mol%; the molar percentage of the sterol is in the range of 20 to 70 mol%, with a target of 30 to 50 mol%; and the molar percentage of the PEGylated lipid is in the range of 1 to 6 mol%, with a target of 2 to 5 mol%.

在一些实施例中，脂质颗粒包含摩尔比为50∶10∶38.5∶1.5的可电离脂质/非阳离子脂质/固醇/缀合脂质。In some embodiments, the lipid particle comprises a molar ratio of ionizable lipid/non-cationic lipid/sterol/conjugated lipid of 50:10:38.5:1.5.
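As a worked check of the 50:10:38.5:1.5 ratio above, the following minimal Python sketch converts a molar ratio into mole percent and compares each component with the broad ranges stated in the preceding paragraph (20 to 70, 0 to 30, 20 to 70, and 1 to 6 mol%). The function names and dictionary keys are illustrative assumptions and are not part of the disclosure.

```python
# Illustrative sketch only: names and keys are hypothetical.

def mole_percents(ratio):
    """Convert a molar ratio (component -> parts) into mole percent."""
    total = sum(ratio.values())
    if total <= 0:
        raise ValueError("molar ratio must have a positive total")
    return {name: 100.0 * parts / total for name, parts in ratio.items()}


# Broad mol% ranges taken from the preceding paragraph of the description.
BROAD_RANGES = {
    "ionizable": (20.0, 70.0),
    "non_cationic": (0.0, 30.0),
    "sterol": (20.0, 70.0),
    "peg_lipid": (1.0, 6.0),
}


def within_ranges(percents, ranges=BROAD_RANGES):
    """Report, per component, whether its mol% lies inside the given range."""
    return {
        name: low <= percents[name] <= high
        for name, (low, high) in ranges.items()
    }


ratio = {"ionizable": 50, "non_cationic": 10, "sterol": 38.5, "peg_lipid": 1.5}
percents = mole_percents(ratio)
checks = within_ranges(percents)
print(percents)
print(checks)
```

Because the four parts of this ratio already sum to 100, each part equals its mole percent; for a ratio that does not sum to 100, the same function normalizes it first.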

在一方面，本披露提供了包含磷脂、卵磷脂、磷脂酰胆碱和磷脂酰乙醇胺的脂质纳米颗粒配制品。In one aspect, the present disclosure provides a lipid nanoparticle formulation comprising phospholipids, lecithin, phosphatidylcholine, and phosphatidylethanolamine.

在一些实施例中，还可以包括一种或多种另外的化合物。那些化合物可以单独施用，或者另外的化合物可以包括在本发明的脂质纳米颗粒中。换言之，除核酸或至少第二核酸之外，脂质纳米颗粒可含有不同于第一核酸的其他化合物。非限制性地，其他另外的化合物可以选自由以下组成的组：小的或大的有机分子或无机分子、单糖、二糖、三糖、寡糖、多糖、肽、蛋白质、其肽类似物和衍生物、肽模拟物、核酸、核酸类似物和衍生物、由生物材料制成的提取物，或其任何组合。In some embodiments, one or more additional compounds may also be included. Those compounds may be administered separately, or the additional compounds may be included in the lipid nanoparticles of the present invention. In other words, in addition to the nucleic acid or at least a second nucleic acid, the lipid nanoparticle may contain other compounds that differ from the first nucleic acid. Without limitation, the additional compounds may be selected from the group consisting of: small or large organic or inorganic molecules, monosaccharides, disaccharides, trisaccharides, oligosaccharides, polysaccharides, peptides, proteins, peptide analogs and derivatives thereof, peptidomimetics, nucleic acids, nucleic acid analogs and derivatives, extracts made from biological materials, or any combination thereof.

在一些实施例中，脂质纳米颗粒(或包含脂质纳米颗粒的配制品)缺乏反应性杂质(例如，醛或酮)，或包含低于预选水平的反应性杂质(例如，醛或酮)。虽然不希望受理论约束，但在一些实施例中，脂质试剂用于制备脂质纳米颗粒配制品，并且脂质试剂可包含污染性反应性杂质(例如，醛或酮)。可以基于具有低于预选水平的反应性杂质(例如，醛或酮)来选择用于制造的脂质试剂。不希望受理论束缚，在一些实施例中，醛可引起RNA的修饰和损伤，例如，碱基之间的交联和/或脂质与RNA的共价缀合(例如，形成脂质-RNA加合物)。在一些情况下，这可能导致逆转录酶反应失败和/或例如在一个或多个病变的一个或多个位点掺入不适当的碱基，例如新合成的靶DNA中的突变。In some embodiments, the lipid nanoparticle (or a formulation comprising lipid nanoparticles) lacks reactive impurities (e.g., aldehydes or ketones), or comprises reactive impurities (e.g., aldehydes or ketones) below a preselected level. While not wishing to be bound by theory, in some embodiments a lipid reagent is used to prepare the lipid nanoparticle formulation, and the lipid reagent may comprise contaminating reactive impurities (e.g., aldehydes or ketones). A lipid reagent may be selected for manufacturing on the basis of having less than a preselected level of reactive impurities (e.g., aldehydes or ketones). Without wishing to be bound by theory, in some embodiments aldehydes can cause modification of and damage to RNA, e.g., crosslinking between bases and/or covalent conjugation of lipid to RNA (e.g., formation of lipid-RNA adducts). In some cases, this may lead to failure of the reverse transcriptase reaction and/or to incorporation of an inappropriate base, e.g., at one or more sites of one or more lesions, e.g., a mutation in the newly synthesized target DNA.

在一些实施例中，脂质纳米颗粒配制品使用包含小于5％、4％、3％、2％、1％、0.9％、0.8％、0.7％、0.6％、0.5％、0.4％、0.3％、0.2％或0.1％的总反应性杂质(例如醛)含量的脂质试剂产生。在一些实施例中，脂质纳米颗粒配制品使用包含小于5％、4％、3％、2％、1％、0.9％、0.8％、0.7％、0.6％、0.5％、0.4％、0.3％、0.2％或0.1％的任何单一反应性杂质(例如醛)物质的脂质试剂产生。在一些实施例中，脂质纳米颗粒配制品使用脂质试剂产生，该脂质试剂包含：(i)小于5％、4％、3％、2％、1％、0.9％、0.8％、0.7％、0.6％、0.5％、0.4％、0.3％、0.2％或0.1％的总反应性杂质(例如醛)含量；和(ii)小于5％、4％、3％、2％、1％、0.9％、0.8％、0.7％、0.6％、0.5％、0.4％、0.3％、0.2％或0.1％的任何单一反应性杂质(例如醛)物质。在一些实施例中，脂质纳米颗粒配制品使用多种脂质试剂产生，并且多种脂质试剂中的每一种独立地满足本段落中所述的一个或多个标准。在一些实施例中，多种脂质试剂中的每一种满足相同的标准，例如本段落的标准。In some embodiments, the lipid nanoparticle formulations are produced using lipid reagents comprising less than 5%, 4%, 3%, 2%, 1%, 0.9%, 0.8%, 0.7%, 0.6%, 0.5%, 0.4%, 0.3%, 0.2% or 0.1% of the total reactive impurity (e.g., aldehyde) content. In some embodiments, the lipid nanoparticle formulations are produced using lipid reagents comprising less than 5%, 4%, 3%, 2%, 1%, 0.9%, 0.8%, 0.7%, 0.6%, 0.5%, 0.4%, 0.3%, 0.2% or 0.1% of any single reactive impurity (e.g., aldehyde) species. In some embodiments, lipid nanoparticle formulations are produced using lipid reagents that include: (i) less than 5%, 4%, 3%, 2%, 1%, 0.9%, 0.8%, 0.7%, 0.6%, 0.5%, 0.4%, 0.3%, 0.2% or 0.1% of total reactive impurities (e.g., aldehydes) content; and (ii) less than 5%, 4%, 3%, 2%, 1%, 0.9%, 0.8%, 0.7%, 0.6%, 0.5%, 0.4%, 0.3%, 0.2% or 0.1% of any single reactive impurity (e.g., aldehyde) material. In some embodiments, lipid nanoparticle formulations are produced using a variety of lipid reagents, and each of the multiple lipid reagents independently meets one or more criteria described in this paragraph. In some embodiments, each of the multiple lipid reagents meets the same criteria, such as the criteria of this paragraph.

在一些实施例中，脂质纳米颗粒配制品包含小于5％、4％、3％、2％、1％、0.9％、0.8％、0.7％、0.6％、0.5％、0.4％、0.3％、0.2％或0.1％的总反应性杂质(例如醛)含量。在一些实施例中，脂质纳米颗粒配制品包含小于5％、4％、3％、2％、1％、0.9％、0.8％、0.7％、0.6％、0.5％、0.4％、0.3％、0.2％或0.1％的任何单一反应性杂质(例如醛)物质。在一些实施例中，脂质纳米颗粒配制品包含：(i)小于5％、4％、3％、2％、1％、0.9％、0.8％、0.7％、0.6％、0.5％、0.4％、0.3％、0.2％或0.1％的总反应性杂质(例如醛)含量；和(ii)小于5％、4％、3％、2％、1％、0.9％、0.8％、0.7％、0.6％、0.5％、0.4％、0.3％、0.2％或0.1％的任何单一反应性杂质(例如醛)物质。In some embodiments, the lipid nanoparticle formulation comprises less than 5%, 4%, 3%, 2%, 1%, 0.9%, 0.8%, 0.7%, 0.6%, 0.5%, 0.4%, 0.3%, 0.2% or 0.1% of the total reactive impurity (e.g., aldehyde) content. In some embodiments, the lipid nanoparticle formulation comprises less than 5%, 4%, 3%, 2%, 1%, 0.9%, 0.8%, 0.7%, 0.6%, 0.5%, 0.4%, 0.3%, 0.2% or 0.1% of any single reactive impurity (e.g., aldehyde) species. In some embodiments, the lipid nanoparticle formulation comprises: (i) less than 5%, 4%, 3%, 2%, 1%, 0.9%, 0.8%, 0.7%, 0.6%, 0.5%, 0.4%, 0.3%, 0.2%, or 0.1% total reactive impurity (e.g., aldehyde) content; and (ii) less than 5%, 4%, 3%, 2%, 1%, 0.9%, 0.8%, 0.7%, 0.6%, 0.5%, 0.4%, 0.3%, 0.2%, or 0.1% of any single reactive impurity (e.g., aldehyde) species.

在一些实施例中，用于如本文所述的脂质纳米颗粒或其配制品的一种或多种或任选地所有脂质试剂包含小于5％、4％、3％、2％、1％、0.9％、0.8％、0.7％、0.6％、0.5％、0.4％、0.3％、0.2％或0.1％的总反应性杂质(例如醛)含量。在一些实施例中，用于如本文所述的脂质纳米颗粒或其配制品的一种或多种或任选地所有脂质试剂包含小于5％、4％、3％、2％、1％、0.9％、0.8％、0.7％、0.6％、0.5％、0.4％、0.3％、0.2％或0.1％的任何单一反应性杂质(例如醛)物质。在一些实施例中，用于本文所述的脂质纳米颗粒或其配制品的一种或多种或任选地所有脂质试剂包含：(i)小于5％、4％、3％、2％、1％、0.9％、0.8％、0.7％、0.6％、0.5％、0.4％、0.3％、0.2％或0.1％的总反应性杂质(例如醛)含量；和(ii)小于5％、4％、3％、2％、1％、0.9％、0.8％、0.7％、0.6％、0.5％、0.4％、0.3％、0.2％或0.1％的任何单一反应性杂质(例如醛)物质。In some embodiments, one or more or optionally all lipid reagents used for lipid nanoparticles or their formulations as described herein comprise less than 5%, 4%, 3%, 2%, 1%, 0.9%, 0.8%, 0.7%, 0.6%, 0.5%, 0.4%, 0.3%, 0.2% or 0.1% of total reactive impurity (e.g., aldehyde) content. In some embodiments, one or more or optionally all lipid reagents used for lipid nanoparticles or their formulations as described herein comprise less than 5%, 4%, 3%, 2%, 1%, 0.9%, 0.8%, 0.7%, 0.6%, 0.5%, 0.4%, 0.3%, 0.2% or 0.1% of any single reactive impurity (e.g., aldehyde) substance. In some embodiments, one or more or optionally all lipid agents used in the lipid nanoparticles or formulations thereof described herein comprise: (i) less than 5%, 4%, 3%, 2%, 1%, 0.9%, 0.8%, 0.7%, 0.6%, 0.5%, 0.4%, 0.3%, 0.2%, or 0.1% total reactive impurity (e.g., aldehyde) content; and (ii) less than 5%, 4%, 3%, 2%, 1%, 0.9%, 0.8%, 0.7%, 0.6%, 0.5%, 0.4%, 0.3%, 0.2%, or 0.1% of any single reactive impurity (e.g., aldehyde) species.
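The two-part criterion in the paragraphs above (a cap on total reactive impurity content and a cap on any single impurity species) can be sketched as a simple acceptance check. This is a hedged illustration only: the function name, the species labels, and the example thresholds are assumptions chosen from the listed values, and the percentages are taken as already measured by whatever assay is in use.

```python
# Illustrative sketch only: names, species labels, and thresholds are hypothetical.

def meets_impurity_criteria(species_percent, max_total=1.0, max_single=1.0):
    """Return True if (i) the summed impurity content is below max_total and
    (ii) every individual impurity species is below max_single.

    species_percent maps an impurity species label to its measured content (%).
    """
    total = sum(species_percent.values())
    largest = max(species_percent.values(), default=0.0)
    return total < max_total and largest < max_single


# Hypothetical measurements for one lipid reagent lot.
lot = {"aldehyde_A": 0.3, "aldehyde_B": 0.4}

passes_loose = meets_impurity_criteria(lot, max_total=1.0, max_single=0.5)
passes_strict = meets_impurity_criteria(lot, max_total=0.5, max_single=0.5)
print(passes_loose, passes_strict)
```

Where several lipid reagents are used, the same check can be applied to each reagent independently, as the description contemplates.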

在一些实施例中，总醛含量和/或任何单一反应性杂质(例如醛)种类的量通过液相色谱法(LC)，例如与串联质谱法(MS/MS)联用，例如根据PCT/US21/20948的实例40中所述的方法来确定。在一些实施例中，反应性杂质(例如醛)含量和/或反应性杂质(例如醛)物质的量通过检测与例如脂质试剂中反应性杂质(例如醛)的存在相关的核酸分子(例如RNA分子，例如如本文所述)的一个或多个化学修饰来确定。在一些实施例中，反应性杂质(例如醛)含量和/或反应性杂质(例如醛)种类的量通过检测例如脂质试剂中与反应性杂质(例如醛)的存在相关联的核苷酸或核苷(例如核糖核苷酸或核糖核苷，例如包含在如本文所述的模板核酸中或从其分离)的一个或多个化学修饰来确定，例如，根据PCT/US21/20948的实例41中所述的方法来确定。在实施例中，核酸分子、核苷酸或核苷的化学修饰通过测定一个或多个修饰的核苷酸或核苷的存在来检测，例如使用LC-MS/MS分析，例如，根据PCT/US21/20948的实例41中所述的方法来检测。In some embodiments, the total aldehyde content and/or the amount of any single reactive impurity (e.g., aldehyde) species is determined by liquid chromatography (LC), e.g., coupled with tandem mass spectrometry (MS/MS), e.g., according to the method described in Example 40 of PCT/US21/20948. In some embodiments, the reactive impurity (e.g., aldehyde) content and/or the amount of reactive impurity (e.g., aldehyde) species is determined by detecting one or more chemical modifications of nucleic acid molecules (e.g., RNA molecules, e.g., as described herein) associated with the presence of reactive impurities (e.g., aldehydes), e.g., in lipid reagents. In some embodiments, the reactive impurity (e.g., aldehyde) content and/or the amount of reactive impurity (e.g., aldehyde) species is determined by detecting one or more chemical modifications of nucleotides or nucleosides (e.g., ribonucleotides or ribonucleosides, e.g., contained in or isolated from a template nucleic acid as described herein), e.g., in lipid reagents associated with the presence of reactive impurities (e.g., aldehydes), e.g., according to the method described in Example 41 of PCT/US21/20948. In an embodiment, chemical modification of a nucleic acid molecule, nucleotide or nucleoside is detected by assaying for the presence of one or more modified nucleotides or nucleosides, e.g., using LC-MS/MS analysis, e.g., according to the method described in Example 41 of PCT/US21/20948.

在一些实施例中，本文所述的核酸(例如，RNA)(例如，模板核酸或编码基因修饰多肽的核酸)不包含醛修饰，或包含少于预选量的醛修饰。在一些实施例中，平均每1000个核苷酸，核酸具有少于50、20、10、5、2或1个醛修饰，例如，其中两个核苷酸的单个交联是单个醛修饰。在一些实施例中，醛修饰是RNA加合物(例如脂质-RNA加合物)。在一些实施例中，醛修饰的核苷酸是碱基之间的交联。在一些实施例中，本文所述的核酸(例如RNA)在核苷酸之间包含少于50、20、10、5、2或1个交联。In some embodiments, a nucleic acid (e.g., RNA) described herein (e.g., a template nucleic acid or a nucleic acid encoding a gene modifying polypeptide) does not comprise an aldehyde modification, or comprises fewer than a preselected number of aldehyde modifications. In some embodiments, on average, the nucleic acid has fewer than 50, 20, 10, 5, 2, or 1 aldehyde modifications per 1000 nucleotides, e.g., wherein a single crosslink of two nucleotides counts as a single aldehyde modification. In some embodiments, the aldehyde modification is an RNA adduct (e.g., a lipid-RNA adduct). In some embodiments, the aldehyde-modified nucleotide is a crosslink between bases. In some embodiments, a nucleic acid (e.g., RNA) described herein comprises fewer than 50, 20, 10, 5, 2, or 1 crosslinks between nucleotides.
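The per-1000-nucleotide figure above is a simple normalization of a modification count by sequence length. The short sketch below shows the arithmetic; the function name and the example counts are hypothetical, and a crosslink joining two nucleotides is counted once, as stated in the paragraph.

```python
# Illustrative sketch only: the function name and example numbers are hypothetical.

def modifications_per_1000_nt(n_modifications, n_nucleotides):
    """Average number of aldehyde modifications per 1000 nucleotides.

    A single crosslink between two nucleotides counts as one modification.
    """
    if n_nucleotides <= 0:
        raise ValueError("n_nucleotides must be positive")
    if n_modifications < 0:
        raise ValueError("n_modifications cannot be negative")
    return 1000.0 * n_modifications / n_nucleotides


# Hypothetical example: 9 modifications detected across a 4500-nt RNA.
rate = modifications_per_1000_nt(9, 4500)
below_five = rate < 5  # compare against one of the listed limits
print(rate, below_five)
```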

在一些实施例中，通过添加靶向结构域将LNP定向至特定组织。例如，可以将生物配体展示在LNP的表面，以增强与展示同源受体的细胞的相互作用，从而推动与细胞表达受体的组织的相关联和向其中的载物递送。在一些实施例中，生物配体可以是驱动递送至肝的配体，例如展示GalNAc的LNP促使核酸载物递送至展示无唾液酸糖蛋白受体(ASGPR)的肝细胞。Akinc等人Mol Ther[分子疗法]18(7)：1357-1364(2010)的工作传授了将三价GalNAc配体与PEG-脂质缀合(GalNAc-PEG-DSG)以产生依赖于ASGPR的LNP以获得可观察的LNP载物效应(参见例如其中的图6)。其他展示配体的LNP配制品，例如掺入叶酸、转铁蛋白或抗体的配制品，在WO 2017223135中进行了讨论，其通过援引以其全文并入本文，此外还有在其中使用的参考文献也并入本文：即，Kolhatkar等人，Curr Drug Discov Technol[当代药物发现技术].2011 8：197-206；Musacchio和Torchilin，Front Biosci.[生物科学前沿]2011 16：1388-1412；Yu等人，Mol Membr Biol.[分子膜生物学]2010 27：286-298；Patil等人，Crit Rev Ther Drug Carrier Syst[治疗性药物载剂系统的重要评论].2008 25：1-61；Benoit等人，Biomacromolecules[生物大分子].2011 12：2708-2714；Zhao等人，Expert Opin Drug Deliv[药物递送专家观点].2008 5：309-319；Akinc等人，Mol Ther[分子疗法].2010 18：1357-1364；Srinivasan等人，Methods Mol Biol[分子生物学方法].2012 820：105-116；Ben-Arie等人，Methods Mol Biol[分子生物学方法].2012 757：497-507；Peer 2010 J Control Release[控释杂志].20：63-68；Peer等人，Proc Natl Acad Sci USA.[美国国家科学院院刊]2007 104：4095-4100；Kim等人，Methods Mol Biol.[分子生物学方法]2011 721：339-353；Subramanya等人，Mol Ther[分子疗法].2010 18：2028-2037；Song等人，Nat Biotechnol.[自然生物技术]2005 23：709-717；Peer等人，Science[科学].2008 319：627-630；以及Peer和Lieberman，Gene Ther[基因疗法].2011 18：1127-1133。In some embodiments, the LNP is directed to a specific tissue by the addition of a targeting domain. For example, a biological ligand can be displayed on the surface of the LNP to enhance interaction with cells displaying the cognate receptor, thereby driving association with, and cargo delivery to, tissues in which cells express the receptor. In some embodiments, the biological ligand can be a ligand that drives delivery to the liver; for example, an LNP displaying GalNAc promotes delivery of a nucleic acid cargo to hepatocytes displaying the asialoglycoprotein receptor (ASGPR). The work of Akinc et al., Mol Ther 18(7): 1357-1364 (2010), teaches conjugating a trivalent GalNAc ligand to a PEG-lipid (GalNAc-PEG-DSG) to produce LNPs that depend on ASGPR for an observable LNP cargo effect (see, e.g., FIG. 6 therein). Other ligand-displaying LNP formulations, such as those incorporating folate, transferrin, or antibodies, are discussed in WO 2017223135, which is incorporated herein by reference in its entirety, along with the references used therein, namely: Kolhatkar et al., Curr Drug Discov Technol. 2011 8:197-206; Musacchio and Torchilin, Front Biosci. 2011 16:1388-1412; Yu et al., Mol Membr Biol. 2010 27:286-298; Patil et al., Crit Rev Ther Drug Carrier Syst. 2008 25:1-61; Benoit et al., Biomacromolecules. 2011 12:2708-2714; Zhao et al., Expert Opin Drug Deliv. 2008 5:309-319; Akinc et al., Mol Ther. 2010 18:1357-1364; Srinivasan et al., Methods Mol Biol. 2012 820:105-116; Ben-Arie et al., Methods Mol Biol. 2012 757:497-507; Peer 2010 J Control Release. 20:63-68; Peer et al., Proc Natl Acad Sci USA. 2007 104:4095-4100; Kim et al., Methods Mol Biol. 2011 721:339-353; Subramanya et al., Mol Ther. 2010 18:2028-2037; Song et al., Nat Biotechnol. 2005 23:709-717; Peer et al., Science. 2008 319:627-630; and Peer and Lieberman, Gene Ther. 2011 18:1127-1133.

在一些实施例中，通过将选择性器官靶向(Selective ORgan Targeting，SORT)分子添加至包含传统组分(例如可电离的阳离子脂质、两亲性磷脂、胆固醇和聚(乙二醇)(PEG))的配制品中来针对组织特异性活性对LNP进行选择。Cheng等人Nat Nanotechnol[自然纳米技术]15(4)：313-320(2020)的传授内容证明，添加补充的“SORT”组分可根据SORT分子的百分比和生物物理特性精确地改变体内RNA递送谱并介导组织特异性(例如，肺、肝、脾脏)基因递送和编辑。In some embodiments, LNPs are selected for tissue-specific activity by adding Selective ORgan Targeting (SORT) molecules to formulations containing traditional components such as ionizable cationic lipids, amphiphilic phospholipids, cholesterol, and poly(ethylene glycol) (PEG). The teachings of Cheng et al. Nat Nanotechnol 15(4): 313-320 (2020) demonstrate that the addition of supplemental "SORT" components can precisely alter the in vivo RNA delivery profile and mediate tissue-specific (e.g., lung, liver, spleen) gene delivery and editing based on the percentage and biophysical properties of the SORT molecules.

在一些实施例中，LNP包含生物可降解的可电离脂质。在一些实施例中，LNP包含(9Z，12Z)-3-((4，4-双(辛氧基)丁酰基)氧基)-2-((((3-(二乙基氨基)丙氧基)羰基)氧基)甲基)丙基十八碳-9，12-二烯酸酯，也称为3-((4，4-双(辛氧基)丁酰基)氧基)-2-((((3-(二乙基氨基)丙氧基)羰基)氧基)甲基)丙基(9Z，12Z)-十八碳-9，12-二烯酸酯，或另一种可电离脂质。参见，例如WO 2019/067992、WO 2017/173054、WO 2015/095340和WO 2014/136086，以及其中提供的参考文献的脂质。在一些实施例中，在LNP脂质的上下文中术语阳离子和可电离是可互换的，例如，其中可电离脂质根据pH是阳离子的。In some embodiments, the LNP comprises a biodegradable ionizable lipid. In some embodiments, the LNP comprises (9Z,12Z)-3-((4,4-bis(octyloxy)butanoyl)oxy)-2-((((3-(diethylamino)propoxy)carbonyl)oxy)methyl)propyl octadeca-9,12-dienoate, also referred to as 3-((4,4-bis(octyloxy)butanoyl)oxy)-2-((((3-(diethylamino)propoxy)carbonyl)oxy)methyl)propyl (9Z,12Z)-octadeca-9,12-dienoate, or another ionizable lipid. See, e.g., the lipids of WO 2019/067992, WO 2017/173054, WO 2015/095340, and WO 2014/136086, and of the references provided therein. In some embodiments, the terms cationic and ionizable are interchangeable in the context of LNP lipids, e.g., where the ionizable lipid is cationic depending on the pH.

在一些实施例中，本文所述的LNP包含表19中所述的脂质。In some embodiments, the LNPs described herein comprise the lipids described in Table 19.

表19：示例性脂质Table 19: Exemplary lipids

在一些实施例中，可以将基因修饰系统的多个组分制备为单一LNP配制品，例如，LNP配制品包含编码基因修饰多肽的mRNA和RNA模板。可以改变核酸组分的比率以便最大化治疗剂的特性。在一些实施例中，RNA模板与编码基因修饰多肽的mRNA的比率为按摩尔比计约1∶1至100∶1，例如约1∶1至20∶1、约20∶1至40∶1、约40∶1至60∶1、约60∶1至80∶1、或约80∶1至100∶1。在其他实施例中，可以由单独的配制品制备多种核酸的系统，例如，包含模板RNA的一种LNP配制品和包含编码基因修饰多肽的mRNA的第二LNP配制品。在一些实施例中，该系统可以包含配制到LNP中的多于两种核酸组分。在一些实施例中，该系统可以包含蛋白质(例如，基因修饰多肽)以及配制到至少一种LNP配制品中的模板RNA。In some embodiments, multiple components of the gene modifying system can be prepared as a single LNP formulation, e.g., an LNP formulation comprising an mRNA encoding the gene modifying polypeptide and an RNA template. The ratio of the nucleic acid components can be varied to maximize the properties of the therapeutic. In some embodiments, the ratio of RNA template to mRNA encoding the gene modifying polypeptide is about 1:1 to 100:1 by molar ratio, e.g., about 1:1 to 20:1, about 20:1 to 40:1, about 40:1 to 60:1, about 60:1 to 80:1, or about 80:1 to 100:1. In other embodiments, a system of multiple nucleic acids can be prepared as separate formulations, e.g., one LNP formulation comprising the template RNA and a second LNP formulation comprising the mRNA encoding the gene modifying polypeptide. In some embodiments, the system can comprise more than two nucleic acid components formulated into LNPs. In some embodiments, the system can comprise a protein (e.g., a gene modifying polypeptide) and a template RNA formulated into at least one LNP formulation.
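Because the ratio above is stated in molar terms, equal masses of a short template RNA and a long mRNA correspond to very different molar amounts. The following sketch shows the conversion under a stated simplifying assumption: both RNAs have the same average mass per nucleotide, so that moles scale as mass divided by length. The function name and the example lengths (200 nt and 4000 nt) are hypothetical and are not taken from the disclosure.

```python
# Illustrative sketch only. Assumes both RNAs share the same average mass per
# nucleotide, so the per-nucleotide mass cancels out of the ratio.

def template_to_mrna_molar_ratio(template_mass, template_length_nt,
                                 mrna_mass, mrna_length_nt):
    """Approximate molar ratio of template RNA to mRNA.

    Masses may be in any unit as long as both use the same unit.
    """
    values = (template_mass, template_length_nt, mrna_mass, mrna_length_nt)
    if min(values) <= 0:
        raise ValueError("all masses and lengths must be positive")
    template_relative_moles = template_mass / template_length_nt
    mrna_relative_moles = mrna_mass / mrna_length_nt
    return template_relative_moles / mrna_relative_moles


# Hypothetical example: equal masses of a 200-nt template and a 4000-nt mRNA.
ratio = template_to_mrna_molar_ratio(1.0, 200, 1.0, 4000)
in_stated_range = 1.0 <= ratio <= 100.0  # the 1:1 to 100:1 range above
print(round(ratio, 3), in_stated_range)
```

Under this assumption, a 1:1 mass ratio of these two hypothetical RNAs corresponds to roughly a 20:1 molar ratio of template to mRNA.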

在一些实施例中，LNP配制品的平均LNP直径可以在数十nm和数百nm之间，例如通过动态光散射(DLS)测量的。在一些实施例中，LNP配制品的平均LNP直径可以为约40nm至约150nm，如约40nm、45nm、50nm、55nm、60nm、65nm、70nm、75nm、80nm、85nm、90nm、95nm、100nm、105nm、110nm、115nm、120nm、125nm、130nm、135nm、140nm、145nm或150nm。在一些实施例中，LNP配制品的平均LNP直径可为约50nm至约100nm、约50nm至约90nm、约50nm至约80nm、约50nm至约70nm、约50nm至约60nm、约60nm至约100nm、约60nm至约90nm、约60nm至约80nm、约60nm至约70nm、约70nm至约100nm、约70nm至约90nm、约70nm至约80nm、约80nm至约100nm、约80nm至约90nm或约90nm至约100nm。在一些实施例中，LNP配制品的平均LNP直径可为约70nm至约100nm。在特定实施例中，LNP配制品的平均LNP直径可为约80nm。在一些实施例中，LNP配制品的平均LNP直径可为约100nm。在一些实施例中，LNP配制品的平均LNP直径范围为约1nm至约500nm、约5nm至约200nm、约10nm至约100nm、约20nm至约80nm、约25nm至约60nm、约30nm至约55nm、约35nm至约50nm，或约38nm至约42nm。In some embodiments, the average LNP diameter of the LNP formulation can be between tens of nm and hundreds of nm, e.g., as measured by dynamic light scattering (DLS). In some embodiments, the average LNP diameter of the LNP formulation can be from about 40 nm to about 150 nm, such as about 40 nm, 45 nm, 50 nm, 55 nm, 60 nm, 65 nm, 70 nm, 75 nm, 80 nm, 85 nm, 90 nm, 95 nm, 100 nm, 105 nm, 110 nm, 115 nm, 120 nm, 125 nm, 130 nm, 135 nm, 140 nm, 145 nm, or 150 nm. In some embodiments, the average LNP diameter of the LNP formulation can be from about 50 nm to about 100 nm, from about 50 nm to about 90 nm, from about 50 nm to about 80 nm, from about 50 nm to about 70 nm, from about 50 nm to about 60 nm, from about 60 nm to about 100 nm, from about 60 nm to about 90 nm, from about 60 nm to about 80 nm, from about 60 nm to about 70 nm, from about 70 nm to about 100 nm, from about 70 nm to about 90 nm, from about 70 nm to about 80 nm, from about 80 nm to about 100 nm, from about 80 nm to about 90 nm, or from about 90 nm to about 100 nm. In some embodiments, the average LNP diameter of the LNP formulation can be from about 70 nm to about 100 nm. In particular embodiments, the average LNP diameter of the LNP formulation can be about 80 nm. In some embodiments, the average LNP diameter of the LNP formulation can be about 100 nm. In some embodiments, the average LNP diameter of the LNP formulation ranges from about 1 nm to about 500 nm, about 5 nm to about 200 nm, about 10 nm to about 100 nm, about 20 nm to about 80 nm, about 25 nm to about 60 nm, about 30 nm to about 55 nm, about 35 nm to about 50 nm, or about 38 nm to about 42 nm.

在一些情况下，LNP可以是相对均质的。多分散性指数可用于指示LNP的均质性，例如脂质纳米颗粒的粒度分布。小的(例如，小于0.3)多分散性指数通常指示窄的粒度分布。LNP的多分散性指数可为约0至约0.25，如0.01、0.02、0.03、0.04、0.05、0.06、0.07、0.08、0.09、0.10、0.11、0.12、0.13、0.14、0.15、0.16、0.17、0.18、0.19、0.20、0.21、0.22、0.23、0.24或0.25。在一些实施例中，LNP的多分散性指数可为约0.10至约0.20。In some cases, the LNPs can be relatively homogeneous. A polydispersity index can be used to indicate the homogeneity of the LNPs, e.g., the particle size distribution of the lipid nanoparticles. A small (e.g., less than 0.3) polydispersity index generally indicates a narrow particle size distribution. The polydispersity index of the LNPs can be from about 0 to about 0.25, such as 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.10, 0.11, 0.12, 0.13, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19, 0.20, 0.21, 0.22, 0.23, 0.24, or 0.25. In some embodiments, the polydispersity index of the LNPs can be from about 0.10 to about 0.20.
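For intuition about how a polydispersity index relates to the width of a size distribution, the sketch below uses a common simplification, stated here as an assumption: for a single, roughly Gaussian population, the index is approximated as the square of the standard deviation divided by the mean diameter. DLS instruments normally report an intensity-weighted value from a cumulant fit, which this sketch does not reproduce; the function name and the example diameters are hypothetical.

```python
# Illustrative sketch only: approximates PDI as (standard deviation / mean)^2
# for a single, roughly Gaussian population. This is a simplification and is
# not the cumulant analysis performed by a DLS instrument.
from statistics import mean, pstdev


def approximate_pdi(diameters_nm):
    """Approximate polydispersity index from a list of particle diameters."""
    if len(diameters_nm) < 2:
        raise ValueError("need at least two diameters")
    average = mean(diameters_nm)
    if average <= 0:
        raise ValueError("mean diameter must be positive")
    return (pstdev(diameters_nm) / average) ** 2


# Hypothetical diameters (nm) for a narrow population centred near 80 nm.
diameters = [78.0, 80.0, 82.0, 79.0, 81.0]
pdi = approximate_pdi(diameters)
is_narrow = pdi < 0.3  # the "small" threshold mentioned above
print(pdi, is_narrow)
```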

LNP的ζ电位可用于指示组合物的电动电位。在一些实施例中，ζ电位可以描述LNP的表面电荷。具有相对低电荷(正电荷或负电荷)的脂质纳米颗粒通常是期望的，因为更高电荷的物质可能不理想地与体内的细胞、组织和其他元素相互作用。在一些实施例中，LNP的ζ电位可为约-10mV至约+20mV、约-10mV至约+15mV、约-10mV至约+10mV、约-10mV至约+5mV、约-10mV至约0mV、约-10mV至约-5mV、约-5mV至约+20mV、约-5mV至约+15mV、约-5mV至约+10mV、约-5mV至约+5mV、约-5mV至约0mV、约0mV至约+20mV、约0mV至约+15mV、约0mV至约+10mV、约0mV至约+5mV、约+5mV至约+20mV、约+5mV至约+15mV或约+5mV至约+10mV。The zeta potential of the LNP can be used to indicate the electrokinetic potential of the composition. In some embodiments, the zeta potential can describe the surface charge of the LNP. Lipid nanoparticles with a relatively low charge (positive or negative) are generally desirable, because more highly charged species may interact undesirably with cells, tissues, and other elements in the body. In some embodiments, the zeta potential of the LNP can be from about -10 mV to about +20 mV, from about -10 mV to about +15 mV, from about -10 mV to about +10 mV, from about -10 mV to about +5 mV, from about -10 mV to about 0 mV, from about -10 mV to about -5 mV, from about -5 mV to about +20 mV, from about -5 mV to about +15 mV, from about -5 mV to about +10 mV, from about -5 mV to about +5 mV, from about -5 mV to about 0 mV, from about 0 mV to about +20 mV, from about 0 mV to about +15 mV, from about 0 mV to about +10 mV, from about 0 mV to about +5 mV, from about +5 mV to about +20 mV, from about +5 mV to about +15 mV, or from about +5 mV to about +10 mV.

The encapsulation efficiency of a protein and/or nucleic acid (e.g., a gene modifying polypeptide or an mRNA encoding the polypeptide) describes the amount of protein and/or nucleic acid that is encapsulated or otherwise associated with an LNP after preparation, relative to the initial amount provided. The encapsulation efficiency is desirably high (e.g., close to 100%). The encapsulation efficiency may be measured, for example, by comparing the amount of protein or nucleic acid in a solution containing the lipid nanoparticles before and after disrupting the lipid nanoparticles with one or more organic solvents or detergents. An anion exchange resin may be used to measure the amount of free protein or nucleic acid (e.g., RNA) in a solution. Fluorescence may be used to measure the amount of free protein and/or nucleic acid (e.g., RNA) in a solution. For the lipid nanoparticles described herein, the encapsulation efficiency of a protein and/or nucleic acid may be at least 50%, for example 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%. In some embodiments, the encapsulation efficiency may be at least 80%. In some embodiments, the encapsulation efficiency may be at least 90%. In some embodiments, the encapsulation efficiency may be at least 95%.
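As an illustration of the before-and-after comparison described above, the following Python sketch computes an encapsulation efficiency from two readings. It is a minimal sketch under stated assumptions: the function name and the example readings are hypothetical, the reading taken before disruption is assumed to reflect only free (unencapsulated) cargo, and the reading taken after disruption is assumed to reflect total cargo.

```python
def encapsulation_efficiency(free_signal: float, total_signal: float) -> float:
    """Return the percentage of cargo associated with the LNP.

    free_signal  -- cargo measured before disrupting the particles
                    (assumed to be unencapsulated cargo only)
    total_signal -- cargo measured after disruption with solvent or
                    detergent (assumed to be all cargo in the sample)
    Both readings must be in the same units.
    """
    if total_signal <= 0:
        raise ValueError("total_signal must be positive")
    if not 0 <= free_signal <= total_signal:
        raise ValueError("free_signal must lie between 0 and total_signal")
    return 100.0 * (total_signal - free_signal) / total_signal


# Hypothetical readings: 5 units free out of 100 units total.
print(encapsulation_efficiency(5.0, 100.0))
```

With these hypothetical readings the result would fall in the "at least 95%" range mentioned above.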

An LNP may optionally comprise one or more coatings. In some embodiments, an LNP may be formulated in a capsule, film, or tablet having a coating. A capsule, film, or tablet comprising a composition described herein may have any useful size, tensile strength, hardness, or density.

Additional exemplary lipids, formulations, methods, and characterization of LNPs are taught by WO 2020061457, which is incorporated herein by reference in its entirety.

In some embodiments, in vitro or ex vivo cell lipofection is performed using Lipofectamine MessengerMax (Thermo Fisher) or TransIT-mRNA Transfection Reagent (Mirus Bio). In certain embodiments, LNPs are formulated using the GenVoy_ILM ionizable lipid mix (Precision NanoSystems). In certain embodiments, LNPs are formulated using 2,2-dilinoleyl-4-dimethylaminoethyl-[1,3]-dioxolane (DLin-KC2-DMA) or dilinoleylmethyl-4-dimethylaminobutyrate (DLin-MC3-DMA or MC3), the formulation and in vivo use of which are taught in Jayaraman et al., Angew Chem Int Ed Engl 51(34):8529-8533 (2012), which is incorporated herein by reference in its entirety.

LNP formulations optimized for the delivery of CRISPR-Cas systems (e.g., Cas9-gRNA RNP, gRNA, Cas9 mRNA) are described in WO 2019067992 and WO 2019067910, both of which are incorporated herein by reference.

Additional specific LNP formulations useful for the delivery of nucleic acids are described in US 8,158,601 and US 8,168,775, both of which are incorporated herein by reference; these include the formulation used in patisiran, sold under the name ONPATTRO.

Exemplary dosing of a gene modifying LNP may include about 0.1, 0.25, 0.3, 0.5, 1, 2, 3, 4, 5, 6, 8, 10, or 100 mg/kg (RNA). Exemplary dosing of an AAV comprising a nucleic acid encoding one or more components of the system may include an MOI of about 10¹¹, 10¹², 10¹³, and 10¹⁴ vg/kg.

Kits, Articles of Manufacture, and Pharmaceutical Compositions

In an aspect, the present disclosure provides a kit comprising a gene modifying polypeptide or a gene modifying system, e.g., as described herein. In some embodiments, the kit comprises a gene modifying polypeptide (or a nucleic acid encoding the polypeptide) and a template RNA (or DNA encoding the template RNA). In some embodiments, the kit further comprises a reagent for introducing the system into a cell, e.g., a transfection reagent, an LNP, and the like. In some embodiments, the kit is suitable for any of the methods described herein. In some embodiments, the kit comprises one or more elements, compositions (e.g., pharmaceutical compositions), gene modifying polypeptides, and/or gene modifying systems, or functional fragments or components thereof, e.g., disposed in an article of manufacture. In some embodiments, the kit comprises instructions for use thereof.

In an aspect, the present disclosure provides an article of manufacture, e.g., in which a kit described herein, or a component thereof, is disposed.

In an aspect, the present disclosure provides a pharmaceutical composition comprising a gene modifying polypeptide or a gene modifying system, e.g., as described herein. In some embodiments, the pharmaceutical composition further comprises a pharmaceutically acceptable carrier or excipient. In some embodiments, the pharmaceutical composition comprises a template RNA and/or an RNA encoding the polypeptide. In embodiments, the pharmaceutical composition has one or more (e.g., 1, 2, 3, or 4) of the following characteristics:

(a) less than 1% (e.g., less than 0.5%, 0.4%, 0.3%, 0.2%, or 0.1%) DNA template relative to the template RNA and/or the RNA encoding the polypeptide, e.g., on a molar basis;

(b) less than 1% (e.g., less than 0.5%, 0.4%, 0.3%, 0.2%, or 0.1%) uncapped RNA relative to the template RNA and/or the RNA encoding the polypeptide, e.g., on a molar basis;

(c) less than 1% (e.g., less than 0.5%, 0.4%, 0.3%, 0.2%, or 0.1%) partial-length RNA relative to the template RNA and/or the RNA encoding the polypeptide, e.g., on a molar basis;

(d) a substantial lack of unreacted cap dinucleotides.
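Characteristics (a) through (c) above are molar-percentage limits, which can be checked mechanically once the relevant species have been quantified. The following Python sketch shows one way such a check might look. The function names, dictionary keys, and example quantities are hypothetical, and the default 1% limit is simply the broadest threshold recited above.

```python
def molar_percent(contaminant_mol: float, rna_mol: float) -> float:
    """Contaminant amount as a percentage of the RNA amount (molar basis)."""
    if rna_mol <= 0:
        raise ValueError("rna_mol must be positive")
    return 100.0 * contaminant_mol / rna_mol


def meets_purity_limits(contaminants_mol: dict, rna_mol: float,
                        limit_percent: float = 1.0) -> bool:
    """True if every listed contaminant is below limit_percent of the RNA."""
    return all(molar_percent(amount, rna_mol) < limit_percent
               for amount in contaminants_mol.values())


# Hypothetical batch: amounts in the same molar units as the RNA.
batch = {"dna_template": 0.002, "uncapped_rna": 0.005, "partial_length_rna": 0.001}
print(meets_purity_limits(batch, rna_mol=1.0))
```

Tightening `limit_percent` to 0.5, 0.4, 0.3, 0.2, or 0.1 corresponds to the narrower thresholds listed in the parentheticals.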

Chemistry, Manufacturing, and Controls (CMC)

Purification of protein therapeutics is described, for example, in Franks, Protein Biotechnology: Isolation, Characterization, and Stabilization, Humana Press (2013); and in Cutler, Protein Purification Protocols (Methods in Molecular Biology), Humana Press (2010).

In some embodiments, a gene modifying system, polypeptide, and/or template nucleic acid (e.g., template RNA) conforms to certain quality standards. In some embodiments, a gene modifying system, polypeptide, and/or template nucleic acid (e.g., template RNA) produced by a method described herein conforms to certain quality standards. Accordingly, in some aspects, the disclosure is directed to methods of manufacturing a gene modifying system, polypeptide, and/or template nucleic acid (e.g., template RNA) that conforms to certain quality standards, e.g., in which said quality standards are assayed. In some aspects, the disclosure is also directed to methods of assaying said quality standards in a gene modifying system, polypeptide, and/or template nucleic acid (e.g., template RNA). In some embodiments, the quality standards include, but are not limited to, one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, or 12) of the following:

(i) the length of the template RNA, e.g., whether the template RNA has a length that is above a reference length or within a reference length range, e.g., whether at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% of the template RNA present is greater than 100, 125, 150, 175, or 200 nucleotides long;

(ii) the presence, absence, and/or length of a polyA tail on the template RNA, e.g., whether at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% of the template RNA present contains a polyA tail (e.g., a polyA tail that is at least 5, 10, 20, 30, 50, 70, or 100 nucleotides in length);

(iii) the presence, absence, and/or type of a 5' cap on the template RNA, e.g., whether at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% of the template RNA present contains a 5' cap, e.g., whether that cap is a 7-methylguanosine cap, e.g., an O-Me-m7G cap;

(iv) the presence, absence, and/or type of one or more modified nucleotides (e.g., selected from pseudouridine, dihydrouridine, inosine, 7-methylguanosine, 1-N-methylpseudouridine (1-Me-Ψ), 5-methoxyuridine (5-MO-U), 5-methylcytidine (5mC), or a locked nucleotide) in the template RNA, e.g., whether at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% of the template RNA present contains one or more modified nucleotides;

(v) the stability of the template RNA (e.g., over time and/or under a pre-selected condition), e.g., whether at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% of the template RNA remains intact (e.g., greater than 100, 125, 150, 175, or 200 nucleotides long) after a stability test;

(vi) the potency of the template RNA in a system for modifying DNA, e.g., whether at least 1% of target sites are modified after a system comprising the template RNA is assayed for potency;

(vii) the length of the polypeptide, first polypeptide, or second polypeptide, e.g., whether the polypeptide, first polypeptide, or second polypeptide has a length that is above a reference length or within a reference length range, e.g., whether at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% of the polypeptide, first polypeptide, or second polypeptide present is greater than 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1050, 1100, 1150, 1200, 1250, 1300, 1350, 1400, 1450, 1500, 1600, 1700, 1800, 1900, or 2000 amino acids long (and optionally, no larger than 2500, 2000, 1500, 1400, 1300, 1200, 1100, 1000, 900, 800, 700, or 600 amino acids long);

(viii) the presence, absence, and/or type of post-translational modification on the polypeptide, first polypeptide, or second polypeptide, e.g., whether at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% of the polypeptide, first polypeptide, or second polypeptide contains phosphorylation, methylation, acetylation, myristoylation, palmitoylation, isoprenylation, glypiation, or lipoylation, or any combination thereof;

(ix) the presence, absence, and/or type of one or more artificial, synthetic, or non-canonical amino acids (e.g., selected from ornithine, β-alanine, GABA, δ-aminolevulinic acid, PABA, a D-amino acid (e.g., D-alanine or D-glutamate), aminoisobutyric acid, dehydroalanine, cystathionine, lanthionine, djenkolic acid, diaminopimelic acid, homoalanine, norvaline, norleucine, homonorleucine, homoserine, O-methyl-homoserine and O-ethyl-homoserine, ethionine, selenocysteine, selenohomocysteine, selenomethionine, selenoethionine, tellurocysteine, or telluromethionine) in the polypeptide, first polypeptide, or second polypeptide, e.g., whether at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% of the polypeptide, first polypeptide, or second polypeptide present contains one or more artificial, synthetic, or non-canonical amino acids;

(x) the stability of the polypeptide, first polypeptide, or second polypeptide (e.g., over time and/or under a pre-selected condition), e.g., whether at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% of the polypeptide, first polypeptide, or second polypeptide remains intact after a stability test (e.g., greater than 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1050, 1100, 1150, 1200, 1250, 1300, 1350, 1400, 1450, 1500, 1600, 1700, 1800, 1900, or 2000 amino acids long (and optionally, no larger than 2500, 2000, 1500, 1400, 1300, 1200, 1100, 1000, 900, 800, 700, or 600 amino acids long));

(xi) the potency of the polypeptide, first polypeptide, or second polypeptide in a system for modifying DNA, e.g., whether at least 1% of target sites are modified after a system comprising the polypeptide, first polypeptide, or second polypeptide is assayed for potency; or

(xii) the presence, absence, and/or level of one or more of a pyrogen, virus, fungus, bacterial pathogen, or host cell protein, e.g., whether the system is free or substantially free of pyrogen, virus, fungus, bacterial pathogen, or host cell protein contamination.
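Several of the criteria above, e.g., (i) and (v), have the same shape: a minimum fraction of molecules must exceed a length threshold. The following Python sketch illustrates that shape for criterion (i). It is a sketch under stated assumptions: the function names and the example length list are hypothetical, and per-molecule lengths are assumed to be available from some upstream sizing measurement that the text does not specify.

```python
def fraction_longer_than(lengths, min_length):
    """Fraction of molecules strictly longer than min_length."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    return sum(1 for n in lengths if n > min_length) / len(lengths)


def meets_length_criterion(lengths, min_length=100, min_fraction=0.80):
    """True if at least min_fraction of molecules exceed min_length.

    Defaults mirror the loosest values recited in criterion (i):
    at least 80% of template RNA greater than 100 nucleotides long.
    """
    return fraction_longer_than(lengths, min_length) >= min_fraction


# Hypothetical sample: nine full-length molecules and one truncated product.
sample = [150] * 9 + [50]
print(meets_length_criterion(sample))                      # 90% pass at the 80% bar
print(meets_length_criterion(sample, min_fraction=0.95))   # fails at the 95% bar
```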

In some embodiments, a system or pharmaceutical composition described herein is endotoxin free.

In some embodiments, the presence, absence, and/or level of one or more of a pyrogen, virus, fungus, bacterial pathogen, and/or host cell protein is determined. In embodiments, whether the system is free or substantially free of pyrogen, virus, fungus, bacterial pathogen, and/or host cell protein contamination is determined.

In some embodiments, a pharmaceutical composition or system as described herein has one or more (e.g., 1, 2, 3, or 4) of the following characteristics:

(c) less than 1% (e.g., less than 0.5%, 0.4%, 0.3%, 0.2%, or 0.1%) partial-length RNA relative to the template RNA and/or the RNA encoding the polypeptide, e.g., on a molar basis;

Examples

Example 1: Screening configurations of template RNAs that correct a sickle cell disease-associated mutation in a genomic landing pad in human cells

This example describes the use of a gene modifying system, containing a gene modifying polypeptide and a template RNA, to quantify the activity of template RNAs in correcting the HBB:E6V mutation (also known as E7V or the HbS variant; NC_000011.10:g.5227002T>A). The template RNAs comprise heterologous object sequences and PBS sequences of varying lengths. In this example, a template RNA contains:

(1) a gRNA spacer;

(2) a gRNA scaffold;

(3) a heterologous object sequence; and

(4) a primer binding site (PBS) sequence.

One or more of the template RNAs described in Tables 1-4 may be tested as described in this example. The heterologous object sequence and the PBS sequence are designed to correct the SCD mutation in the landing pad by replacing the "A" nucleotide at the mutation site with a "T" nucleotide through gene editing, thereby reversing the E6V mutation in the corresponding protein.

Cell lines are established that carry a "landing pad", i.e., a stable integrant, mimicking the region of the HBB gene that contains the E6V mutation site and flanking sequence. In some embodiments, a cell line used for screening may contain one or more additional SNPs in the HBB locus relative to a patient or reference sequence (e.g., the hg38 human genome reference sequence), and the landing pad containing the target mutation is optionally designed to carry one or more non-pathogenic SNPs to match the endogenous HBB locus of the cell line, e.g., designed to carry mutations that recapitulate SNPs present in the endogenous HBB locus of HEK293T cells. Without wishing to be bound by example, it is understood that a template RNA sequence found to successfully edit the target mutation at a site containing an additional SNP relative to the reference sequence will differ from the therapeutic template RNA in any region that overlaps the additional SNP. For example, consider a HEK293T-based screening assay in which the genomic landing pad contains the target mutation (corresponding to the endogenous E6V mutation caused by the DNA substitution NC_000011.10:g.5227002T>A) and an additional substitution in the protospacer relative to hg38 (corresponding to the NC_000011.10:g.5227013T>C mutation at the endogenous HBB locus in HEK293T cells). A template RNA that succeeds in this assay can provide a candidate composition; the corresponding therapeutic template RNA would then carry a substitution (C>T) in its spacer relative to the spacer of the screening template RNA, so as to enable therapeutic correction of the E6V mutation at a target site that lacks the additional substitution, e.g., a target site that contains the pathogenic E6V mutation but otherwise matches the hg38 reference sequence. In this example, a screening cell line containing a target-site landing pad (comprising the pathogenic mutation and an additional T>C substitution in the protospacer region) may be corrected using a screening template RNA comprising the spacer sequence [sequence not shown], while the corresponding therapeutic template RNA may comprise the spacer sequence [sequence not shown], where underlined nucleotides indicate the positions altered to match the screening cell target sequence or the hg38 target sequence. In some embodiments, the spacer, PBS, and/or RT template regions may need to be adjusted in this manner to account for any differences between the screening and reference target sequences. It is also contemplated that a given patient or patient population may carry one or more SNPs at the target locus relative to hg38 in addition to the pathogenic E6V mutation, and that a similar adaptation of candidate template RNA molecules may therefore be used to generate template RNA sequences specific to the patient or patient population.

The DNA for the landing pad was chemically synthesized and cloned into the pLenti-N-tGFP vector. The landing pad sequence cloned into the lentiviral expression vector was confirmed and sequence-verified by Sanger sequencing across the landing pad. The sequence-verified plasmid (9 μg) was transfected, together with lentiviral packaging mix (9 μg, Biosettia), into the packaging cell line LentiX-293T (Takara Bio) using Lipofectamine 2000™ according to the manufacturer's instructions. The transfected cells were incubated at 37°C, 5% CO₂ for 48 hours (including one medium change at 24 hours), and the medium containing viral particles was then collected from the cell culture dish. The collected medium was filtered through a 0.2 μm filter to remove cell debris in preparation for transduction of HEK293T cells. The virus-containing medium was diluted in DMEM and mixed with polybrene to prepare a dilution series for transduction of HEK293T cells, with a final polybrene concentration of 8 μg/ml. HEK293T cells were grown in the virus-containing medium for 48 hours and then split into fresh medium. The split cells were grown to confluence, and the transduction efficiency of the different virus dilutions was measured by GFP expression via flow cytometry and by ddPCR detecting the genomically integrated lentivirus (containing GFP and the HBB:E6V landing pad).

A gene modifying system comprising (i) a compatible gene modifying polypeptide described herein, e.g., having an NLS of Table 11, a compatible Cas9 domain having a sequence of Table 8, a linker of Table 10, an RT sequence of Table 6 (e.g., MLVMS_P03355_PLV919), and a second NLS of Table 11, and (ii) a template RNA of any one of Tables 1-4, is transfected into the HEK293T landing pad cell line. The gene modifying polypeptide and the template RNA are delivered by nucleofection in RNA format. Specifically, 1 μg of gene modifying polypeptide mRNA is combined with 10 μM template RNA. The mRNA and template RNA are added to 25 μL of SF buffer containing 250,000 HEK293T landing pad cells, and the cells are nucleofected using program DS-150. After nucleofection, the cells are grown at 37°C, 5% CO₂ for 3 days prior to cell lysis and genomic DNA extraction. To analyze gene editing activity, primers flanking the HBB:E6V site are used to amplify across the locus. Amplicons are analyzed by short-read sequencing on an Illumina MiSeq. In some embodiments, the assay will indicate that at least 10%, 20%, 30%, 40%, 50%, 60%, or 70% of the copies of the HBB gene in the sample are converted to the desired wild-type sequence.
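The conversion percentage referred to above can be illustrated with a short Python sketch. It assumes, hypothetically, that the amplicon reads have already been aligned and that the base at the E6V site has been extracted from each read on the same strand as NC_000011.10, where the corrected base is "T" and the pathogenic base is "A". The function name and the example counts are illustrative only, and the sketch ignores indels and other imperfect outcomes.

```python
from collections import Counter


def percent_corrected(base_calls, corrected_base="T"):
    """Percentage of reads carrying the corrected base at the target site.

    base_calls     -- iterable of single-letter base calls, one per read,
                      taken at the E6V position (assumed pre-extracted)
    corrected_base -- base expected after successful correction
    """
    counts = Counter(call.upper() for call in base_calls)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads supplied")
    return 100.0 * counts[corrected_base.upper()] / total


# Hypothetical sample: 30 corrected reads and 70 uncorrected reads.
calls = ["T"] * 30 + ["A"] * 70
print(percent_corrected(calls))
```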

Example 2: Gene modifying polypeptide selection by pooled screening in HEK293T and U2OS cells

This example describes the use of an RNA gene modifying system for the targeted editing of a coding sequence in the human genome. More specifically, this example describes the infection of HEK293T and U2OS cells with a library of gene modifying candidates, followed by transfection of a template guide RNA (tgRNA) for in vitro gene modification in the cells, e.g., as a means of evaluating the editing activity of new gene modifying polypeptides in human cells by a pooled screening approach.

The gene modifying polypeptide library candidates assayed herein each comprise: 1) a Streptococcus pyogenes (Spy) Cas9 nickase containing an N863A mutation that inactivates one endonuclease active site; 2) one of the 122 peptide linkers depicted in Table 10; and 3) a reverse transcriptase (RT) domain of retroviral origin from Table 6. The particular retroviral RT domains used were selected if they were expected to function as monomers. For each selected RT domain, the wild-type sequence was tested, as well as versions with point mutations installed in the primary wild-type sequence. In particular, 143 RT domains were tested, either wild-type or containing various mutations. In total, 17,446 Cas-linker-RT gene modifying polypeptides were tested.
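The library size stated above is consistent with a full combinatorial pairing of every linker with every RT domain on a single Cas9 nickase, as the following short Python check illustrates. The variable names are hypothetical; the counts are those given in the text.

```python
# Counts taken from the description of the library.
N_CAS_DOMAINS = 1      # one Spy Cas9 N863A nickase
N_LINKERS = 122        # peptide linkers of Table 10
N_RT_DOMAINS = 143     # wild-type and mutant RT domains

# Every linker paired with every RT domain on the single Cas9 nickase.
library_size = N_CAS_DOMAINS * N_LINKERS * N_RT_DOMAINS
print(library_size)
```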

The system described here is a two-component system comprising: 1) an expression plasmid encoding a human codon-optimized gene modifying polypeptide library candidate within a lentiviral cassette, and 2) a tgRNA expression plasmid expressing, under a U6 promoter, a non-coding tgRNA sequence that is recognized by Cas and localizes it to the genomic locus of interest, and that also serves as the template for reverse transcription of the desired edit into the genome by the RT domain. The lentiviral cassette comprises: (i) a CMV promoter for expression in mammalian cells; (ii) a gene modifying polypeptide library candidate, as indicated; (iii) a self-cleaving T2A polypeptide; (iv) a puromycin resistance gene enabling selection in mammalian cells; and (v) a polyA tail termination signal.

To prepare a pool of cells expressing the gene modifying polypeptide library candidates, HEK293T or U2OS cells were transduced with a pooled lentiviral preparation of the gene modifying candidate plasmid library. HEK293 Lenti-X cells were seeded in 15 cm plates (12 × 10⁶ cells) prior to lentiviral plasmid transfection. Lentiviral plasmid transfection, using the lentiviral packaging mix (Biosettia, 27 μg) and the plasmid DNA of the gene modifying candidate library (27 μg), was performed the following day using Lipofectamine 2000 and Opti-MEM medium according to the manufacturer's protocol. Extracellular DNA was removed by a full medium change the next day, and virus-containing medium was harvested 48 hours later. The lentiviral medium was concentrated using Lenti-X Concentrator (Takara Biosciences), and 5 mL lentiviral aliquots were prepared and stored at -80°C. Lentiviral titer was determined by counting colony-forming units after puromycin selection. HEK293T or U2OS cells carrying a BFP-expressing genomic landing pad were seeded in culture plates at 6 × 10⁷ cells and transduced at a multiplicity of infection (MOI) of 0.3 to minimize multiple infections per cell. Puromycin (2.5 μg/mL) was added 48 hours after infection to select for infected cells. Cells were kept under puromycin selection for at least 7 days and then scaled up for tgRNA electroporation.
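The rationale for a low MOI can be made concrete with a Poisson model of viral integration, sketched below in Python. The Poisson assumption (each cell receives an independent, Poisson-distributed number of integrations with mean equal to the MOI) is a common idealization and is not stated in the text; the function names are hypothetical.

```python
import math


def fraction_infected(moi: float) -> float:
    """Fraction of cells with at least one integration (Poisson model)."""
    return 1.0 - math.exp(-moi)


def fraction_multiply_infected(moi: float) -> float:
    """Among infected cells, the fraction carrying two or more integrations."""
    p_zero = math.exp(-moi)          # P(k = 0)
    p_one = moi * math.exp(-moi)     # P(k = 1)
    return (1.0 - p_zero - p_one) / (1.0 - p_zero)


moi = 0.3
print(f"infected: {fraction_infected(moi):.3f}")
print(f"of those, multiply infected: {fraction_multiply_infected(moi):.3f}")
```

Under this idealized model, roughly a quarter of cells are infected at an MOI of 0.3, and most of those carry a single library member, which is the property the pooled readout relies on.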

To determine the genome-editing capacity of the gene modifying library candidates in the assay, the infected BFP-expressing HEK293T or U2OS cells were then transfected by electroporation, at 250,000 cells per well, with 200 ng of a tgRNA (g4 or g10) plasmid designed to convert BFP to GFP, using a cell count sufficient for >1000x coverage of each library candidate.
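The coverage requirement above translates into a minimum number of electroporation wells, as the following Python sketch illustrates. It is a simplified estimate that assumes every electroporated cell carries exactly one library candidate, that candidates are evenly represented, and that no cells are lost; the function name is hypothetical.

```python
import math

LIBRARY_SIZE = 17_446      # Cas-linker-RT candidates in the library
CELLS_PER_WELL = 250_000   # cells electroporated per well
TARGET_COVERAGE = 1_000    # cells per library candidate


def wells_for_coverage(library_size: int, coverage: int, cells_per_well: int) -> int:
    """Smallest whole number of wells giving at least the target coverage."""
    cells_needed = library_size * coverage
    return math.ceil(cells_needed / cells_per_well)


wells = wells_for_coverage(LIBRARY_SIZE, TARGET_COVERAGE, CELLS_PER_WELL)
achieved = wells * CELLS_PER_WELL / LIBRARY_SIZE
print(wells, round(achieved))
```

In practice, uneven candidate representation and cell loss during electroporation would argue for a margin above this minimum.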

The g4 tgRNA (5′ to 3′) is as follows: a 20-nucleotide spacer region (GCCGAAGCACTGCACGCCGT; SEQ ID NO: 11,011), a scaffold region (GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC; SEQ ID NO: 11,012), a template region [sequence not shown] encoding the single-base-pair substitution that converts BFP to GFP (bold) and a PAM-inactivating change (underlined) that introduces a synonymous point mutation in the SpyCas9 PAM (NGG to NCG) to prevent re-engagement of the gene modifying polypeptide once a functional gene modification reaction is complete, and a 13-nucleotide PBS (GCGTGCAGTGCTT; SEQ ID NO: 11,014).

Similarly, the g10 tgRNA (5′ to 3′) is as follows: a 20-nucleotide spacer region (AGAAGTCGTGCTGCTTCATG; SEQ ID NO: 11,015), a scaffold region (GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC; SEQ ID NO: 11,016), a template region [sequence not shown] encoding the single-base-pair substitution that converts BFP to GFP (bold) and a PAM-inactivating change (underlined) that introduces a synonymous point mutation in the SpyCas9 PAM (NGG to NGA) to prevent re-engagement of the gene modifying polypeptide once a functional gene modification reaction is complete, and a 13-nucleotide PBS (GAAGCAGCACGAC; SEQ ID NO: 11,018).
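The relationship between the spacer and PBS sequences given for g4 and g10 can be checked with a short Python sketch. It assumes, as is typical for SpyCas9 but not stated in the text, that the nick falls three nucleotides upstream of the PAM-proximal end of the 20-nucleotide spacer, and that the PBS is the reverse complement of the 13 protospacer nucleotides immediately 5′ of that nick. The function names are hypothetical; the sequences are those listed above.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA-alphabet sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def expected_pbs(spacer: str, pbs_length: int = 13, nick_offset: int = 3) -> str:
    """PBS predicted from a spacer, assuming a nick `nick_offset` nucleotides
    upstream of the spacer's PAM-proximal (3') end."""
    nick = len(spacer) - nick_offset
    return reverse_complement(spacer[nick - pbs_length:nick])


# Spacer and PBS pairs as listed for the g4 and g10 tgRNAs.
TGRNAS = {
    "g4": ("GCCGAAGCACTGCACGCCGT", "GCGTGCAGTGCTT"),
    "g10": ("AGAAGTCGTGCTGCTTCATG", "GAAGCAGCACGAC"),
}

for name, (spacer, pbs) in TGRNAS.items():
    print(name, expected_pbs(spacer) == pbs)
```

Both listed PBS sequences match the prediction under these assumptions, which is a useful sanity check when adapting a spacer to a different target sequence.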

To evaluate the genome-editing capacity of the various constructs in the assay, cells were sorted by fluorescence-activated cell sorting (FACS) 6-7 days after electroporation to detect GFP expression. Cells were sorted and harvested as distinct populations of unedited (BFP+) cells, edited (GFP+) cells, and imperfectly edited (BFP-, GFP-) cells. A sample of unsorted cells was also harvested as the input population for determining enrichment during analysis.

To determine which gene modification library candidates have genome-editing capacity in the assay, genomic DNA (gDNA) was harvested from the sorted and unsorted cell populations and analyzed by sequencing the gene modification library candidates in each population. Briefly, the gene modification sequences were amplified from the genome using primers specific for the lentiviral cassette, amplified in a second round of PCR to dilute the genomic DNA, and then sequenced using Oxford Nanopore Sequencing Technology according to the manufacturer's protocol.

After quality control of the sequencing reads, reads of at least 1500 and no more than 3200 nucleotides were mapped to the gene modification polypeptide library sequences, and reads matching a library sequence by at least 80% were considered successfully aligned to a given candidate. To identify gene modification candidates capable of gene editing in the assay, the read count of each library candidate in the edited population was compared with its read count in the initial unsorted population. For the purposes of this pooled screen, gene modification candidates with genome-editing capacity were selected as those enriched in the converted (GFP+) population relative to the unsorted (input) cells, where the enrichment was determined to be equal to or greater than the enrichment of the reference (element ID No: 17380).
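The read filtering and enrichment comparison described above can be sketched as follows. This is a minimal illustration and not the analysis pipeline actually used: the tuple layout of `alignments`, the reading of "at least 80% match" as a fractional identity, the frequency-ratio definition of enrichment, and the pseudocount are all assumptions introduced here.

```python
from collections import Counter

# Thresholds taken from the text: read length window and minimum match.
MIN_LEN, MAX_LEN, MIN_IDENTITY = 1500, 3200, 0.80


def count_reads(alignments):
    """Count reads per candidate.

    alignments: iterable of (candidate_id, read_length, identity) tuples
    (hypothetical layout), where identity is a fraction between 0 and 1.
    """
    counts = Counter()
    for candidate_id, read_length, identity in alignments:
        if MIN_LEN <= read_length <= MAX_LEN and identity >= MIN_IDENTITY:
            counts[candidate_id] += 1
    return counts


def enrichment(sorted_counts, input_counts, pseudocount=1.0):
    """Ratio of each candidate's read fraction in the sorted (GFP+)
    population to its read fraction in the unsorted input population.

    The pseudocount (an assumption) avoids division by zero.
    """
    candidates = set(sorted_counts) | set(input_counts)
    n = len(candidates)
    sorted_total = sum(sorted_counts.values()) + pseudocount * n
    input_total = sum(input_counts.values()) + pseudocount * n
    return {
        c: ((sorted_counts.get(c, 0) + pseudocount) / sorted_total)
        / ((input_counts.get(c, 0) + pseudocount) / input_total)
        for c in candidates
    }


def hits(scores, reference_id):
    """Candidates whose enrichment is at least that of the reference."""
    threshold = scores[reference_id]
    return sorted(c for c, s in scores.items() if s >= threshold)
```

With this sketch, a candidate is called a hit only when its enrichment score is at least the score of the reference element measured in the same sorted and input populations.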

A large number of gene modification polypeptide candidates were found to be enriched in the GFP+ cell population. For example, of the 17,446 candidates tested, more than 3,300 showed enrichment in the GFP+ sorted population (relative to the unsorted population) that was at least comparable to that of the reference under similar experimental conditions (HEK293T cells with g4 tgRNA; HEK293T cells with g10 tgRNA; or U2OS cells with g4 tgRNA), as shown in Table D. Although the 17,446 candidates were also tested in U2OS cells with g10 tgRNA, under that experimental condition the pooled screen did not yield candidates enriched in the converted (GFP+) population relative to the unsorted (input) cells; further investigation is needed to explain this result.

Table D. Combinations of linker and RT sequences screened. The amino acid sequence of each RT in this table is provided in Table 6.

Example 3: Screening configurations of template RNAs that install the SCD mutation in the endogenous HBB gene in human cells

This example describes the use of a gene modification system, containing a gene modification polypeptide and a template RNA comprising heterologous object sequences and primer binding site sequences of varying lengths, to identify favorable configurations for editing the endogenous HBB gene in human cells. In this example, the template RNA contains:

(1) gRNA spacer;

(2) gRNA scaffold;

(3) heterologous object sequence; and

(4) primer binding site (PBS) sequence.

The template RNAs were designed to contain an 8-17 nt PBS sequence and a 9-20 nt heterologous object sequence. Two different gRNA spacer sequences were used to target sites close to the SCD mutation in the endogenous HBB genomic locus. The heterologous object sequence and PBS sequence were designed to install the SCD mutation (E6V mutation) in the endogenous gene by replacing an "A" nucleotide with a "T" nucleotide at the mutation site using the gene modification system described herein. The template RNA sequences used were those shown in Table A (HBB5 sequences) and Table B (HBB8 sequences), with the following exceptions. First, the mutation region of the RT template sequence was designed to install the mutation (A→T) rather than to correct it back to the wild-type sequence. In particular, the RT template region for installing the SCD mutation using template HBB5 comprises at least a portion of the following sequence: installation RT template (PAM-kill): AACGGCAGACTTCTCTACAG (SEQ ID NO: 21672), where the equivalent without PAM-kill would be: installation RT template (no PAM-kill): AACGGCAGACTTCTCCACAG (SEQ ID NO: 21673). Additionally, the installation version of the HBB8 spacer has the following sequence, which targets the WT sequence lacking the SCD mutation and therefore differs by a single nt from the HBB8 mutation-correcting spacer, because the SCD mutation falls within the target protospacer: GTAACGGCAGACTTCTCCTC (SEQ ID NO: 21674). In particular, the RT template region for installing the SCD mutation using template HBB8 comprises at least a portion of the following sequence: installation RT template (293T SNP): TGGTGCACCTGACTCCTGTG (SEQ ID NO: 21676), where the equivalent template lacking the 293T SNP and targeting the hg38 reference sequence would be: installation RT template (no SNP): TGGTGCATCTGACTCCTGTG (SEQ ID NO: 21675).

A gene modification system comprising a gene modification polypeptide (see Table C) and a template RNA was transfected into HEK293T cells. The gene modification polypeptide and the template RNA were delivered as RNA by nucleofection. In particular, 1 μg of gene modification polypeptide mRNA was combined with 10 μM template RNA. The mRNA and template RNA were added to 25 μL of SF buffer containing 250,000 HEK293T cells, and the cells were nucleofected using program DS-150. After nucleofection, the cells were cultured at 37°C, 5% CO₂ for 3 days, followed by cell lysis and genomic DNA extraction. To analyze gene editing activity, primers flanking the HBB genomic target site were used to amplify across the locus. The amplicons were analyzed by short-read sequencing on an Illumina MiSeq. Gene editing activity with high editing efficiency was detected in configurations with a 9-12 nt PBS sequence and a 13-16 nt heterologous object sequence. These results indicate that a template RNA comprising a gRNA spacer and gRNA scaffold described herein successfully guides the gene modification polypeptide to the endogenous HBB gene in human cells for specific gene editing. The results are shown in Table E.

Although this experiment demonstrates installation of the mutation rather than its correction, it indicates that editing can be performed at the native HBB locus.

Table E. HBB5 and HBB8 sequences used to install the mutation. The columns indicate, from left to right: 1) the name of the HBB5 template RNA; 2) the full HBB5 template RNA sequence depicted as RNA, further showing the chemical modifications used in Example 3; 3) the observed activity, as defined in Example 3, of the template RNA in column 2; 4) the name of the HBB8 template RNA; 5) the full HBB8 template RNA sequence depicted as RNA, further showing the chemical modifications used in Example 3; 6) the observed activity, as defined in Example 3, of the template RNA in column 5.

Example 4: Screening configurations of template RNAs that correct the SCD mutation in a genomic landing pad in human cells

This example describes the use of a gene modification system to identify favorable configurations for correcting the SCD mutation, the gene modification system containing a gene modification polypeptide and a template RNA comprising heterologous object sequences and PBS sequences of varying lengths. In this example, the template RNA contains:

(1) gRNA spacer;

(2) gRNA scaffold;

(3) heterologous object sequence; and

(4) primer binding site (PBS) sequence.

The template RNAs were designed to contain an 8-17 nt PBS sequence and a 9-20 nt heterologous object sequence (Tables A and B). Two different gRNA spacer sequences, designated HBB5 (see Table A) and HBB8 (see Table B), were used to target sites close to the SCD mutation in a conventional genomic landing pad in human cells. The heterologous object sequence and PBS sequence were designed to correct the SCD mutation in the landing pad by replacing a "T" nucleotide with an "A" nucleotide at the mutation site using the gene modification system described herein.

A cell line was established with a "landing pad", or stable integrant, that mimics the region of the HBB gene containing the sequences flanking the SCD mutation site. The landing pad DNA was chemically synthesized and cloned into the pLenti-N-tGFP vector. The landing pad cloned into the lentiviral expression vector was confirmed, and its sequence verified, by Sanger sequencing of the landing pad. The sequence-verified plasmid (9 μg) and lentiviral packaging mix (9 μg, obtained from Bosetta) were transfected into the packaging cell line LentiX-293T (Takara Bio) using Lipofectamine 2000™ according to the manufacturer's instructions. The transfected cells were incubated at 37°C, 5% CO₂ for 48 hours (including one medium change at 24 hours), and the medium containing viral particles was then collected from the cell culture dishes. The collected medium was filtered through a 0.2 μm filter to remove cell debris and prepared for transduction of HEK293T cells. The virus-containing medium was diluted in DMEM and mixed with polybrene to prepare a dilution series for transduction of HEK293T cells, with a final polybrene concentration of 8 μg/ml. HEK293T cells were grown in the virus-containing medium for 48 hours and then split into fresh medium. The split cells were grown to confluence, and the transduction efficiency of the different virus dilutions was measured by detecting, via flow cytometry and ddPCR, GFP expression from the genomically integrated lentivirus (containing GFP and the HBB landing pad).

A gene modification system comprising a gene modification polypeptide (see Table C) and a template RNA was transfected into the HEK293T landing pad cell line. The gene modification polypeptide and the template RNA were delivered as RNA by nucleofection. In particular, 1 μg of gene modification polypeptide mRNA was combined with 10 μM template RNA. The mRNA and template RNA were added to 25 μL of SF buffer containing 250,000 HEK293T landing pad cells, and the cells were nucleofected using program DS-150. After nucleofection, the cells were cultured at 37°C, 5% CO₂ for 3 days, followed by cell lysis and genomic DNA extraction. To analyze gene editing activity, primers flanking the HBB genomic target site were used to amplify across the locus. The amplicons were analyzed by short-read sequencing on an Illumina MiSeq. Gene editing activity with high editing efficiency was detected in configurations with a 9-12 nt PBS sequence and a 12-14 nt heterologous object sequence, and is shown in Tables A and B. In particular, in Tables A and B, "+" indicates an editing frequency of <3%, "++" indicates an editing frequency of 3%-7%, and "+++" indicates an editing frequency of >=7%.
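The activity symbols used in Tables A and B can be expressed as a simple binning function. This is an illustrative sketch only; the function name is hypothetical, and the boundary handling (3% falls in "++", 7% falls in "+++") follows the ">=7%" convention stated above.

```python
def activity_symbol(edit_percent: float) -> str:
    """Map an editing frequency (in percent) to the table symbol.

    "+"   : below 3%
    "++"  : 3% up to, but not including, 7%
    "+++" : 7% or higher
    """
    if edit_percent < 0 or edit_percent > 100:
        raise ValueError("editing frequency must be between 0 and 100 percent")
    if edit_percent >= 7.0:
        return "+++"
    if edit_percent >= 3.0:
        return "++"
    return "+"


# Example: classify a few hypothetical editing frequencies.
for value in (1.2, 3.0, 6.9, 7.0):
    print(value, activity_symbol(value))
```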

It should be understood that the template RNA sequences shown in Table A can be tailored to the cells being targeted. For example, HEK293T cells carry a SNP in the HBB gene (NC_000011.10: g.5227013A>G (T>C on the HBB coding strand), relative to the human hg38 reference genome), and the template RNA sequences shown in Tables A and B are therefore suited to cells having this SNP. Template RNAs suited to cells having a different sequence at this SNP position ("no SNP") may use the following sequences, in which uppercase letters indicate the core sequence, lowercase letters indicate flanking sequence, and underlining indicates the mutation region. Similarly, in some embodiments it is desirable to inactivate the PAM sequence upon editing ("PAM-kill"), and in other embodiments it is preferable to leave the PAM sequence intact (no "PAM-kill"). The RT template can be designed in a "PAM-kill" or "no PAM-kill" form, for example as shown below.

HBB5 spacer (no SNP): CATGGTGCATCTGACTCCTG (SEQ ID NO: 21668)

HBB5 PBS (no SNP): GAGTCAGAtgcaccatg (SEQ ID NO: 21669)

HBB5 RT template (no PAM-kill): aacggcagactTCTCGTCAG (SEQ ID NO: 21670) [underlined in the original: T at position 17]

HBB8 RT template (no SNP): tggtgcatctgACTCCTGAG (SEQ ID NO: 21671) [underlined in the original: A at position 19]

Example 5: Quantifying the activity of a gene editing polypeptide and templates in rewriting the endogenous B-globin locus in 293T cells and CD34+ primary human hematopoietic stem cells (HSCs)

This example demonstrates the use of a gene modification system containing a gene modification polypeptide and a template RNA to convert the glutamic acid codon (GAG) at amino acid position 7 of the endogenous B-globin locus in primary human HSCs to alanine (GCA), thereby writing a non-pathogenic sequence into position 7. The conversion involves a change of two base pairs (i.e., replacing the DNA bases adenine and guanine at nucleotide positions 20 and 21 with the bases cytosine and adenine, respectively).

In this example, the template RNA contains:

(1) gRNA spacer;

(2) gRNA scaffold;

(3) heterologous object sequence; and

(4) primer binding site (PBS) sequence.

More particularly, the template RNAs comprise the following sequences, from 5′ to 3′, in which the first 3 and last 3 bases carry 2′-O-methyl phosphorothioate chemical modifications.

FYF tgRNA11

FYF tgRNA12

FYF tgRNA13

FYF tgRNA14

FYF tgRNA15

FYF tgRNA16

FYF tgRNA17

FYF tgRNA18

FYF tgRNA19

FYF tgRNA20

The gene modification polypeptide tested comprises the sequence designated RNAV209 set forth in Example 8.

A gene modification system comprising the gene modification polypeptide and template RNA described above was transfected into 293T cells and human HSCs. The gene modification polypeptide and the template RNA were delivered as RNA by nucleofection. In particular, 1000 or 2000 ng of gene modification polypeptide RNA was combined with 1000 or 2000 ng of template RNA, all in RNA form, at a 1:1 ratio. The RNA mixture was added to 200,000 293T cells or 200,000 primary human HSCs in a total of 20 μL of Lonza SF buffer (293T) or Lonza P3 buffer (HSC), and the cells were nucleofected in 16-well nucleofection cassettes using program DS-150 (293T) or DZ-100 (HSC). After nucleofection, the cells were incubated at room temperature for 10 minutes and transferred to 24-well plates containing, in each well, 500 μL of DMEM + 10% fetal bovine serum (293T) or 500 μL of StemSpan-XF + SCF at 100 ng/mL, Flt3-L at 100 ng/mL, and TPO at 100 ng/mL (HSC), and were cultured at 37°C, 5% CO₂ for 3 days, followed by cell lysis and genomic DNA extraction. To analyze gene editing activity, primers flanking the target insertion site were used to amplify across the locus. The amplicons were analyzed by short-read sequencing on an Illumina MiSeq. Replacement of the DNA bases adenine and guanine at nucleotide positions 20 and 21 downstream of the transcription start site within the endogenous B-globin locus with the bases cytosine and adenine, respectively, indicates successful editing.

The gene modification systems tested achieved up to 18.5% average perfect rewriting in 293T cells and up to 7.9% perfect rewriting in primary human HSCs. As shown in Figure 2, in the template gRNA screen, average perfect rewriting levels of 4%-18% were detected in 293T cells (single nick, 2000 ng per RNA) and 0%-2.5% in primary human HSCs (single nick, 2000 ng per RNA). As shown in Figure 3, with the gRNAs shown, average perfect rewriting levels of 6%-18.5% were detected in 293T cells (single nick, 2000 ng per RNA) and 0%-7.9% in primary human HSCs (single nick, 2000 ng per RNA). These results demonstrate the use of a gene modification system to write a non-pathogenic sequence into a clinically relevant codon of the endogenous B-globin locus in primary human HSCs.

Example 6: Quantifying the activity of a gene editing polypeptide and template in rewriting the endogenous B-globin locus in human primary fibroblasts

This example demonstrates the use of a gene modification system containing a gene modification polypeptide and a template RNA to convert the glutamic acid codon (GAG) at amino acid position 7 of the endogenous B-globin locus in human primary fibroblasts to alanine (GCA). The conversion involves a change of two base pairs (i.e., replacing the DNA bases adenine and guanine at nucleotide positions 20 and 21 with the bases cytosine and adenine, respectively).

In this example, the template RNA contains:

(1) gRNA spacer;

(2) gRNA scaffold;

(3) heterologous object sequence; and

(4) primer binding site (PBS) sequence.

More particularly, the template RNA comprises the sequence of tgRNA14 as described in the preceding example.

The system further comprises a gRNA designed to generate a second nick, in which the first 3 and last 3 bases carry 2′-O-methyl phosphorothioate modifications and which comprises the following sequence: 5′-CCUUGAUACCAACCUGCCCAGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUU-3′ (SEQ ID NO: 20613).

A gene modification system comprising the gene modification polypeptide and template RNA described above was electroporated into human primary fibroblasts. The gene modification polypeptide and the template RNA were delivered as RNA by electroporation and comprise the sequences detailed above. In particular, two doses (1000 ng and 2000 ng) were delivered, with each gene editing component delivered as RNA at the indicated dose. For example, for the 1000 ng dose, 1000 ng of gene modification polypeptide RNA was combined with 1000 ng of template RNA in RNA form and 1000 ng of second-nick gRNA in RNA form in a 10 μL electroporation mixture containing 200,000 primary human fibroblasts resuspended in Buffer R (Invitrogen). The electroporation mixture was then drawn into a 10 μL Neon electroporation tip (Invitrogen), transferred to the Neon electroporation system (Invitrogen), and electroporated with one 20 ms pulse at 1700 mV. The cells were then transferred to one well of a 12-well plate (Corning), cultured in 1 mL of DMEM with Glutamax (supplemented with 15% fetal bovine serum, 1% non-essential amino acids, 1% sodium pyruvate, and 1% HEPES) (all from Gibco), and cultured at 37°C, 5% CO₂ for 3 days, followed by cell lysis and genomic DNA extraction. To analyze gene editing activity, primers flanking the target insertion site were used to amplify across the locus. The amplicons were analyzed by short-read sequencing on an Illumina MiSeq. Replacement of the DNA bases adenine and guanine at nucleotide positions 20 and 21 downstream of the transcription start site within the endogenous B-globin locus with the bases cytosine and adenine, respectively, indicates successful editing.

As shown in Figure 4, when the gene editing polypeptide was combined with the template guide RNA without a second-nick gRNA, perfect rewriting levels of 3.7% and 10.6% were detected at the 1000 ng and 2000 ng doses, respectively. Adding the second nick increased perfect rewriting from 3.7% to 44.5% at the 1000 ng dose and from 10.6% to 56.5% at the 2000 ng dose. In this experiment, indel levels of 1.5%-1.65% (single nick; 1000 ng, 2000 ng) and 7.9%-5.8% (second nick; 1000 ng, 2000 ng) were observed. These results demonstrate the use of a gene modification system to write a non-pathogenic sequence into a clinically relevant codon of the endogenous B-globin locus in human primary fibroblasts. In addition, introducing the second-nick gRNA increased perfect rewriting by roughly 5-fold (2000 ng dose) to 12-fold (1000 ng dose), depending on the dose administered.
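The effect of the second nick can be restated as a fold change computed directly from the perfect rewriting percentages reported for Figure 4. The sketch below is only an arithmetic illustration; the dictionary layout and function name are hypothetical.

```python
# Perfect rewriting (%) reported for Figure 4, keyed by dose in ng:
# (single nick, with second-nick gRNA).
PERFECT_REWRITE = {
    1000: (3.7, 44.5),
    2000: (10.6, 56.5),
}


def fold_change(single_nick: float, second_nick: float) -> float:
    """Fold increase in perfect rewriting when the second nick is added."""
    if single_nick <= 0:
        raise ValueError("single-nick value must be positive")
    return second_nick / single_nick


for dose, (single, dual) in PERFECT_REWRITE.items():
    print(f"{dose} ng: {fold_change(single, dual):.1f}-fold")
```

On the reported values this gives about 12.0-fold at 1000 ng and about 5.3-fold at 2000 ng.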

Example 7: Comparing the activity of a gene editing polypeptide and multiple templates in writing different sequences into the same position within the endogenous B-globin locus in wild-type human primary fibroblasts and in fibroblasts carrying the sickle cell mutation

This example demonstrates similar efficacy when different mutations are installed at the same genomic locus by changing the sequence within the reverse transcriptase (RT) domain of the template guide RNA while keeping the design of the gene modification polypeptide, template RNA, primer binding site (PBS), and template guide RNA scaffold unchanged. In this example, two adjacent DNA bases, one of which is located at the sickle cell disease mutation site within the B-globin locus, were substituted in wild-type fibroblasts, thereby converting the glutamic acid codon (GAG) at amino acid position 7 of the endogenous B-globin locus in human primary fibroblasts to alanine (GCA). In parallel, at the same amino acid position, the valine codon (GTG) present in fibroblasts carrying the sickle mutation was converted to a synonymous glutamic acid codon (GAA).
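The two codon conversions described above can be summarized with a small sketch that reports the amino acid change and which bases differ. It assumes, for illustration, that nucleotide positions are counted from the first base of the start codon, so that codon 7 spans positions 19-21; that numbering is an assumption chosen because it reproduces positions 20 and 21 as quoted in these examples. The function name and the reduced codon table are hypothetical helpers.

```python
# Only the four codons discussed in these examples (standard genetic code).
CODON_TO_AA = {"GAG": "Glu", "GAA": "Glu", "GCA": "Ala", "GTG": "Val"}


def describe_edit(before: str, after: str, codon_number: int = 7):
    """Return (aa_before, aa_after, changes) for a codon replacement.

    changes is a list of (position, old_base, new_base), with positions
    counted from the first base of the start codon (assumption).
    """
    first_base = 3 * (codon_number - 1) + 1
    changes = [
        (first_base + i, old, new)
        for i, (old, new) in enumerate(zip(before, after))
        if old != new
    ]
    return CODON_TO_AA[before], CODON_TO_AA[after], changes


# Wild-type fibroblasts: Glu (GAG) rewritten to Ala (GCA).
print(describe_edit("GAG", "GCA"))
# Sickle fibroblasts: Val (GTG) rewritten to Glu (GAA).
print(describe_edit("GTG", "GAA"))
```

In both cases the second and third bases of codon 7 change, which is why a single RT template position range serves both edits.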

In this example, the template RNA contains:

(1) gRNA spacer;

(2) gRNA scaffold;

(3) heterologous object sequence; and

(4) primer binding site (PBS) sequence.

More particularly, the template RNA used in wild-type fibroblasts comprises the sequence of tgRNA14 as described in the preceding examples.

The template RNA used in sickle fibroblasts comprises the following sequence and carries 2′-O-methyl phosphorothioate modifications at the first 3 and last 3 bases: 5′-CAUGGUGCACCUGACUCCUGGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCAGACUUCUCUUCAGGAGUCAGGUG-3′ (SEQ ID NO: 20614)

The system further comprises a gRNA designed to generate a second nick, wherein the gRNA has the following sequence: 5′-CCUUGAUACCAACCUGCCCAGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUU-3′ (SEQ ID NO: 20615).

In this example, the same gRNA comprising the above sequence was used for the second-nick conditions in both wild-type and sickle cells.

A gene modification system comprising the gene modification polypeptide and template RNA described above was electroporated into wild-type human primary fibroblasts and human primary fibroblasts carrying the sickle mutation. The gene modification polypeptide and the template RNA were delivered as RNA by electroporation and comprise the sequences detailed above. One dose (1000 ng) was delivered, with each gene editing component delivered as RNA at the indicated dose. In particular, 1000 ng of gene modification polypeptide RNA was combined with 1000 ng of template RNA in RNA form and 1000 ng of second-nick gRNA in RNA form in a 10 μL electroporation mixture containing 200,000 primary human fibroblasts resuspended in Buffer R (Invitrogen). The electroporation mixture was then drawn into a 10 μL Neon electroporation tip (Invitrogen), transferred to the Neon electroporation system (Invitrogen), and electroporated with one 20 ms pulse at 1700 mV. The cells were then transferred to one well of a 12-well plate (Corning), cultured in 1 mL of DMEM with Glutamax (supplemented with 15% fetal bovine serum, 1% non-essential amino acids, 1% sodium pyruvate, and 1% HEPES) (all from Gibco), and cultured at 37°C, 5% CO₂ for 3 days, followed by cell lysis and genomic DNA extraction. To analyze gene editing activity, primers flanking the target insertion site were used to amplify across the locus. The amplicons were analyzed by Sanger sequencing followed by analysis with the TIDER algorithm. Replacement of the DNA bases adenine and guanine at nucleotide positions 20 and 21 downstream of the transcription start site within the endogenous B-globin locus with the bases cytosine and adenine, respectively, indicates successful editing in wild-type cells. In contrast, replacement of the DNA bases thymine and guanine at nucleotide positions 20 and 21 downstream of the transcription start site within the endogenous B-globin locus each with the base adenine indicates successful editing in fibroblasts carrying the sickle mutation.

As shown in Figure 5, when the gene editing polypeptide was combined with the template guide RNA, perfect rewriting levels of 10.8% and 6.1% were detected in wild-type and sickle fibroblasts, respectively. Adding a second nick increased perfect rewriting to 75.6% in wild-type cells and to 74.6% in sickle fibroblasts. These results demonstrate use of the gene modification system to correct a pathogenic mutation in primary human fibroblasts carrying the sickle mutation and to install a non-pathogenic mutation in wild-type primary human fibroblasts. In addition, introducing the second-nick gRNA increased perfect rewriting more than 7-fold in wild-type primary fibroblasts and more than ten-fold in sickle primary fibroblasts.
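The fold increases quoted above follow directly from the reported percentages. The short Python sketch below reproduces that arithmetic; the function name `fold_change` is illustrative and not part of the disclosure.

```python
def fold_change(baseline_pct: float, treated_pct: float) -> float:
    """Return treated / baseline as a fold change."""
    if baseline_pct <= 0:
        raise ValueError("baseline must be positive")
    return treated_pct / baseline_pct


# Perfect-rewrite percentages reported above (without -> with second nick).
reported = {
    "wild-type": (10.8, 75.6),
    "sickle": (6.1, 74.6),
}

for cell_type, (without_nick, with_nick) in reported.items():
    print(f"{cell_type}: {fold_change(without_nick, with_nick):.1f}-fold")
```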

Example 8: Quantifying the activity of a gene editing polypeptide and template in rewriting the endogenous FAH locus in primary mouse hepatocytes

This example demonstrates use of a gene modification system comprising a gene modification polypeptide and a template RNA to convert an A nucleotide to a G nucleotide in the endogenous Fah locus in primary mouse hepatocytes derived from Fah5981SB mice. The Fah5981SB mouse model carries a G-to-A point mutation in the last nucleotide of exon 8 of the Fah gene, which results in aberrant mRNA splicing and subsequent mRNA degradation without production of Fah protein, and therefore serves as a mouse model of hereditary tyrosinemia type I.

In this example, the template RNA contains:

(1) a gRNA spacer;

(2) a gRNA scaffold;

(3) a heterologous object sequence; and

(4) a primer binding site (PBS) sequence.
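The four-part layout listed above can be pictured with a small data structure. The following Python sketch assumes the parts are concatenated 5' to 3' in the order listed; the class name, field names, and all sequences are hypothetical placeholders and are not sequences from this disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TemplateRNA:
    """Four-part template RNA, assumed to be ordered 5' -> 3' as listed."""
    spacer: str
    scaffold: str
    heterologous_object: str
    pbs: str

    def sequence(self) -> str:
        # Concatenate the parts in the assumed 5' -> 3' order.
        return self.spacer + self.scaffold + self.heterologous_object + self.pbs

    def layout(self) -> dict:
        # Map each part to its (start, end) span, 0-based and half-open.
        spans = {}
        start = 0
        for name in ("spacer", "scaffold", "heterologous_object", "pbs"):
            end = start + len(getattr(self, name))
            spans[name] = (start, end)
            start = end
        return spans


# Placeholder sequences only (hypothetical, for illustration).
example = TemplateRNA(
    spacer="GACUGACUGACUGACUGACU",
    scaffold="GUUUUAGAGCUA",
    heterologous_object="ACGUACGUACGUAC",
    pbs="CAGUCAGUCAGU",
)
print(example.layout())
```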

More specifically, the template RNAs (including chemical modification patterns) comprise the following sequences:

FAH1_R14_P12 heavy, RNACS048

FAH1_R15_P10_heavy, RNACS049

FAH2_R19_P11_MUT_heavy, RNACS052

FAH2_R19_P13_MUT_heavy, RNACS053

Other exemplary template RNAs that can be used in this experiment include the following:

FAH1 RNACS050

FAH1 RNACS051

In the sequences above, m = 2'-O-methyl ribonucleotide, r = ribose, * = phosphorothioate bond.

The gene modification polypeptides tested comprise the following sequences: RNAV209 (nCas9-RT) and RNAV214 (wtCas9-RT). In particular, nCas9-RT and wtCas9-RT have the following amino acid sequences:

nCas9-RT (RNAV209):

wtCas9-RT (RNAV214):

Underlining indicates residues that differ between the nickase and wild-type sequences.

The gene modification system comprising the gene modification polypeptides listed above and the template RNAs described above was transfected into primary mouse hepatocytes. The gene modification polypeptide and template RNA were delivered as RNA by nucleofection. In particular, 4 μg of gene modification polypeptide mRNA was combined with 10 μg of chemically synthesized template RNA in 5 μL of water. The transfection mixture was added to 100,000 primary mouse hepatocytes in Buffer P3 (Lonza), and the cells were then nucleofected using program DG-138. After nucleofection, the cells were cultured at 37°C, 5% CO₂ for 3 days, followed by cell lysis and genomic DNA extraction. To analyze gene editing activity, primers flanking the target insertion site were used to amplify across the locus. Amplicons were analyzed by short-read sequencing on an Illumina MiSeq. Conversion of A to G at the end of exon 8 of the Fah gene indicated successful editing.

As shown in Figure 2, for the FAH2 templates, perfect rewriting levels of 4%-8% (A-to-G conversion with no unwanted mutations detected) were observed with RNAV209 but not with RNAV214-040. Indel levels of 4.4% to 6.6% were observed with RNAV209. In addition, the amount of WT Fah mRNA was measured by quantitative RT-PCR using primers that bind exons 7 and 8. As shown in Figure 3, when tested with RNAV209-013 mRNA, the FAH2 templates increased the abundance of Fah mRNA by up to 12% relative to WT. These results demonstrate use of the gene modification system to reverse a mutation in the Fah gene, resulting in partial restoration of wild-type Fah mRNA expression.
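The "perfect rewriting" and indel percentages above are read-level classifications of amplicon sequencing data. The Python sketch below illustrates one simplified way such a tally could be computed. It assumes reads are already trimmed to the amplicon window, uses a length difference as a crude stand-in for indel detection, and uses toy sequences; it is not the analysis pipeline used in the example.

```python
from collections import Counter


def classify_read(read: str, reference: str, edited: str) -> str:
    """Classify one trimmed amplicon read (simplified heuristic)."""
    if read == edited:
        return "perfect_rewrite"   # intended edit, no other changes
    if read == reference:
        return "unedited"
    if len(read) != len(reference):
        return "indel"             # crude proxy: length change
    return "other"                 # substitutions other than the intended edit


def summarize(reads, reference: str, edited: str) -> dict:
    """Return the percentage of reads in each class."""
    if not reads:
        return {}
    counts = Counter(classify_read(r, reference, edited) for r in reads)
    return {label: 100.0 * n / len(reads) for label, n in counts.items()}


# Toy sequences (hypothetical): a single A -> G change at index 4.
reference = "GGTCAAGT"
edited = "GGTCGAGT"
reads = [edited, edited, reference, "GGTCAGT"]
print(summarize(reads, reference, edited))
```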

Example 9: Quantifying the in vivo activity of a gene editing polypeptide and template in rewriting the endogenous FAH locus in mouse liver

This example demonstrates use of a gene modification system comprising a gene modification polypeptide and a template RNA to convert an A nucleotide to a G nucleotide in the endogenous Fah locus in the liver of mice of the Fah5981SB model. The Fah5981SB mouse model carries a G-to-A point mutation in the last nucleotide of exon 8 of the Fah gene, which results in aberrant mRNA splicing and subsequent mRNA degradation without production of Fah protein, and serves as a mouse model of hereditary tyrosinemia type I.

In this example, the template RNA contains:

(1) a gRNA spacer;

(2) a gRNA scaffold;

(3) a heterologous object sequence; and

(4) a primer binding site (PBS) sequence.

More specifically, the template RNAs comprise the following sequences:

FAH1_R14_P12 heavy, RNACS048

FAH1_R15_P10_heavy, RNACS049

FAH2_R19_P11_MUT_heavy, RNACS052

FAH2_R19_P13_MUT_heavy, RNACS053

The gene modification polypeptides tested comprise the following sequences: RNAV209 and RNAV214, the sequences of which are each provided in Example 3.

The gene modification system comprising the gene modification polypeptides and template RNAs described above was formulated in LNPs and delivered to mice. In particular, 2 mg/kg total RNA equivalent formulated in LNPs (template RNA and mRNA combined at 1:1 (w/w)) was administered intravenously to 7- to 9-week-old Fah5981SB mice of mixed sex. Six hours or 6 days after dosing, the animals were sacrificed and their livers were collected for analysis. To determine the distribution of gene modification polypeptide expression in the liver, 6-hour liver samples were analyzed by immunohistochemistry using an anti-Cas9 antibody. After staining, Cas9-positive hepatocytes were quantified with QuPath Markup. As shown in Figure 4, expression of the gene modification polypeptide was observed in 82%-91% of hepatocytes.
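As a worked illustration of the dosing described above, the following Python sketch splits a 2 mg/kg total RNA dose at 1:1 (w/w) into template RNA and mRNA masses. The 25 g body weight is an assumed, illustrative value and is not stated in this example.

```python
def rna_masses_mg(body_weight_g: float,
                  total_dose_mg_per_kg: float = 2.0,
                  template_to_mrna=(1.0, 1.0)) -> dict:
    """Split a total RNA dose into template RNA and mRNA by weight ratio."""
    total_mg = total_dose_mg_per_kg * body_weight_g / 1000.0  # g -> kg
    t, m = template_to_mrna
    return {
        "total": total_mg,
        "template_rna": total_mg * t / (t + m),
        "mrna": total_mg * m / (t + m),
    }


# Assumed 25 g mouse (hypothetical body weight).
print(rna_masses_mg(25.0))
```

For the assumed 25 g animal, this gives 0.05 mg of total RNA, i.e. 0.025 mg each of template RNA and mRNA.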

To analyze gene editing activity, primers flanking the target insertion site were used to amplify across the locus from genomic DNA of liver samples collected 6 days after dosing. Amplicons were analyzed by short-read sequencing on an Illumina MiSeq. Conversion of the A nucleotide to a G nucleotide indicated successful editing. As shown in Figure 5, perfect rewriting levels of 0.1%-1.9% (A-to-G conversion with no unwanted mutations detected) were observed across the different groups. Indel levels ranged from 0.2% to 0.4%.

To determine the phenotypic correction resulting from gene editing activity, restoration of wild-type Fah mRNA was assessed by real-time qRT-PCR, and restoration of Fah protein expression was assessed by immunohistochemistry using an anti-Fah antibody. As shown in Figure 6, 0.1%-6% restoration of wild-type mRNA, relative to heterozygous littermates, was detected across the different groups. As shown in Figure 7, Fah protein was detected in 0.1%-7% of the liver cross-sectional area across the different groups. These results demonstrate that using the gene modification system to reverse the Fah gene mutation in an in vivo mouse model of hereditary tyrosinemia type I resulted in partial restoration of wild-type Fah mRNA and Fah protein expression.

Example 10. Gene Editing at the TTR Locus in an In Vivo Mouse Model

This example demonstrates successful delivery of mRNA and guide, using a gene modification polypeptide and RNA, for Cas9-mediated gene editing in C57Blk/6 mice with the protospacer sequence ACACAAAUACCAGUCCAGCG targeting the TTR locus.

RNA was prepared as follows. mRNA encoding a gene modification polypeptide having the sequence shown in Table 10A below was produced by in vitro transcription, and the purified mRNA was dissolved in 1 mM sodium citrate (pH 6) to a final RNA concentration of 1-2 mg/mL. Similarly, a guide RNA having the sequence shown in Table 10A below was produced by chemical synthesis and dissolved in water or an aqueous buffer to a final RNA concentration of 1-2 mg/mL.

Table 10A. Sequences of Example 10

Lipid nanoparticle (LNP) components (ionizable lipid, helper lipid, sterol, PEG) were dissolved in 100% ethanol at lipid component molar ratios of 47:8:43.5:1.5, respectively. The RNAs (guide and mRNA) were combined at a 1:1 weight ratio and diluted to a concentration of 0.05-0.2 mg/mL in sodium acetate buffer, pH 5. The RNA was formulated into different LNPs at a lipid amine to total RNA phosphate (N:P) molar ratio of 4.0. The LNPs were formed by microfluidic or turbulent mixing of the lipid and RNA solutions. A 3:1 ratio of aqueous to organic solvent was maintained during mixing by using differential flow rates. After mixing, the LNPs were diluted, collected, and buffer exchanged into 50 mM Tris, 9% sucrose buffer using tangential flow filtration. The formulation was concentrated to 1.0 mg/mL or higher and then filtered through a 0.2 μm sterile filter. The final LNPs were stored at -80°C until further use.
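The formulation parameters above (lipid molar ratios, N:P ratio, and aqueous-to-organic solvent ratio) lend themselves to simple arithmetic. The Python sketch below is illustrative only. It assumes an average mass of about 330 g/mol per RNA nucleotide, one phosphate per nucleotide, and one ionizable amine per ionizable lipid; none of these assumptions, nor the 12 mL/min total flow rate used in the usage line, is stated in the example.

```python
MOLAR_RATIO = {
    "ionizable_lipid": 47.0,
    "helper_lipid": 8.0,
    "sterol": 43.5,
    "peg_lipid": 1.5,
}


def mole_fractions(ratio: dict) -> dict:
    """Normalize molar ratios so that the fractions sum to 1."""
    total = sum(ratio.values())
    return {name: value / total for name, value in ratio.items()}


def ionizable_lipid_nmol(rna_ug: float, n_to_p: float = 4.0,
                         nt_mass_g_per_mol: float = 330.0) -> float:
    """Ionizable lipid (nmol) for a target N:P ratio.

    Assumes ~330 g/mol per nucleotide, one phosphate per nucleotide,
    and one ionizable amine per lipid (all assumptions).
    """
    phosphate_nmol = rna_ug / nt_mass_g_per_mol * 1000.0  # ug/(g/mol) = umol
    return n_to_p * phosphate_nmol


def flow_rates(total_ml_per_min: float, aqueous_to_organic: float = 3.0):
    """Split a total flow rate into (aqueous, organic) streams."""
    organic = total_ml_per_min / (aqueous_to_organic + 1.0)
    return total_ml_per_min - organic, organic


print(mole_fractions(MOLAR_RATIO))
print(flow_rates(12.0))  # assumed total flow rate, for illustration
```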

The LNP formulations were delivered intravenously by tail vein bolus injection at doses of 0.1-1 mg/kg to C57Blk/6 mice approximately 8 weeks of age. Animals were euthanized 6 hours after injection and livers were collected at necropsy to measure Cas9-RT expression. Animals were euthanized 5 days after injection and livers were collected at necropsy to assess gene editing activity at the TTR locus. Expression of the Cas9-RT gene editing polypeptide in the liver was measured by Western blot, in which Cas9 was detected with a mouse monoclonal antibody (7A9-3A3, Cell Signaling Technology) and GAPDH (Cell Signaling Technology) was used as a loading control (Figure 12). Editing of the TTR locus was quantified by Sanger sequencing of amplicons of the TTR locus near the protospacer binding site followed by TIDE analysis. Editing of the TTR locus was observed, as shown in Figure 13. TTR protein levels in serum were quantified by ELISA using a standard curve (Aviva Biosciences). As shown in Figure 14, TTR protein levels were reduced in the serum of treated animals. These experiments demonstrate that the Cas9-RT polypeptide can be expressed in vivo and can edit the TTR locus, resulting in reduced TTR protein levels in serum.

Example 11. Gene Editing at the TTR Locus in an In Vivo Cynomolgus Macaque Model

This example demonstrates successful delivery of mRNA and guide, using a gene modification polypeptide and RNA, for Cas9-mediated gene editing in a cynomolgus macaque model with the protospacer sequence ACACAAAUACCAGUCCAGCG (SEQ ID NO: 20630) targeting the TTR locus.

RNA was prepared as follows. mRNA encoding a gene modification polypeptide having the sequence shown in Table 11A below was produced by in vitro transcription, and the purified mRNA was dissolved in 1 mM sodium citrate (pH 6) to a final RNA concentration of 1-2 mg/mL. Similarly, a guide RNA having the sequence shown in Table 11A below was produced by chemical synthesis and dissolved in water or an aqueous buffer to a final RNA concentration of 1-2 mg/mL.

Table 11A. Sequences of Example 11

Lipid nanoparticle (LNP) components (ionizable lipid, helper lipid, sterol, PEG) were dissolved in 100% ethanol at lipid component molar ratios of 47:8:43.5:1.5, respectively. The RNAs (guide and mRNA) were combined at a 1:1 weight ratio and diluted to a concentration of 0.05-0.2 mg/mL in sodium acetate buffer, pH 5. The RNA was formulated into different LNPs at a lipid amine to total RNA phosphate (N:P) molar ratio of 4.0. The LNPs were formed by microfluidic or turbulent mixing of the lipid and RNA solutions. A 3:1 ratio of aqueous to organic solvent was maintained during mixing by using differential flow rates. After mixing, the LNPs were diluted, collected, and buffer exchanged into 50 mM Tris, 9% sucrose buffer using tangential flow filtration. The formulation was concentrated to 1.0 mg/mL or higher and then filtered through a 0.2 μm sterile filter. The final LNPs were stored at -80°C until further use.

The LNP formulation was delivered intravenously by infusion over 1 hour at 2 mg/kg, with an infusion volume of 5 mL/kg. Cynomolgus macaques of mainland Asian origin received a 2 mg/kg bolus of dexamethasone by intramuscular injection 1.5-2 h before intravenous infusion with a syringe pump. Animals were monitored after infusion, and Cas9-RT expression was measured in laparoscopic biopsies collected from the liver at 8-12 h, 24 h, and 48 h after infusion. Animals were euthanized 14 days after infusion, and the liver was harvested by dividing the organ into 8 separate sections to assess gene editing activity at the TTR locus. Expression of the Cas9-RT gene editing polypeptide in the liver was quantified by capillary electrophoresis Western blot using the ProteinSimple Jess system (Bio-Techne), in which Cas9 was detected with a mouse monoclonal antibody (7A9-3A3, Cell Signaling Technology).
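The infusion parameters above (2 mg/kg in 5 mL/kg over 1 hour) imply a fixed formulation concentration and a weight-dependent pump rate. The Python sketch below works this through; the 3 kg body weight is an assumed, illustrative value that does not come from this example.

```python
def infusion_plan(body_weight_kg: float,
                  dose_mg_per_kg: float = 2.0,
                  volume_ml_per_kg: float = 5.0,
                  duration_h: float = 1.0) -> dict:
    """Derive dose, volume, concentration, and pump rate for an infusion."""
    dose_mg = dose_mg_per_kg * body_weight_kg
    volume_ml = volume_ml_per_kg * body_weight_kg
    return {
        "dose_mg": dose_mg,
        "volume_ml": volume_ml,
        "concentration_mg_per_ml": dose_mg / volume_ml,
        "pump_rate_ml_per_h": volume_ml / duration_h,
    }


# Assumed 3 kg animal (hypothetical body weight).
print(infusion_plan(3.0))
```

Whatever the body weight, the dosing concentration works out to 2 / 5 = 0.4 mg/mL, which is below the 1.0 mg/mL or higher to which the formulation is concentrated.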

Relative expression of the Cas9-RT gene editing polypeptide was measured by area-under-the-curve analysis, as shown in Figure 15. Editing of the TTR locus was quantified by amplicon sequencing of the TTR locus near the protospacer binding site. Editing of the TTR locus was observed, as shown in Figure 16. These experiments demonstrate that the Cas9-RT polypeptide can be expressed in vivo and can edit the TTR locus in a non-human primate model.

Example 12: Quantifying the activity of a gene modification polypeptide and template RNA in editing the endogenous B-globin locus in CD34+ primary human hematopoietic stem cells (HSCs)

This example demonstrates use of a gene modification system containing an exemplary gene modification polypeptide and template RNA to convert the glutamic acid codon (GAG) at amino acid position 7 of the endogenous B-globin locus in primary human HSCs to alanine (GCG), thereby demonstrating targeting of a sequence position associated with sickle cell disease (SCD) and editing of the sequence to encode a non-pathogenic amino acid at position 7. The "C" residue installed by this process is referred to as the "Makassar" variant and is a non-pathogenic sequence variant that occurs in the human population. The conversion comprises a one-base-pair change (i.e., replacement of the DNA base adenine at nucleotide position 20 in SEQ ID NO: 20633 with the base cytosine).
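The single-base change described above can be checked at the codon level. The Python sketch below assumes the first seven codons of the human HBB coding sequence are ATG GTG CAT CTG ACT CCT GAG and that nucleotide position 20 is counted from the A of the ATG; both are assumptions made for illustration, and the sketch does not reproduce SEQ ID NO: 20633.

```python
# Minimal codon table covering only the codons used in this sketch.
CODON_TABLE = {"GAG": "Glu", "GTG": "Val", "GCG": "Ala", "GCA": "Ala"}

# Assumed first 21 nt of the HBB coding sequence (codons 1-7, Met = 1).
HBB_START = "ATGGTGCATCTGACTCCTGAG"


def substitute(seq: str, position: int, expected: str, new: str) -> str:
    """Replace the base at a 1-based position after checking the original base."""
    idx = position - 1
    if seq[idx] != expected:
        raise ValueError(f"expected {expected} at position {position}, found {seq[idx]}")
    return seq[:idx] + new + seq[idx + 1:]


def codon(seq: str, aa_position: int) -> str:
    """Return the codon for a 1-based amino acid position."""
    start = 3 * (aa_position - 1)
    return seq[start:start + 3]


edited = substitute(HBB_START, 20, "A", "C")
before, after = codon(HBB_START, 7), codon(edited, 7)
print(f"codon 7: {before} ({CODON_TABLE[before]}) -> {after} ({CODON_TABLE[after]})")
```

Under the same numbering assumption, the sickle allele carries GTG (valine) at this codon, and changing position 20 from T to C likewise gives GCG.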

In this example, the template RNA contains:

(1) a gRNA spacer;

(2) a gRNA scaffold;

(3) a heterologous object sequence; and

(4) a primer binding site (PBS) sequence.

Exemplary template RNAs comprise the following sequences from 5' to 3', in which the first 3 and last 3 bases carry 2'-O-methyl phosphorothioate chemical modifications as indicated. In the sequences below, m = 2'-O-methyl ribonucleotide, r = ribose, * = phosphorothioate bond.

Unmodified versions of these sequences are shown in Table BB below. In some embodiments, the sequences in this table can be used without chemical modification.

Table BB. tg34, tg35, and tg36 without nucleotide modifications.

The gene modification system comprising the gene modification polypeptide and template RNA described above was transfected into human HSCs. The gene modification polypeptide and template RNA were delivered as RNA by nucleofection. In particular, 2000 ng of mRNA encoding the gene modification polypeptide was combined with 2000 ng of template RNA. The RNA mixture was added to 200,000 primary human HSCs in a total of 20 μL of Lonza P3 buffer, and the HSCs were nucleofected in a 16-well nucleofection cassette using program DZ-100. After nucleofection, the cells were incubated at room temperature for 10 minutes and transferred to a 24-well plate containing, in each well, 500 μL of StemSpan-XF with SCF at 100 ng/mL, Flt3-L at 100 ng/mL, and TPO at 100 ng/mL, and cultured at 37°C, 5% CO₂ for 3 days, followed by cell lysis and genomic DNA extraction. To analyze gene editing activity, primers flanking the target insertion site were used to amplify across the locus. Amplicons were analyzed by short-read sequencing on an Illumina MiSeq. Replacement of the DNA base adenine at nucleotide position 20, downstream of the transcription start site within the endogenous B-globin locus, with the base cytosine indicated successful editing.

As shown in Figure 17, when primary human HSCs were treated with the exemplary template gRNAs and mRNA encoding the exemplary gene modification polypeptide, average perfect rewriting levels of 1.3% to 1.8% were detected, corresponding to replacement of the A nucleotide with C at the SCD codon. These results demonstrate use of the gene modification system to edit a non-pathogenic sequence into a clinically relevant codon in the endogenous B-globin locus in primary human HSCs. The results further demonstrate that several exemplary template RNAs can be used to achieve the desired edit.

Example 13: Comparing the activity of different second-strand-targeting gRNAs, in combination with a gene modification polypeptide and template RNA, in editing the endogenous B-globin locus in CD34+ primary human hematopoietic stem cells (HSCs).

This example demonstrates use of an exemplary gene modification system containing a gene modification polypeptide, a template RNA, and one of several different exemplary second-strand-targeting gRNAs to convert the glutamic acid codon (GAG) at amino acid position 7 of the endogenous B-globin locus in primary human HSCs to alanine (GCA or GCG), thereby demonstrating targeting of a sequence position associated with sickle cell disease (SCD) and editing of the sequence to encode a non-pathogenic amino acid at position 7. The conversion comprises a two-base-pair change for the exemplary template RNA comprising the exemplary HBB5 spacer (i.e., replacement of the DNA bases adenine and guanine at nucleotide positions 20 and 21 with the bases cytosine and adenine, respectively), and a one-base-pair change for the exemplary template RNA comprising the exemplary HBB8 spacer (i.e., replacement of the DNA base adenine at nucleotide position 20 with the base cytosine).

In this example, the template RNA contains:

(1) a gRNA spacer;

(2) a gRNA scaffold;

(3) a heterologous object sequence; and

(4) a primer binding site (PBS) sequence.

The template RNA comprises the nucleic acid sequence set forth in Example 5 and labeled FYF tgRNA14 (for the exemplary HBB5 template RNA) or tg34 (for the exemplary HBB8 template RNA).

The gene modification polypeptide comprises the amino acid sequence set forth in Example 8 and labeled RNAV209.

The second-strand-targeting gRNA sequences designed to produce a second nick comprise the sequences listed in Table X1.

The gene modification system comprising the gene modification polypeptide and template RNA described above was transfected into human HSCs. The gene modification polypeptide and template RNA were delivered as RNA by nucleofection. In particular, 3000 ng of mRNA encoding the gene modification polypeptide was combined with 2000 ng of template RNA and 2000 ng (for systems comprising the HBB5 template RNA) or 3000 ng (for systems comprising the HBB8 template RNA) of second-strand-targeting gRNA. The RNA mixture was added to 200,000 primary human HSCs in a total of 20 μL of Lonza P3 buffer, and the HSCs were nucleofected in a 16-well nucleofection cassette using program DZ-100. After nucleofection, the cells were incubated at room temperature for 10 minutes and transferred to a 24-well plate containing, in each well, 500 μL of StemSpan-XF with SCF at 100 ng/mL, Flt3-L at 100 ng/mL, and TPO at 100 ng/mL, and cultured at 37°C, 5% CO₂ for 3 days, followed by cell lysis and genomic DNA extraction. To analyze gene editing activity, primers flanking the target insertion site were used to amplify across the locus. Amplicons were analyzed by short-read sequencing on an Illumina MiSeq. Replacement of the DNA bases adenine and guanine at nucleotide positions 20 and 21 with the bases cytosine and adenine (HBB5 template RNA), or replacement of the DNA base adenine at nucleotide position 20 with the base cytosine (HBB8 template RNA), downstream of the transcription start site within the endogenous B-globin locus, indicated successful editing.

As shown in Figure 18A, when HSCs were treated with exemplary gene modification systems comprising the exemplary HBB5 template RNA tg14 and various second-strand-targeting gRNAs, average perfect rewriting levels of 4.5%-21.3% were detected in primary human HSCs, corresponding to replacement of adenine and guanine at nucleotide positions 20 and 21 with the bases cytosine and adenine, respectively, at the SCD codon. As shown in Figure 18B, when HSCs were treated with exemplary gene modification systems comprising the exemplary HBB8 template gRNA tg34 and various second-strand-targeting gRNAs, average perfect rewriting levels of 2.9%-24.6% were detected in primary human HSCs, corresponding to replacement of adenine at nucleotide position 20 with the base cytosine at the SCD codon.

These results demonstrate that use of a second-strand-targeting gRNA increases the editing activity of an exemplary gene modification system targeting a clinically relevant codon in the endogenous B-globin locus in primary human HSCs. The results further demonstrate that adjusting the positioning of the second-strand-targeting gRNA (e.g., relative to the sequence targeted by the spacer of the exemplary template RNA) increases the enhancement of editing activity, e.g., to more than 9-fold higher than the perfect rewriting observed in the absence of a second-strand-targeting gRNA.

Example 14: Characterizing configurations of template RNAs comprising silent substitutions for editing the endogenous B-globin locus in CD34+ primary human hematopoietic stem cells (HSCs).

This example demonstrates use of a gene modification system containing an exemplary gene modification polypeptide and various template RNAs comprising different silent substitutions to convert the glutamic acid codon at amino acid position 7 of the endogenous B-globin locus in primary human HSCs to alanine, thereby demonstrating targeting of a sequence position associated with sickle cell disease (SCD) and editing of the sequence to encode a non-pathogenic sequence at position 7. The conversion comprises a two-base-pair change for the exemplary HBB5 template RNA (i.e., replacement of the DNA bases adenine and guanine at nucleotide positions 20 and 21 with the bases cytosine and adenine) plus inclusion of additional associated silent substitutions (which change the nucleic acid sequence of the DNA, but not the protein sequence, by using a different synonymous codon).

In this example, the template RNA contains:

(1) a gRNA spacer;

(2) a gRNA scaffold;

(3) a heterologous object sequence; and

(4) a primer binding site (PBS) sequence.

More specifically, the exemplary template RNAs comprise the following sequences from 5' to 3', in which the first 3 and last 3 bases carry 2'-O-methyl phosphorothioate chemical modifications. In the sequences below, m = 2'-O-methyl ribonucleotide, r = ribose, * = phosphorothioate bond. Different combinations of substitutions and RT/PBS lengths are included (Table X2).

Selected corresponding template RNA sequences that do not contain silent substitutions are provided in Example 5 (e.g., FYF tgRNA14 is the template RNA sequence corresponding to tg14h, FYF tgRNA19 is the template RNA sequence corresponding to tg19h, etc.).

The gene modification polypeptide used comprises the sequence set forth in Example 8 and labeled RNAV209.

The gene modification system comprising the gene modification polypeptide and template RNA described above was transfected into human HSCs. The gene modification polypeptide and template RNA were delivered as RNA by nucleofection. In particular, 3000 ng of mRNA encoding the gene modification polypeptide was combined with 2000 ng of template RNA. The RNA mixture was added to 200,000 primary human HSCs in a total of 20 μL of Lonza P3 buffer, and the HSCs were nucleofected in a 16-well nucleofection cassette using program DZ-100. After nucleofection, the cells were incubated at room temperature for 10 minutes and transferred to a 24-well plate containing, in each well, 500 μL of StemSpan-XF with SCF at 100 ng/mL, Flt3-L at 100 ng/mL, and TPO at 100 ng/mL, and cultured at 37°C, 5% CO₂ for 3 days, followed by cell lysis and genomic DNA extraction. To analyze gene editing activity, primers flanking the target insertion site were used to amplify across the locus. Amplicons were analyzed by short-read sequencing on an Illumina MiSeq. Replacement of the DNA bases adenine and guanine at nucleotide positions 20 and 21, downstream of the transcription start site within the endogenous B-globin locus, with the bases cytosine and adenine (HBB5 template RNA), together with inclusion of the intended silent substitutions, indicated successful editing.

As shown in Figure 19A, when HSCs were treated with exemplary gene modifying systems comprising exemplary HBB5 template RNAs containing various silent substitutions, average perfect rewriting levels of 0.2%-7.3% were detected in primary human HSCs, corresponding to replacement of the adenine and guanine at nucleotide positions 20 and 21, respectively, with cytosine and adenine at the SCD codon. The results show that, in some cases, one or more silent substitutions, such as the exemplary silent substitution hs1, can increase editing activity across several different template RNAs. In particular, replacing the codon encoding the 6th amino acid of the HBB gene (proline, counting the initial methionine) with CCC or CCG increased editing.
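For illustration only, the codon-level effect of the substitutions discussed above can be checked with a short Python sketch. It assumes that nucleotide positions are numbered from the A of the HBB start codon (position 1), an assumption that is consistent with the bases named in this example (thymidine at 18, adenine at 20, guanine at 21), and it uses the first seven codons of the wild-type HBB coding sequence. The names `CODON_TABLE`, `HBB_WT`, `apply_substitutions`, and `translate` are hypothetical and are not part of the disclosure.

```python
# Minimal codon table covering only the codons that occur in this sketch.
CODON_TABLE = {
    "ATG": "M", "GTG": "V", "CAT": "H", "CTG": "L", "ACT": "T",
    "CCT": "P", "CCC": "P", "CCG": "P",
    "GAG": "E", "GCA": "A", "GCG": "A", "GCC": "A",
}

# First seven codons of the wild-type HBB coding sequence.
# Assumption: position 1 is the A of the ATG start codon.
HBB_WT = "ATGGTGCATCTGACTCCTGAG"


def apply_substitutions(seq, subs):
    """Return seq with 1-based {position: new_base} substitutions applied."""
    bases = list(seq)
    for position, base in subs.items():
        bases[position - 1] = base
    return "".join(bases)


def translate(seq):
    """Translate an in-frame DNA string using the minimal codon table."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


# HBB5 edit without a silent substitution: A20C and G21A (GAG -> GCA).
hbb5_plain = apply_substitutions(HBB_WT, {20: "C", 21: "A"})
# HBB5 edit with the hs1 silent substitution: T18G (CCT -> CCG) in addition.
hbb5_hs1 = apply_substitutions(HBB_WT, {18: "G", 20: "C", 21: "A"})

print(translate(HBB_WT))      # wild-type peptide
print(translate(hbb5_plain))  # glutamic acid at position 7 becomes alanine
print(translate(hbb5_hs1))    # same peptide; proline at position 6 is unchanged
```

Under these assumptions, the T18G change alters only the third base of the proline codon, so the encoded protein is the same with or without the silent substitution.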

These results show that, when targeting a clinically relevant codon in the endogenous β-globin locus in primary human HSCs, introducing silent substitutions into exemplary template RNAs increased the editing activity of gene modifying systems comprising those template RNAs by up to 5-fold. The results further show that varying the identity of one or more silent substitutions can increase the enhancement of editing activity.

Example 15: Characterization of template RNA configurations comprising silent substitutions for editing the endogenous β-globin locus in CD34+ primary human hematopoietic stem cells (HSCs).

This example demonstrates the use of gene modifying systems containing an exemplary gene modifying polypeptide and various template RNAs comprising different silent substitutions to convert the glutamic acid codon at amino acid position 7 of the endogenous β-globin locus in primary human HSCs to alanine, thereby demonstrating targeting of a sequence position associated with sickle cell disease (SCD) and editing of the sequence to encode a non-pathogenic sequence at position 7. For the exemplary HBB8 template RNA, the conversion comprises a one-base-pair change (i.e., replacement of the DNA base adenine at nucleotide position 20 with cytosine) plus incorporation of additional associated silent substitutions, which change the nucleic acid sequence of the DNA, but not the protein sequence, by using a different synonymous codon.

In this example, the template RNA contains:

(1) a gRNA spacer;

(2) a gRNA scaffold;

(3) a heterologous object sequence; and

(4) a primer binding site (PBS) sequence.

More specifically, the exemplary template RNAs comprise the following sequences from 5' to 3', in which the first 3 and last 3 bases carry 2'-O-methyl phosphorothioate chemical modifications. In the sequences below, m = 2'-O-methyl ribonucleotide, r = ribose, and * = phosphorothioate bond. Different combinations of substitutions and RT/PBS lengths are included (Table X3).

The gene modifying system comprising the gene modifying polypeptide and template RNA described above was transfected into human HSCs. The gene modifying polypeptide and the template RNA were delivered as RNA by nucleofection. Specifically, 3000 ng of mRNA encoding the gene modifying polypeptide was combined with 3000 ng of template RNA. The RNA mixture was added to 200,000 primary human HSCs in a total of 20 μL of Lonza P3 buffer, and the HSCs were nucleofected in a 16-well nucleofection cassette using program DZ-100. After nucleofection, the cells were incubated at room temperature for 10 minutes and transferred to a 24-well plate containing, in each well, 500 μL of StemSpan-XF supplemented with SCF at 100 ng/mL, Flt3-L at 100 ng/mL, and TPO at 100 ng/mL, and were cultured at 37°C and 5% CO₂ for 3 days before cell lysis and genomic DNA extraction. To analyze gene editing activity, primers flanking the target insertion site were used to amplify across the locus, and the amplicons were analyzed by short-read sequencing on an Illumina MiSeq. Successful editing was indicated by replacement of the DNA base adenine at nucleotide position 20, downstream of the transcription start site within the endogenous β-globin locus, with cytosine (HBB8 template RNA), together with incorporation of the expected silent substitution.

As shown in Figure 19B, when HSCs were treated with exemplary gene modifying systems comprising exemplary HBB8 template RNAs containing various silent substitutions, average perfect rewriting levels of 0.1%-13.1% were detected in primary human HSCs, corresponding to replacement of the adenine and guanine at nucleotide positions 20 and 21, respectively, with cytosine and adenine at the SCD codon. These results further show that, when targeting a clinically relevant codon in the endogenous β-globin locus in primary human HSCs, introducing silent substitutions into exemplary template gRNAs increased the editing activity of gene modifying systems comprising those template RNAs by more than 9-fold. The results further show that varying the identity of one or more silent substitutions can increase the enhancement of editing activity.

Example 16: Evaluation of the effect of second-strand-targeting gRNAs and silent substitutions on the activity of a gene modifying polypeptide and template RNA in editing the endogenous β-globin locus in CD34+ primary human hematopoietic stem cells (HSCs).

This example demonstrates the use of gene modifying systems containing an exemplary gene modifying polypeptide and a template RNA, with or without various second-strand-targeting gRNAs, to convert the glutamic acid codon at amino acid position 7 of the endogenous β-globin locus in primary human HSCs to alanine, thereby rewriting a non-pathogenic sequence into position 7. For the exemplary HBB5 template RNA, the conversion comprises a change of 2 base pairs plus a silent substitution (i.e., replacement of the DNA bases thymidine, adenine, and guanine at nucleotide positions 18, 20, and 21 with guanine, cytosine, and adenine). For the exemplary HBB8 template RNAs, the conversion comprises changing the DNA bases thymidine and adenine at nucleotide positions 18 and 20 to cytosine and cytosine, respectively (e.g., using template RNA tg34_HBB8_hs13), or replacing the DNA bases thymidine, adenine, and guanine at nucleotide positions 18, 20, and 21 with cytosine, cytosine, and cytosine, respectively (e.g., using tg34_HBB8_hs10). This example demonstrates editing with systems comprising various second-strand-targeting gRNAs together with either an exemplary HBB5 template RNA comprising a silent substitution (Figure 20A) or one of two exemplary HBB8 template RNAs, each comprising a different silent substitution (Figure 20B).

In this example, the template RNA contains:

(1) a gRNA spacer;

(2) a gRNA scaffold;

(3) a heterologous object sequence; and

(4) a primer binding site (PBS) sequence.

More specifically, the template RNA comprises the sequence labeled tg14_hs1 set forth in Example 14, or the sequences labeled tg34_HBB8hs10 and tg34_HBB8hs13 set forth in Example 15.

The system further comprises a second-strand-targeting gRNA comprising a sequence in Table X1.

The gene modifying system comprising the gene modifying polypeptide and template RNA described above was transfected into human HSCs. The gene modifying polypeptide and the template RNA were delivered as RNA by nucleofection. Specifically, 3000 ng of mRNA encoding the gene modifying polypeptide was combined with 3000 ng of template RNA, with or without 2000 ng of second-strand-targeting gRNA. The RNA mixture was added to 200,000 primary human HSCs in a total of 20 μL of Lonza P3 buffer, and the HSCs were nucleofected in a 16-well nucleofection cassette using program DZ-100. After nucleofection, the cells were incubated at room temperature for 10 minutes and transferred to a 24-well plate containing, in each well, 500 μL of StemSpan-XF supplemented with SCF at 100 ng/mL, Flt3-L at 100 ng/mL, and TPO at 100 ng/mL, and were cultured at 37°C and 5% CO₂ for 3 days before cell lysis and genomic DNA extraction. To analyze gene editing activity, primers flanking the target insertion site were used to amplify across the locus, and the amplicons were analyzed by short-read sequencing on an Illumina MiSeq. For the HBB5 spacer, successful editing was indicated by replacement of the DNA bases thymidine, adenine, and guanine at nucleotide positions 18, 20, and 21 with guanine, cytosine, and adenine. For the HBB8 spacer, successful editing was indicated by replacement of the thymidine and adenine at nucleotide positions 18 and 20 with cytosine and cytosine, respectively (tg34_HBB8_hs13), or by replacement of the DNA bases thymidine, adenine, and guanine at nucleotide positions 18, 20, and 21 with cytosine, cytosine, and cytosine, respectively (tg34_HBB8_hs10).
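As a bookkeeping aid, the reaction composition described above can be captured in a small Python sketch that derives the total RNA mass, the cell density, and the RNA mass per cell for a reaction with or without the second-strand-targeting gRNA. The class name `NucleofectionMix` and its field names are hypothetical; the default values are the quantities stated in this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NucleofectionMix:
    """One nucleofection reaction; defaults follow the quantities in this example."""
    mrna_ng: float = 3000.0               # mRNA encoding the gene modifying polypeptide
    template_rna_ng: float = 3000.0       # template RNA
    second_strand_grna_ng: float = 0.0    # 2000 ng when the second-strand gRNA is included
    cells: int = 200_000                  # primary human HSCs per reaction
    volume_ul: float = 20.0               # total volume of P3 buffer

    @property
    def total_rna_ng(self) -> float:
        return self.mrna_ng + self.template_rna_ng + self.second_strand_grna_ng

    @property
    def cells_per_ul(self) -> float:
        return self.cells / self.volume_ul

    @property
    def rna_pg_per_cell(self) -> float:
        # 1 ng = 1000 pg
        return self.total_rna_ng * 1000.0 / self.cells


without_grna = NucleofectionMix()
with_grna = NucleofectionMix(second_strand_grna_ng=2000.0)

for label, mix in (("without gRNA", without_grna), ("with gRNA", with_grna)):
    print(label, mix.total_rna_ng, mix.cells_per_ul, mix.rna_pg_per_cell)
```

The same structure could be reused for the other examples by changing `template_rna_ng` to the amount stated there.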

Figure 20A is a graph showing % editing in HSCs treated with gene modifying systems comprising the exemplary HBB5 template RNA tg14_hs1 (comprising an exemplary silent substitution), with or without various second-strand-targeting gRNAs. The results show an additive effect of the second-strand-targeting gRNA and the template gRNA for an HBB5 template RNA containing a silent substitution that is used to rewrite a non-pathogenic sequence into a clinically relevant codon in the endogenous β-globin locus.

Figure 20B is a graph showing % editing in HSCs treated with gene modifying systems comprising one of two exemplary HBB8 template RNAs, tg34_hs13 or tg34_hs10 (each comprising a different exemplary silent substitution), with or without various second-strand-targeting gRNAs. The results further show an additive effect of the second-strand-targeting gRNA and the template gRNA for HBB8 template RNAs containing silent substitutions that are used to rewrite a non-pathogenic sequence into a clinically relevant codon in the endogenous β-globin locus.

Example 17: Evaluation of the effect of a second-strand-targeting gRNA and silent substitutions on the activity of a gene modifying polypeptide and template RNA in editing the endogenous β-globin locus in CD34+ primary human hematopoietic stem cells (HSCs).

This example demonstrates the use of gene modifying systems containing an exemplary gene modifying polypeptide and various template RNAs (some comprising a silent substitution), with or without a second-strand-targeting gRNA, to convert the glutamic acid codon at amino acid position 7 of the endogenous β-globin locus in primary human HSCs to alanine, thereby demonstrating targeting of a sequence position associated with sickle cell disease (SCD) and editing of the sequence to encode a non-pathogenic sequence at position 7. For the exemplary HBB5 template RNAs, the conversion comprises a change of 2 base pairs (i.e., replacement of the DNA bases adenine and guanine at nucleotide positions 20 and 21 with cytosine and adenine), with or without an additional replacement of the thymidine at nucleotide position 18 with guanine (a silent substitution).

In this example, the template RNA contains:

(1) a gRNA spacer;

(2) a gRNA scaffold;

(3) a heterologous object sequence; and

(4) a primer binding site (PBS) sequence.

More specifically, the template RNA comprises a sequence labeled tg14h, tg14_hs1, tg19h, or tg19_hs1 set forth in Example 14.

The system further comprises a gRNA sequence designed to produce a second nick, wherein the gRNA has the sequence labeled HBB5_g37 in Table X1.

The gene modifying system comprising the gene modifying polypeptide and template RNA described above was transfected into human HSCs. The gene modifying polypeptide and the template RNA were delivered as RNA by nucleofection. Specifically, 3000 ng of mRNA encoding the gene modifying polypeptide was combined with 2000 ng of template RNA, with or without 2000 ng of second-strand-targeting gRNA. The RNA mixture was added to 200,000 primary human HSCs in a total of 20 μL of Lonza P3 buffer, and the cells were nucleofected in a 16-well nucleofection cassette using program DZ-100. After nucleofection, the cells were incubated at room temperature for 10 minutes and transferred to a 24-well plate containing 500 μL of StemSpan-XF supplemented with SCF at 100 ng/mL, Flt3-L at 100 ng/mL, and TPO at 100 ng/mL, and were cultured at 37°C and 5% CO₂ for 3 days before cell lysis and genomic DNA extraction. To analyze gene editing activity, primers flanking the target insertion site were used to amplify across the locus, and the amplicons were analyzed by short-read sequencing on an Illumina MiSeq. Successful editing, downstream of the transcription start site within the endogenous β-globin locus, was indicated by replacement of the DNA bases adenine and guanine at nucleotide positions 20 and 21 with cytosine and adenine, respectively (tg14h or tg19h), or by replacement of the DNA bases thymidine, adenine, and guanine at nucleotide positions 18, 20, and 21 with guanine, cytosine, and adenine, respectively (tg14_hs1 or tg19_hs1).
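The scoring criterion described above can be sketched in Python as follows. This is a simplified illustration and not the analysis pipeline used in the example: it assumes that each read has already been trimmed to a 21-nucleotide window aligned to the first seven codons of the wild-type HBB coding sequence, that positions are numbered from the A of the start codon, and that a "perfect rewrite" is a read carrying every intended substitution and no other change in the window. The names `WT_WINDOW`, `classify_read`, and `percent_perfect` are hypothetical.

```python
from collections import Counter

# Assumed reference window: first seven codons of wild-type HBB (position 1 = A of ATG).
WT_WINDOW = "ATGGTGCATCTGACTCCTGAG"


def classify_read(window, intended):
    """Classify a pre-aligned read window as 'perfect', 'unedited', or 'other'.

    intended maps 1-based positions to the expected edited base.
    """
    if len(window) != len(WT_WINDOW):
        return "other"  # length change, e.g. an insertion or deletion
    observed = {
        i + 1: base
        for i, (ref, base) in enumerate(zip(WT_WINDOW, window))
        if ref != base
    }
    if not observed:
        return "unedited"
    return "perfect" if observed == intended else "other"


def percent_perfect(windows, intended):
    """Percentage of read windows classified as perfect rewrites."""
    counts = Counter(classify_read(w, intended) for w in windows)
    total = sum(counts.values())
    return 100.0 * counts["perfect"] / total if total else 0.0


# Intended edits for a template carrying hs1 (tg14_hs1 or tg19_hs1).
HS1_EDITS = {18: "G", 20: "C", 21: "A"}

# Hypothetical read windows for illustration only (not measured data).
demo_reads = [
    "ATGGTGCATCTGACTCCTGAG",  # unedited
    "ATGGTGCATCTGACTCCGGCA",  # all three intended substitutions
    "ATGGTGCATCTGACTCCGGCA",  # all three intended substitutions
    "ATGGTGCATCTGACTCCTGCA",  # missing the silent substitution
]
print(percent_perfect(demo_reads, HS1_EDITS))
```

For a template without the silent substitution (tg14h or tg19h), the intended edits under the same assumptions would be `{20: "C", 21: "A"}`.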

As shown in Figure 20C, when HSCs were treated with an exemplary gene modifying system comprising the exemplary HBB5 template RNA tg14h or tg19h without addition of a second-strand-targeting gRNA, average perfect rewriting levels of 1.8% and 3.4%, respectively, were detected in human HSCs, corresponding to replacement of the adenine and guanine at nucleotide positions 20 and 21, respectively, with cytosine and adenine at the SCD codon. Including the hs1 silent substitution in the template gRNA (tg14_hs1 or tg19_hs1) increased perfect rewriting to 9.1% and 6.3%, respectively.

Addition of the second-strand-targeting gRNA increased average perfect rewriting to 17.1% for tg14h and to 30.2% for tg14_hs1. Similarly, addition of the second-strand-targeting gRNA brought average perfect rewriting to 20.2% for tg19h and to 32.2% for tg19_hs1.

These results show that silent substitutions and a second-strand-targeting gRNA can each independently increase the editing activity of the gene modifying system, and further show an additive effect of the second-strand-targeting gRNA and silent substitutions within exemplary HBB5 template RNAs. The results show a cumulative increase in editing activity of more than 20-fold when silent substitutions and a second-strand-targeting gRNA were used in primary human HSCs to write a non-pathogenic sequence into a clinically relevant codon in the endogenous β-globin locus.

Example 18: Evaluation of the effect of a gene modifying system that edits the endogenous β-globin locus on stemness markers in CD34+ primary human hematopoietic stem cells (HSCs).

This example demonstrates that editing with a gene modifying system containing an exemplary gene modifying polypeptide and template RNA, with or without a second-strand-targeting gRNA, to convert the glutamic acid codon at amino acid position 7 of the endogenous β-globin locus in primary human HSCs to alanine does not significantly affect the levels of stem cell markers or the proportions of the subpopulations characterized by those markers.

In this example, the template RNA contains:

(1) a gRNA spacer;

(2) a gRNA scaffold;

(3) a heterologous object sequence; and

(4) a primer binding site (PBS) sequence.

More specifically, the template RNA comprises the nucleic acid sequence labeled FYF tgRNA14 set forth in Example 5.

The system further comprises a second-strand-targeting gRNA sequence designed to produce a second nick, wherein the gRNA has the sequence labeled HBB5_g37 in Table X1.

The gene modifying polypeptide tested comprised the amino acid sequence labeled RNAV209 set forth in Example 8.

The gene modifying system comprising the gene modifying polypeptide and template RNA described above was transfected into human HSCs. The gene modifying polypeptide and the template RNA were delivered as RNA by nucleofection. Specifically, 3000 ng of mRNA encoding the gene modifying polypeptide was combined with 2000 ng of template RNA, with or without 2000 ng of second-strand-targeting gRNA. The RNA mixture was added to 200,000 primary human HSCs in a total of 20 μL of Lonza P3 buffer, and the cells were nucleofected in a 16-well nucleofection cassette using program DZ-100. After nucleofection, the cells were incubated at room temperature for 10 minutes and transferred to a 24-well plate containing 500 μL of StemSpan-XF supplemented with SCF at 100 ng/mL, Flt3-L at 100 ng/mL, and TPO at 100 ng/mL, and were cultured at 37°C and 5% CO₂ for 3 days before cell lysis and genomic DNA extraction. To analyze gene editing activity, primers flanking the target insertion site were used to amplify across the locus, and the amplicons were analyzed by short-read sequencing on an Illumina MiSeq. Successful editing was indicated by replacement of the DNA bases adenine and guanine at nucleotide positions 20 and 21, downstream of the transcription start site within the endogenous β-globin locus, with cytosine and adenine, respectively. To analyze cell surface markers representing different HSC subpopulations, cells were stained with fluorescently labeled anti-human CD90, CD133, and CD34 antibodies and analyzed by flow cytometry 3 days after nucleofection.

As shown in Figure 21A, when human HSCs were treated with an exemplary gene modifying polypeptide in combination with template guide RNA tg14, without or with a second-strand-targeting gRNA, editing activity levels of 6.3% and 34.4%, respectively, were detected in the HSCs, corresponding to replacement of the adenine and guanine at nucleotide positions 20 and 21, respectively, with cytosine and adenine at the SCD codon. Analysis of the distribution of hematopoietic subpopulations (CD34+CD133+CD90+, a marker combination enriched in HSCs with long-term reconstitution potential; CD34+CD133+CD90-, a marker combination enriched in early progenitor cells; CD34+CD133-, a marker combination enriched in committed progenitor cells; CD34-, absence of the marker, which is enriched in differentiated cells) revealed no deviation in subpopulation proportions when samples treated with the exemplary gene modifying system (with or without addition of the second-strand-targeting gRNA) were compared with mock-treated controls (Figure 21B).
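The four marker-defined subpopulations named above can be expressed as a simple decision rule. The Python sketch below is illustrative only: it assumes that each cell has already been called positive or negative for CD34, CD133, and CD90 (real flow cytometry analysis sets these gates on fluorescence intensities), and the function names `classify_subpopulation` and `subpopulation_percentages` are hypothetical.

```python
from collections import Counter


def classify_subpopulation(cd34, cd133, cd90):
    """Map marker calls (True = positive) to the four subpopulations named in the text."""
    if not cd34:
        return "CD34-"                # enriched in differentiated cells
    if not cd133:
        return "CD34+CD133-"          # enriched in committed progenitor cells
    if cd90:
        return "CD34+CD133+CD90+"     # enriched in long-term HSCs
    return "CD34+CD133+CD90-"         # enriched in early progenitor cells


def subpopulation_percentages(events):
    """Return {subpopulation: percent of events} for (cd34, cd133, cd90) tuples."""
    counts = Counter(classify_subpopulation(*event) for event in events)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: 100.0 * n / total for name, n in counts.items()}


# Hypothetical marker calls for four cells (illustrative only, not measured data).
example_events = [
    (True, True, True),
    (True, True, False),
    (True, False, False),
    (False, False, False),
]
example_percentages = subpopulation_percentages(example_events)
print(example_percentages)
```

Comparing the resulting percentages between treated and mock-treated samples corresponds to the subpopulation comparison described for Figure 21B.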

These results show that editing to introduce a non-pathogenic sequence into a clinically relevant codon in the endogenous β-globin locus does not affect the phenotype of primary human HSCs and, in particular, does not affect markers indicative of differentiation potential in HSCs.

Example 19: Evaluation of editing of the HSC subpopulation with long-term reconstitution capacity using a gene modifying polypeptide and template RNA for editing the endogenous β-globin locus.

This example demonstrates that editing with a gene modifying system containing an exemplary gene modifying polypeptide, a template RNA, and a second-strand-targeting gRNA to convert the glutamic acid codon (GAG) at amino acid position 7 of the endogenous β-globin locus in primary human HSCs to alanine (GCA or GCG) effectively targets the HSC subpopulation associated with long-term reconstitution, as well as other subpopulations, thereby rewriting a non-pathogenic sequence into position 7 in stem cells with longevity and differentiation potential. The conversion comprises a two-base-pair change for the exemplary HBB5 template RNA (i.e., replacement of the DNA bases adenine and guanine at nucleotide positions 20 and 21 with cytosine and adenine, respectively) and a one-base-pair change for the exemplary HBB8 template RNA (i.e., replacement of the DNA base adenine at nucleotide position 20 with cytosine).

In this example, the template RNA contains:

(1) a gRNA spacer;

(2) a gRNA scaffold;

(3) a heterologous object sequence; and

(4) a primer binding site (PBS) sequence.

The template RNA comprises the sequence set forth in Example 5 and labeled FYFtgRNA14 (for the HBB5 template RNA) or tgRNA34 (for the HBB8 template RNA).

The system further comprises a second-strand-targeting gRNA comprising the sequence listed as HBB5_g37 or HBB8_256fw in Table X1.

The gene modifying system comprising the gene modifying polypeptide and template RNA described above was transfected into human HSCs. The gene modifying polypeptide and the template RNA were delivered as RNA by nucleofection. Specifically, 3000 ng of mRNA encoding the gene modifying polypeptide was combined with 2000 ng of template RNA, with or without 2000 ng of second-strand-targeting gRNA. The RNA mixture was added to 200,000 primary human HSCs in a total of 20 μL of Lonza P3 buffer, and the cells were nucleofected in a 16-well nucleofection cassette using program DZ-100. After nucleofection, the cells were incubated at room temperature for 10 minutes and transferred to a 24-well plate containing 500 μL of StemSpan-XF supplemented with SCF at 100 ng/mL, Flt3-L at 100 ng/mL, and TPO at 100 ng/mL, and were cultured at 37°C and 5% CO₂. Three days after nucleofection, the cells were stained with fluorescently labeled anti-human CD90, CD133, and CD34 antibodies, the CD34+CD133+CD90+ and CD34+CD133+CD90 fractions were sorted by FACS, and the sorted cells were subjected to cell lysis and genomic DNA extraction. To analyze gene editing activity, primers flanking the target insertion site were used to amplify across the locus, and the amplicons were analyzed by short-read sequencing on an Illumina MiSeq. Successful editing, downstream of the transcription start site within the endogenous β-globin locus, was indicated by replacement of the DNA bases adenine and guanine at nucleotide positions 20 and 21 with cytosine and adenine (HBB5 template RNA), or by replacement of the DNA base adenine at nucleotide position 20 with cytosine (HBB8 template RNA).

As shown in Figure 22A, editing activity levels of 19.3% and 29.8% were detected in the CD34+CD133+CD90+ HSC subpopulation after treatment with gene modifying systems comprising the HBB5 template RNA or the HBB8 template RNA, respectively. CD34+CD133+CD90+ cells are enriched in HSCs with long-term reconstitution potential. Editing activity levels of 23.73% and 31.5% were detected in all remaining HSC populations (those that were not CD34+CD133+CD90+) treated with the same exemplary gene modifying systems comprising the HBB5 template RNA and the HBB8 template RNA, respectively. The experiment was repeated using the exemplary HBB5 template RNA tg14_hs1 (Table X1) and a second-strand-targeting gRNA (Figure 22B); the results showed editing activity of 56% in the CD34+CD133+90+ HSC-enriched fraction and 52.9% in the CD34+90- progenitor-enriched fraction. This result shows that adding a silent substitution to the template RNA (compare tg14_hs1 in Figure 22B with FYF tgRNA14 in Figure 22A) markedly increases the editing activity of the gene modifying system in long-term primary human HSCs.

These results show that the editing activity of the exemplary gene modifying system can write a non-pathogenic sequence into a clinically relevant codon in the endogenous β-globin locus in phenotypically long-term primary human HSCs. The results further show that the level of editing activity in phenotypically long-term primary human HSCs is comparable to that achieved in the remaining HSC populations, and that high editing levels (greater than 50%) can be reached in both long-term and progenitor HSCs.

Example 20: Evaluation of the effect on differentiation capacity of rewriting the endogenous β-globin locus of CD34+ primary human hematopoietic stem cells (HSCs) using a gene modifying polypeptide and template RNA.

This example demonstrates that editing with a gene modifying system containing an exemplary gene modifying polypeptide and template RNA, with or without a second-strand-targeting gRNA, to convert the glutamic acid codon (GAG) at amino acid position 7 of the endogenous β-globin locus in primary human HSCs to alanine (GCA or GCG), thereby rewriting a non-pathogenic sequence into position 7, does not significantly alter the differentiation capacity of human HSCs. The conversion comprises a two-base-pair change for the exemplary HBB5 template RNA (i.e., replacement of the DNA bases adenine and guanine at nucleotide positions 20 and 21 with cytosine and adenine, respectively).

In this example, the template RNA contains:

(1) a gRNA spacer;

(2) a gRNA scaffold;

(3) a heterologous object sequence; and

(4) a primer binding site (PBS) sequence.

The template RNA comprises the sequence designated FYF tgRNA14 for the HBB5 template RNA, as set forth in Example 5.

The system further comprises a second strand-targeting gRNA comprising the sequence listed as HBB5_g27 in Table X1.

The gene modification system comprising the gene modifying polypeptide and template RNA described above was transfected into human HSCs. The gene modifying polypeptide and template RNA were delivered as RNA by nucleofection. Specifically, 3000 ng of mRNA encoding the gene modifying polypeptide was combined with 2000 ng of template RNA, with or without 2000 ng of second strand-targeting gRNA. The RNA mixture was added to 200,000 primary human HSCs in a total of 20 μL of Lonza P3 buffer, and the cells were nucleofected in a 16-well nucleofection cassette using program DZ-100. After nucleofection, the cells were incubated at room temperature for 10 minutes, transferred to a 24-well plate containing 500 μL of StemSpan-XF supplemented with SCF at 100 ng/mL, Flt3-L at 100 ng/mL, and TPO at 100 ng/mL, and cultured at 37°C and 5% CO₂. Two days after nucleofection, the cells were plated in semisolid MethoCult medium for a colony formation assay.

To analyze gene editing activity, primers flanking the target site were used to amplify across the locus. The amplicons were analyzed by short-read sequencing on an Illumina MiSeq. Replacement of the DNA bases adenine and guanine at nucleotide positions 20 and 21 downstream of the transcription start site within the endogenous β-globin locus with cytosine and adenine, respectively (HBB5 template RNA), indicated successful editing.
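As a rough illustration of how an editing percentage might be derived from amplicon reads, the sketch below counts reads that carry the edited dinucleotide at a fixed offset. It is a simplification under stated assumptions: the function name `editing_fraction`, the toy reads, and the offset are all hypothetical, and an actual analysis would first align and quality-filter the reads. It is not the analysis pipeline used in this example.

```python
def editing_fraction(reads: list[str], offset: int, edited: str = "CA") -> float:
    """Fraction of usable reads carrying `edited` at 0-based `offset`.

    Assumes reads are already trimmed and aligned so that `offset` points at
    the same genomic position in every read (a simplifying assumption).
    """
    # Ignore reads too short to cover the edited bases.
    usable = [r for r in reads if len(r) >= offset + len(edited)]
    if not usable:
        return 0.0
    hits = sum(1 for r in usable if r[offset:offset + len(edited)] == edited)
    return hits / len(usable)


if __name__ == "__main__":
    # Toy reads: "AG" at offset 3 is unedited, "CA" at offset 3 is edited.
    toy_reads = ["TTGAGG", "TTGCAG", "TTGCAG", "TTGAGG"]
    print(f"editing activity: {100 * editing_fraction(toy_reads, offset=3):.1f}%")
```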

As shown in Figure 23A, the total number of colony-forming units (CFU) obtained after treating HSCs from 3 different donors with the exemplary gene modification system, with or without the second strand-targeting gRNA, was comparable to the total CFU number obtained when the HSCs were mock-treated. These results indicate that treatment with the exemplary gene modification system did not significantly reduce the viability of the treated HSCs. As shown in Figure 23B, the numbers of CFU-E, BFU-E, CFU-M, CFU-GM, and CFU-G produced by CD34+ cells transfected with the exemplary gene modification system after 14 days of clonal growth in methylcellulose were comparable to the corresponding CFU numbers for mock-treated CD34+ cells. Figure 23C is a graph of the percentage of enucleated CD235+ cells after the start of in vitro differentiation of HSCs treated with the exemplary gene modification system. The results show that HSCs treated with the exemplary gene modification system produced a similar percentage of erythroid cells, at a similar rate, as mock-treated HSCs.

These results show that using the exemplary gene modification system described herein to edit a non-pathogenic sequence into a clinically relevant codon of the endogenous β-globin locus has no significant effect on the differentiation capacity of human HSCs.

Example 21: Screening template RNA configurations for correcting the SCD mutation in human CD34+ cells harboring the SCD mutation

This example describes the use of an exemplary gene modification system to identify favorable configurations for correcting the SCD mutation. The gene modification system contains a gene modifying polypeptide and a template RNA comprising heterologous subject sequences and PBS sequences of varying lengths. In this example, the template RNA contains:

(1) a gRNA spacer;

(2) a gRNA scaffold;

(3) a heterologous subject sequence; and

(4) a primer binding site (PBS) sequence.

Template RNAs are designed to contain a PBS sequence of 8-17 nucleotides and a heterologous subject sequence of 9-20 nucleotides (Table X4). Template RNAs having two different exemplary gRNA spacer sequences, HBB5 and HBB8, are used to target the SCD mutation in CD34+ SCD human cells. The heterologous subject sequences and PBS sequences are designed to correct the SCD mutation, using the gene modification system described herein, by replacing the "T" nucleotide at the mutation site with an "A" nucleotide (wild type) or a "C" nucleotide (Makassar installation). The template RNAs are also designed to introduce either or both of: 1) a PAM-kill mutation or 2) one or more silent substitutions.
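The length ranges stated above define a design space that can be enumerated. The sketch below lists every (PBS length, heterologous subject sequence length) pair within the stated ranges. The names `PBS_LENGTHS`, `SUBJECT_LENGTHS`, and `length_grid` are hypothetical, and the sketch does not imply that Table X4 contains every pair in the grid.

```python
from itertools import product

# Length ranges taken from the text (both ends inclusive).
PBS_LENGTHS = range(8, 18)       # 8-17 nucleotides
SUBJECT_LENGTHS = range(9, 21)   # 9-20 nucleotides


def length_grid() -> list[tuple[int, int]]:
    """Return all (PBS length, heterologous subject sequence length) pairs."""
    return list(product(PBS_LENGTHS, SUBJECT_LENGTHS))


if __name__ == "__main__":
    grid = length_grid()
    print(f"{len(grid)} length combinations per spacer")
    print("shortest:", grid[0], "longest:", grid[-1])
```

With 10 possible PBS lengths and 12 possible heterologous subject sequence lengths, the full grid holds 120 combinations per spacer, before PAM-kill or silent-substitution variants are considered.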

An exemplary gene modification system comprising an mRNA encoding the gene modifying polypeptide and a template RNA from Table X4, with or without a second strand-targeting gRNA (e.g., from Table X1), is used to transfect human HSCs containing the SCD mutation. The gene modification system is used to correct the SCD mutation by replacing the "T" nucleotide at the mutation site in the endogenous β-globin locus in primary human HSCs with an "A" (wild type) or "C" (Makassar) nucleotide. Amplicon sequencing will be used to show editing at the mutation site in the endogenous β-globin locus in primary human HSCs.

The results will demonstrate that the exemplary gene modification system has editing activity when correcting the SCD mutation in the endogenous β-globin locus in primary human HSCs.

Herein, when an RNA sequence (e.g., a template RNA sequence) is said to comprise a particular sequence that contains thymine (T) (e.g., a sequence of Table A or Table B, or a portion thereof), it is of course understood that the RNA sequence can (and often does) comprise uracil (U) in place of T. For example, the RNA sequence can comprise U at every position shown as T in a sequence of Table A or Table B. More specifically, the present disclosure provides an RNA sequence according to each template sequence shown in Table A and Table B, wherein the RNA sequence has U in place of each T in the sequence of Table A or Table B.
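The T-to-U convention described above amounts to a simple character substitution. A minimal sketch follows. The function name `dna_to_rna` is hypothetical, and the example input is the gRNA spacer recited in claim 6 (SEQ ID NO: 21668).

```python
def dna_to_rna(seq: str) -> str:
    """Return the sequence with every T replaced by U (case preserved)."""
    return seq.replace("T", "U").replace("t", "u")


if __name__ == "__main__":
    spacer_as_listed = "CATGGTGCATCTGACTCCTG"  # written with T, as in the tables
    print(dna_to_rna(spacer_as_listed))
```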

It should be understood that, for all numerical limits describing a parameter in this application, such as "about," "at least," "less than," and "greater than," the description also necessarily encompasses any range bounded by the recited values. Thus, for example, the description "at least 1, 2, 3, 4, or 5" also specifically describes the ranges 1-2, 1-3, 1-4, 1-5, 2-3, 2-4, 2-5, 3-4, 3-5, and 4-5, and so on.
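The ranges implied by a list of recited values, as described above, are simply all pairs of distinct values taken in ascending order. A brief sketch, with the hypothetical helper name `implied_ranges`:

```python
from itertools import combinations


def implied_ranges(values: list[int]) -> list[str]:
    """List every range bounded by two distinct recited values."""
    return [f"{lo}-{hi}" for lo, hi in combinations(sorted(values), 2)]


if __name__ == "__main__":
    print(", ".join(implied_ranges([1, 2, 3, 4, 5])))
```

For the recited values 1 through 5 this yields the ten ranges listed in the text.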

All patents, applications, and other references cited herein, such as non-patent literature and reference sequence information, are to be understood as incorporated herein by reference in their entirety for all purposes, as well as for the propositions for which they are cited. In the event of any conflict between a document incorporated by reference and the present application, the present application shall prevail. All information associated with the reference gene sequences disclosed in this application, such as GeneIDs or accession numbers (typically NCBI accession numbers), including, for example, genomic loci, genomic sequences, functional annotations, allelic variants, and reference mRNA sequences (including, for example, exon boundaries or response elements) and protein sequences (such as conserved domain structures), as well as chemical references (such as PubChem compound, PubChem substance, or PubChem bioassay entries, including the annotations therein, such as structures and assays), is incorporated herein by reference in its entirety.

The headings used in this application are for convenience only and do not affect the interpretation of this application.

Claims

1. A template RNA comprising, from 5' to 3':

(i) a gRNA spacer complementary to a first portion of a human HBB gene, wherein the gRNA spacer has a sequence comprising the core nucleotides of a gRNA spacer sequence of Table 1, and optionally comprises one or more consecutive nucleotides starting from the 3' end of the flanking nucleotides of the gRNA spacer, or wherein the gRNA spacer has the sequence of a spacer selected from Table A, Table AA, Table B1, Tables 5A-5D, Table X4, or Table X4A;

(ii) a gRNA scaffold that binds to a genetically modified polypeptide (e.g., binds to a Cas domain of the genetically modified polypeptide),

(iii) a heterologous subject sequence comprising a mutation region for introducing a mutation into a second portion of the human HBB gene (e.g., correcting a mutation therein) (wherein, optionally, the heterologous subject sequence comprises, from 5' to 3', a post-editing homology region, a mutation region, and a pre-editing homology region), and

(iv) a primer binding site (PBS) sequence comprising at least 5, 6, 7, or 8 bases having 100% identity to a third portion of the human HBB gene.

2. The template RNA of claim 1, wherein the heterologous subject sequence comprises a core nucleotide of the RT template sequence in table 3, and optionally comprises one or more consecutive nucleotides starting from the 3' end of the flanking nucleotides of the RT template sequence, or wherein the heterologous subject sequence comprises a sequence of the RT template sequence in table a, table AA, table B1, tables 5A-5D, table X4, or table X4A.

3. The template RNA of claim 1, wherein the heterologous subject sequence comprises a core nucleotide of the RT template sequence in table 3 corresponding to the gRNA spacer sequence, and optionally comprises one or more consecutive nucleotides starting from the 3' end of flanking nucleotides of the RT template sequence, or wherein the heterologous subject sequence comprises a sequence of the RT template sequence in table a, table AA, table B1, tables 5A-5D, table X4, or table X4A corresponding to the gRNA spacer sequence.

4. A template RNA comprising, from 5' to 3':

(i) a gRNA spacer complementary to a first portion of the human HBB gene,

(iii) a heterologous subject sequence comprising a mutation region for introducing a mutation into a second portion of the human HBB gene, wherein the heterologous subject sequence comprises the core nucleotides of an RT template sequence in Table 3 and optionally comprises one or more consecutive nucleotides starting from the 3' end of the flanking nucleotides of the RT template sequence, or wherein the heterologous subject sequence comprises an RT template sequence in Table A, Table AA, Table B1, Tables 5A-5D, Table X4, or Table X4A; and

(iv) a PBS sequence comprising at least 5, 6, 7, or 8 bases having 100% identity to a third portion of the human HBB gene.

5. The template RNA of claim 4, wherein the gRNA spacer comprises a core nucleotide of a gRNA spacer sequence in table 1, and optionally comprises one or more consecutive nucleotides starting from the 3' end of a flanking nucleotide of the gRNA spacer sequence, or wherein the gRNA spacer comprises a gRNA spacer sequence in table a, table AA, table B1, tables 5A-5D, table X4, or table X4A.

6. The template RNA of any one of claims 1-5, wherein the gRNA spacer comprises CATGGTGCATCTGACTCCTG (SEQ ID NO: 21668) or CATGGTGCACCTGACTCCTG (SEQ ID NO: 19249).

7. The template RNA of any one of claims 1-5, wherein the gRNA spacer comprises GTAACGGCAGACTTCTCCAC (SEQ ID NO: 19971).

8. The template RNA of claim 4, wherein the heterologous subject sequence comprises a core nucleotide corresponding to a gRNA spacer sequence in table 1 of the RT template sequence, and optionally comprises one or more consecutive nucleotides starting from the 3' end of flanking nucleotides of the gRNA spacer sequence, or wherein the heterologous subject sequence comprises nucleotides corresponding to a gRNA spacer sequence in table a, table AA, table B1, tables 5A-5D, table X4, or table X4A of the RT template sequence.

9. The template RNA of any one of claims 1-8, wherein the PBS sequence has a sequence comprising core nucleotides from the same row of PBS sequence as the RT template sequence in table 3, and optionally comprising one or more consecutive nucleotides starting from the 5' end of the flanking nucleotides of the PBS sequence.

10. The template RNA of any one of claims 1-8, wherein the PBS sequence has a sequence comprising a core nucleotide corresponding to the PBS sequence in table 3 of the RT template sequence, the gRNA spacer sequence, or both, and optionally comprises one or more consecutive nucleotides starting at the 5' end of flanking nucleotides of the PBS sequence, or wherein the PBS sequence has a sequence comprising a core nucleotide corresponding to the PBS sequence in table a, table AA, table B1, table 5A-5D, table X4, or table X4A of the RT template sequence, the gRNA spacer sequence, or both.

11. The template RNA of any one of claims 1-10, wherein the gRNA scaffold comprises the sequence of the gRNA scaffold in table 12, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98% or 99% identity thereto.

12. The template RNA of any one of claims 1-10, wherein the gRNA scaffold comprises a sequence corresponding to the RT template sequence, the gRNA spacer sequence, or a gRNA scaffold in table 12 of both, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98% or 99% identity thereto.

13. The template RNA of any one of claims 1-12, wherein the mutation is a V6E mutation (e.g., correction of a pathogenic E6V mutation) of the HBB gene.

14. The template RNA of any one of claims 1-13, wherein the pre-edit sequence length comprises about 1 nucleotide to about 35 nucleotides (e.g., comprises about 1-5, 5-10, 10-15, 15-20, 20-25, 25-30, or 30-35 nucleotides).

15. The template RNA of any one of claims 1-14, wherein the mutation region comprises a single nucleotide.

16. The template RNA of any one of claims 1-14, wherein the mutation region is at least two nucleotides in length.

17. The template RNA of any one of claims 1-14 or 16, wherein the mutation region is up to 32 (e.g., up to 5, 10, 15, 20, 25, 30, or 32) nucleotides in length and comprises one, two, or three sequence differences relative to the second portion of the human HBB gene.

18. The template RNA of any one of claims 1-14, 16, or 17, wherein the mutation region comprises two sequence differences relative to the second portion of the human HBB gene.

19. The template RNA of any one of claims 1-14 or 16-18, wherein the mutation region comprises a first region (e.g., a first nucleotide) designed to correct a pathogenic mutation in the HBB gene and a second region (e.g., a second nucleotide) designed to inactivate a PAM sequence (e.g., a "PAM-kill" mutation as exemplified in Table A, AA, B, or B1).

20. The template RNA of any one of claims 1-19, wherein the heterologous subject sequence has a sequence from the same row of RT template sequences as the gRNA spacer sequence in table a or B, or has 1,2, or 3 substitutions relative thereto, wherein optionally the bolded T shown in the RT template sequence in table a is replaced with G (e.g., a sequence without PAM-killing mutations), or wherein further optionally the bolded C shown in the RT template in table B is replaced with T or U (e.g., a sequence without SNP that is present in HEK293T cells but not present in hg38 human reference genome).

21. The template RNA of any one of claims 1-20, wherein the mutation region comprises less than 80%, 70%, 60%, 50%, 40% or 30% identity to the corresponding portion of the human HBB gene.

22. The template RNA of any one of claims 1-21, wherein the template RNA comprises one or more silent mutations (e.g., silent substitutions), e.g., as exemplified in table 7A, X4 or X4A.

23. The template RNA of claim 22, wherein the one or more silent mutations comprise a silent substitution at the codon encoding the 6th amino acid (proline) of the HBB gene, counting the initial methionine, e.g., a substitution to CCC or CCG.

24. The template RNA of any one of the preceding claims, wherein the mutation region comprises a first region designed to correct a pathogenic mutation in the HBB gene and a second region designed to introduce a silent substitution.

25. The template RNA of any one of claims 1-24, comprising one or more chemically modified nucleotides.

26. A genetic modification system, comprising:

The template RNA of any one of claims 1-25, and

A genetically modified polypeptide or a nucleic acid (e.g., RNA) encoding the genetically modified polypeptide.

27. The genetic modification system of claim 26, wherein the genetic modification polypeptide comprises:

A Reverse Transcriptase (RT) domain (e.g., an RT domain from a retrovirus, or a polypeptide domain having at least 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98% or 99% amino acid sequence identity thereto); and

A Cas domain (e.g., a Cas9 domain) that binds to the target DNA molecule and is heterologous to the RT domain; and

Optionally, a linker disposed between the RT domain and the Cas domain.

28. The genetic modification system of claim 27, wherein the RT domain comprises:

(a) an RT domain of Table 6; or

(b) an RT domain from: murine leukemia virus (MMLV), porcine endogenous retrovirus (PERV), avian reticuloendotheliosis virus (AVIRE), feline leukemia virus (FLV), simian foamy virus (SFV) (e.g., SFV3L), bovine leukemia virus (BLV), Mason-Pfizer monkey virus (MPMV), human foamy virus (HFV), or bovine foamy/syncytial virus (BFV/BSV).

29. The genetic modification system of claim 27 or 28, wherein the Cas domain comprises a Cas domain of table 7 or table 8.

30. The genetic modification system of claim 27 or 28, wherein the Cas domain:

(a) is a Cas9 domain;

(b) is a SpCas9 domain, BlatCas9 domain, Nme2Cas9 domain, PnpCas9 domain, SauCas9 domain, SauCas9-KKH domain, SauriCas9 domain, SauriCas9-KKH domain, ScaCas9-Sc++ domain, SpyCas9-NG domain, SpyCas9-SpRY domain, or St1Cas9 domain; and/or

(c) is a Cas9 domain comprising an N670A mutation, an N611A mutation, an N605A mutation, an N580A mutation, an N588A mutation, an N872A mutation, an N863A mutation, an N622A mutation, or an H840A mutation.

31. The genetic modification system of claim 30, wherein the Cas9 domain binds to a PAM sequence listed in table 7 or table 12.

32. The genetic modification system of claim 31, wherein the second portion of the human HBB gene overlaps with PAM recognized by the Cas domain, e.g., wherein the second portion of the human HBB gene is within the PAM or wherein the PAM is within the second portion of the human HBB gene.

33. The genetic modification system of any one of claims 26-32, wherein the gRNA spacer is a gRNA spacer according to table 1, and the Cas domain comprises or has at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98% or 99% identity to a Cas domain listed in the same row of table 1.

34. The genetic modification system of any one of claims 26-32, wherein the template RNA comprises the sequence of the template RNA sequence in table 3, table a, table AA, table B1, table 5A-5D, table X4, or table X4A, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98% or 99% identity thereto.

35. The genetic modification system of any one of claims 26-32, wherein:

(a) the template RNA comprises the sequence of a template RNA sequence of Table 3, Table A, Table AA, Table B1, Tables 5A-5D, Table X4, or Table X4A;

(b) the Cas domain comprises a Cas domain of Table 7 or Table 8;

(c) the linker comprises a linker sequence of Table 10 (e.g., the linker sequence of any one of SEQ ID NOs: 5217, 5106, 5190, and 5218); and

(d) the genetically modified polypeptide comprises one or two NLS sequences from Table 11 (e.g., the NLS sequence of any one of SEQ ID NOs: 5245, 5290, 5323, 5330, 5349, 5350, 5351, and 4001).

36. The genetic modification system of any one of claims 26-35, which creates a first nick in the first strand of the human HBB gene.

37. The genetic modification system of claim 36, further comprising a second strand targeting gRNA that directs a second nick to a second strand of the human HBB gene.

38. The genetic modification system of claim 37, wherein the second-strand targeting gRNA comprises:

(i) a sequence comprising the core nucleotides of a left or right gRNA spacer sequence from Table 2, and optionally comprising one or more consecutive nucleotides starting from the 3' end of the flanking nucleotides of the left or right gRNA spacer sequence; or

(ii) a second strand-targeting gRNA spacer sequence of Table 6A, or a spacer sequence having 1, 2, or 3 substitutions relative thereto.

39. The genetic modification system of claim 37, wherein the second strand targeting gRNA comprises a sequence comprising a core nucleotide of the left or right gRNA spacer sequence from table 2 that corresponds to the gRNA spacer sequence of (i), and optionally comprises one or more consecutive nucleotides starting from the 3' end of the flanking nucleotides of the left or right gRNA spacer sequence.

40. The genetic modification system of claim 37, wherein the second-strand targeting gRNA comprises:

(i) a sequence comprising the core nucleotides of a second nick gRNA sequence from Table 4, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto, and optionally comprising one or more consecutive nucleotides starting from the 3' end of the flanking nucleotides of the second nick gRNA sequence; or

(ii) a second strand-targeting gRNA spacer sequence from Table 6A, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto.

41. The genetic modification system of claim 37, wherein the second strand targeting gRNA comprises a sequence that comprises a core nucleotide of a second nicked gRNA sequence from table 4 that corresponds to the gRNA spacer sequence of (i), or a sequence that is at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical thereto, and optionally comprises one or more contiguous nucleotides starting from the 3' end of a flanking nucleotide of the second nicked gRNA sequence.

42. The genetic modification system of any one of claims 37-41, wherein the second strand targeting gRNA has a "PAM-in orientation" with a template RNA of the genetic modification system, e.g., as exemplified in table 4, 6A, X4, or X4A.

43. The genetic modification system of any one of claims 37-42, wherein the second strand targeting gRNA targets a sequence that overlaps with a target mutation of the template RNA.

44. The genetic modification system of claim 43, wherein the second strand targeting gRNA comprises:

(i) a sequence (e.g., a spacer sequence) complementary to the sickle cell mutation;

(ii) a sequence (e.g., a spacer sequence) complementary to the wild-type sequence at the sickle cell locus;

(iii) a sequence (e.g., a spacer sequence) complementary to the Makassar sequence at the sickle cell locus;

(iv) a sequence (e.g., a spacer sequence) complementary to a SNP near the sickle cell locus, e.g., a SNP contained in the genomic DNA of a subject (e.g., a patient); or

(v) a sequence (e.g., a spacer sequence) complementary to, or comprising, one or more silent substitutions near the sickle cell locus.

45. The template RNA or gene modification system of any one of the preceding claims, wherein the gRNA spacer comprises about 1,2, 3 or more flanking nucleotides of the gRNA spacer.

46. The template RNA or gene modification system of any one of the preceding claims, wherein the heterologous subject sequence comprises about 2, 3, 4, 5, 10, 20, 30, 40 or more flanking nucleotides of the RT template sequence.

47. The template RNA or gene modification system of any one of the preceding claims, wherein the heterologous subject sequence comprises about 8-30, 9-25, 10-20, 11-16, or 12-15 (e.g., about 11-16) nucleotides.

48. The template RNA or gene modification system of any one of the preceding claims, wherein the mutation region comprises a sequence difference of 1,2 or 3 nucleotide positions relative to the corresponding portion of the human HBB gene.

49. The template RNA or gene modification system of any one of the preceding claims, wherein the mutation region comprises a sequence difference of at least 2 nucleotide positions relative to the corresponding portion of the human HBB gene.

50. The template RNA or gene modification system of any one of the preceding claims, wherein the post-editing and/or pre-editing homology region comprises 100% identity to the HBB gene.

51. The template RNA or gene modification system of any one of the preceding claims, wherein the PBS sequence further comprises about 1,2,3,4, 5, 6, 7 or more flanking nucleotides.

52. The template RNA or gene modification system of any one of the preceding claims, wherein the PBS sequence comprises about 5-20, 8-16, 8-14, 8-13, 9-12, or 10-12 (e.g., about 9-12) nucleotides.

53. The template RNA or gene modification system of any one of the preceding claims, wherein the PBS sequence binds within 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 nucleotides of the HBB gene nick site.

54. The genetic modification system of any one of the preceding claims, wherein the domains of the genetically modified polypeptide are linked by a peptide linker.

55. The genetic modification system of claim 54, wherein the linker comprises the linker sequence of Table 10 (e.g., the linker sequence of any one of SEQ ID NOS: 5217, 5106, 5190 and 5218).

56. The genetic modification system of any one of the preceding claims, wherein the genetically modified polypeptide further comprises one or more Nuclear Localization Sequences (NLS).

57. The genetic modification system of claim 56, wherein the genetically modified polypeptide comprises a first NLS and a second NLS.

58. The genetic modification system of claim 56 or 57, wherein the NLS comprises the NLS sequence of table 11 (e.g., the sequence of any one of SEQ ID NOs: 5245, 5290, 5323, 5330, 5349, 5350, 5351 and 4001).

59. A template RNA comprising the sequence of the template RNA in table 4, table a, table AA, table B1, tables 5A-5D, table X4, or table X4A, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto.

60. A template RNA comprising the sequence of the template RNA in table 4, table a, table AA, table B1, tables 5A-5D, table X4 or table X4A.

61. A genetic modification system, comprising:

(i) a template RNA comprising the sequence of a template RNA in Table 4, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto; and

(ii) the second nick gRNA sequence from Table 4 in the same row as (i), or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto.

62. A genetic modification system, comprising:

(i) a template RNA comprising the sequence of a template RNA of Table 4; and

(ii) the second nick gRNA sequence from Table 4 in the same row as (i).

63. A DNA encoding the template RNA of any one of claims 1-25, 46-52, 59, or 60, or the genetic modification system of any one of claims 26-58, 60, or 61.

64. A pharmaceutical composition comprising the system of any one of claims 26-58, 60 or 61, or one or more nucleic acids encoding the system, and a pharmaceutically acceptable excipient or carrier.

65. The pharmaceutical composition of claim 64, wherein the pharmaceutically acceptable excipient or carrier is selected from the group consisting of: plasmid vectors, viral vectors, vesicles and lipid nanoparticles.

66. The pharmaceutical composition of claim 65, wherein the viral vector is an adeno-associated virus.

67. A host cell (e.g. a mammalian cell, e.g. a human cell) comprising the template RNA or the gene modification system of any one of the preceding claims.

68. A method of preparing the template RNA of any one of claims 1-25, 46-52, 59, or 60, the method comprising synthesizing the template RNA by in vitro transcription, by solid state synthesis, or by introducing a DNA encoding the template RNA into a host cell under conditions that allow for production of the template RNA.

69. A method for modifying a target site in a human HBB gene in a cell, the method comprising contacting the cell with the genetic modification system of any one of claims 26-58, 60 or 61 or DNA encoding the genetic modification system, thereby modifying the target site in the human HBB gene in the cell.

70. A method for treating a subject having a disease or condition associated with a mutation in the human HBB gene, the method comprising administering the gene modification system or DNA encoding the gene modification system of any one of claims 26-58, 60 or 61 to the subject, thereby treating a subject having a disease or condition associated with a mutation in the human HBB gene.

71. The method of claim 69 or 70, wherein the disease or condition is Sickle Cell Disease (SCD).

72. The method of any one of claims 69-71 wherein the subject has an E6V mutation.

73. A method for treating a subject having SCD, the method comprising administering to the subject the genetic modification system or DNA encoding the genetic modification system of any one of claims 26-58, 60 or 61, thereby treating the subject having SCD.

74. The genetic modification system or method of any one of the preceding claims, wherein introduction of the system into a target cell corrects a pathogenic mutation of the HBB gene.

75. The genetic modification system or method of any one of the preceding claims, wherein the pathogenic mutation is an E6V mutation, and wherein the correction comprises an amino acid substitution of V6E.

76. The genetic modification system or method of any one of the preceding claims, wherein introduction of the system into a target cell results in a mutation that promotes restoration of function of the HBB gene.

77. The genetic modification system or method of any one of the preceding claims, wherein correction of the mutation occurs in at least 30% (e.g., 30%, 40%, 50%, 60%, 70% or more) of the target nucleic acid.

78. The genetic modification system or method of any one of the preceding claims, wherein correction of the mutation occurs in at least 30% (e.g., 30%, 40%, 50%, 60%, 70% or more) of the target cells.

79. The genetic modification system or method of any one of the preceding claims, wherein the genetic modification system comprises a second strand-targeting gRNA, and wherein correction of the mutation in a target cell population is increased relative to a target cell population treated with a genetic modification system comprising the template RNA without the second strand-targeting gRNA.

80. The genetic modification system or method of any one of the preceding claims, wherein the template RNA comprises one or more silent substitutions (e.g., as exemplified in Tables 7A, X4, and X4A), and wherein correction of the mutation in a target cell population is increased relative to a target cell population treated with a genetic modification system comprising a template RNA that does not comprise the one or more silent substitutions.

81. The method of any one of the preceding claims, wherein the cell is a mammalian cell, such as a human cell.

82. The method of any one of the preceding claims, wherein the subject is a human.

83. The method of any one of the preceding claims, wherein the contacting occurs ex vivo, e.g., wherein the DNA of the cell or subject is modified ex vivo.

84. The method of any one of the preceding claims, wherein the contacting occurs in vivo, e.g., wherein the DNA of the cell or subject is modified in vivo.

85. The method of any one of the preceding claims, wherein contacting the cell or the subject with the system comprises contacting the cell or the cell in the subject with a nucleic acid (e.g., DNA or RNA) encoding the genetically modified polypeptide under conditions that allow for production of the genetically modified polypeptide.