HK40111540A

HK40111540A - Methods for large-size chromosomal transfer and modified chromosomes and organisms using same

Info

Publication number: HK40111540A
Application number: HK62024097025.6A
Authority: HK
Inventors: 张继伟; 魏喻
Original assignee: 上海伊米诺康生物科技有限公司
Priority date: 2021-09-24
Filing date: 2022-09-23
Publication date: 2025-01-03

Description

Large-size chromosome transfer methods and modified chromosomes and organisms produced using these methods.

Incorporation of Sequence Listing by Reference

This application contains a Sequence Listing that has been submitted in ASCII format via EFS-Web and is hereby incorporated by reference in its entirety.

Background

Manipulation of large segments of genes or chromosomes is a powerful tool for basic and translational research and for the development of therapies. Human genes range in size from a few hundred bases to at least 2,300 kilobases (KB), and human chromosomes range in size from 38 megabase pairs (MB) to nearly 250 MB. Efficient study of large genes, of regions spanning multiple genes, and of portions of chromosomes therefore requires manipulating large sequence fragments. However, manipulation of large fragments remains one of the most significant challenges in the field of gene editing. This disclosure provides methods for manipulating large sequences.

Summary of the Invention

This disclosure provides a method of generating an engineered chromosome, comprising: (a) providing a cell comprising a target chromosome containing a target sequence and a template chromosome containing a template sequence; (b) contacting the cell with (i) a first nucleic acid molecule and (ii) a second nucleic acid molecule, the first nucleic acid molecule comprising, from 5' to 3', a 5' homology arm, at least one first marker, and a 3' homology arm, wherein the 5' homology arm contains a nucleotide sequence upstream of the 5' end of the target sequence and the 3' homology arm contains a nucleotide sequence upstream of the 5' end of the template sequence, and the second nucleic acid molecule comprising, from 5' to 3', a 5' homology arm, at least one second marker, and a 3' homology arm, wherein the 5' homology arm contains a nucleotide sequence downstream of the 3' end of the template sequence and the 3' homology arm contains a nucleotide sequence downstream of the 3' end of the target sequence; (c) generating double-strand breaks at or flanking the target sequence and at the 5' and 3' ends of the template sequence, thereby inserting the template sequence and the first and second markers into the target chromosome; and (d) selecting one or more cells that express the first and second markers.
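The arrangement of homology arms, markers, and template described above can be illustrated with a toy string model. The sketch below is only an illustration under stated assumptions: chromosomes are plain strings, the excised template fragment is assumed to carry the two template-side arms with it, and all names (`Donor`, `engineer_chromosome`, the example sequences) are hypothetical and not taken from the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Donor:
    """Hypothetical donor molecule: 5' homology arm, marker cassette, 3' homology arm."""
    arm5: str
    marker: str
    arm3: str


def engineer_chromosome(target_chrom, template_chrom, target, template, donor1, donor2):
    """Return the expected engineered target chromosome in a toy string model.

    Assumption: the fragment released from the template chromosome spans the
    3' arm of donor 1, the template sequence, and the 5' arm of donor 2.
    """
    fragment = donor1.arm3 + template + donor2.arm5
    if fragment not in template_chrom:
        raise ValueError("template chromosome does not contain arm-template-arm")

    # On the target chromosome the target sequence sits between the 5' arm of
    # donor 1 and the 3' arm of donor 2.
    site = donor1.arm5 + target + donor2.arm3
    if site not in target_chrom:
        raise ValueError("target chromosome does not contain arm-target-arm")

    # The markers end up at the 5' and 3' junctions of the inserted fragment.
    replacement = donor1.arm5 + donor1.marker + fragment + donor2.marker + donor2.arm3
    return target_chrom.replace(site, replacement, 1)


if __name__ == "__main__":
    d1 = Donor(arm5="AAAC", marker="[M1]", arm3="CACA")
    d2 = Donor(arm5="TGTG", marker="[M2]", arm3="GGGA")
    print(engineer_chromosome("ccAAACTTTTGGGAcc", "ggCACAGATTACATGTGgg",
                              "TTTT", "GATTACA", d1, d2))
```

In this model the target sequence is replaced and each marker marks one junction, which is why selecting for both markers in step (d) enriches for cells in which both junctions formed.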

In some embodiments, after insertion of the template sequence, the first marker is located at the 5' end of the template sequence and the second marker is located at the 3' end of the template sequence.

In some embodiments, the 5' and 3' homology arms of the first and second nucleic acid molecules are between about 20 and 2,000 base pairs (bp), between about 50 bp and 1,500 bp, between about 100 bp and 1,400 bp, between about 150 bp and 1,300 bp, between about 200 bp and 1,200 bp, between about 300 bp and 1,100 bp, between about 400 bp and 1,000 bp, between about 500 bp and 900 bp, or between about 600 bp and 800 bp in length. In some embodiments, the 5' and 3' homology arms of the first and second nucleic acid molecules are between about 400 bp and 1,500 bp, between about 500 bp and 1,300 bp, or between about 600 bp and 1,000 bp in length. In some embodiments, the 5' and 3' homology arms of the first and second nucleic acid molecules are between about 600 bp and 1,000 bp in length.

In some embodiments, the template sequence is at least 25 kilobase pairs (KB), at least 50 KB, at least 100 KB, at least 200 KB, at least 400 KB, at least 500 KB, at least 600 KB, at least 700 KB, at least 800 KB, at least 900 KB, at least 1 megabase pair (MB), at least 2 MB, at least 3 MB, at least 4 MB, at least 5 MB, at least 6 MB, at least 7 MB, at least 8 MB, at least 9 MB, at least 10 MB, at least 15 MB, at least 20 MB, at least 25 MB, at least 30 MB, at least 40 MB, at least 50 MB, at least 60 MB, at least 70 MB, at least 80 MB, at least 90 MB, at least 100 MB, at least 120 MB, at least 140 MB, at least 160 MB, at least 180 MB, at least 200 MB, at least 220 MB, or at least 250 MB in length. In some embodiments, the template sequence is between 50 KB and 250 MB, between 50 KB and 100 MB, between 50 KB and 50 MB, between 50 KB and 20 MB, between 50 KB and 10 MB, between 50 KB and 5 MB, between 50 KB and 3 MB, between 50 KB and 2 MB, between 50 KB and 1 MB, between 100 KB and 200 MB, between 100 KB and 100 MB, between 100 KB and 50 MB, between 100 KB and 20 MB, between 100 KB and 10 MB, between 100 KB and 5 MB, between 100 KB and 3 MB, between 100 KB and 2 MB, between 100 KB and 1 MB, between 100 KB and 500 KB, between 200 KB and 100 MB, between 200 KB and 50 MB, between 200 KB and 20 MB, between 200 KB and 10 MB, between 200 KB and 5 MB, between 200 KB and 3 MB, between 200 KB and 2 MB, between 200 KB and 1 MB, between 200 KB and 500 KB, between 500 KB and 100 MB, between 500 KB and 50 MB, between 500 KB and 20 MB, between 500 KB and 10 MB, between 500 KB and 5 MB, between 500 KB and 3 MB, between 500 KB and 2 MB, between 500 KB and 1 MB, between 1 MB and 100 MB, between 1 MB and 50 MB, between 1 MB and 20 MB, between 1 MB and 10 MB, between 1 MB and 5 MB, between 1 MB and 3 MB, between 1 MB and 2 MB, between 3 MB and 100 MB, between 3 MB and 50 MB, between 3 MB and 20 MB, between 3 MB and 10 MB, between 3 MB and 5 MB, between 5 MB and 100 MB, between 5 MB and 50 MB, between 5 MB and 20 MB, between 5 MB and 10 MB, between 10 MB and 100 MB, between 10 MB and 50 MB, or between 10 MB and 20 MB in length. In some embodiments, the template sequence is between 200 KB and 50 MB, between 1 MB and 20 MB, between 1 MB and 10 MB, between 1 MB and 5 MB, between 1 MB and 3 MB, between 3 MB and 20 MB, between 3 MB and 10 MB, between 3 MB and 7 MB, or between 3 MB and 5 MB in length.
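Because the size ranges above mix bp, KB, and MB, a small helper for normalizing them to base pairs can make range checks less error-prone. The following is a minimal sketch, assuming the document's convention that KB means kilobase pairs (1,000 bp) and MB means megabase pairs (1,000,000 bp); the function names `to_bp` and `in_range` are hypothetical.

```python
import re

# Unit multipliers in base pairs, following the text's KB/MB convention.
_UNITS = {"BP": 1, "KB": 1_000, "MB": 1_000_000}


def to_bp(size: str) -> int:
    """Convert a size such as '50KB', '2,300 KB' or '3 MB' to base pairs."""
    match = re.fullmatch(r"\s*([\d,]+(?:\.\d+)?)\s*(bp|kb|mb)\s*", size, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognized size: {size!r}")
    value = float(match.group(1).replace(",", ""))
    return int(round(value * _UNITS[match.group(2).upper()]))


def in_range(length_bp: int, low: str, high: str) -> bool:
    """Return True if length_bp lies within the inclusive range [low, high]."""
    return to_bp(low) <= length_bp <= to_bp(high)


if __name__ == "__main__":
    template_length = to_bp("4 MB")
    print(in_range(template_length, "3MB", "5MB"))
```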

In some embodiments, generating the double-strand breaks in (c) comprises inducing the double-strand breaks using a CRISPR/Cas endonuclease and one or more guide nucleic acids (gNAs), one or more zinc finger nucleases, one or more transcription activator-like effector nucleases (TALENs), or one or more CRE recombinases. In some embodiments, the CRISPR/Cas endonuclease comprises Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas6, Cas7, Cas8, Cas9, Cas10, CasX, CasY, Cpf1 (Cas12a), Cas12b, Cas13a, Csy1, Csy2, Csy3, Cse1, Cse2, Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX, Csx3, Csx1, Csx15, Csf1, Csf2, Csf3, Csf4, Cms1, C2c1, C2c2, or C2c3, or a homolog, ortholog, or modified version thereof. In some embodiments, the CRISPR/Cas endonuclease comprises Cas9, Cpf1 (Cas12a), Cas12b, CasX, CasY, C2c1, or C2c3, or a homolog, ortholog, or modified version thereof. In some embodiments, the CRISPR/Cas endonuclease comprises Cas9. In some embodiments, the gNA comprises a single guide RNA (sgRNA).
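For the Cas9 embodiment, choosing where a double-strand break can be made amounts to finding a protospacer next to a PAM. The sketch below is not part of the disclosure: it assumes Streptococcus pyogenes Cas9 (20-nt protospacer, 5'-NGG-3' PAM, blunt cut about 3 bp 5' of the PAM), scans the given strand only, and uses a hypothetical function name `find_spcas9_sites`.

```python
import re


def find_spcas9_sites(seq: str, protospacer_len: int = 20):
    """List candidate SpCas9 sites on the given strand.

    Returns tuples of (protospacer_start, protospacer, pam, cut_index), where
    cut_index is the 0-based position of the first base 3' of the expected cut.
    Assumes an NGG PAM and a cut 3 bp upstream of the PAM.
    """
    seq = seq.upper()
    # A lookahead is used so that overlapping candidates are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % protospacer_len)
    sites = []
    for match in pattern.finditer(seq):
        start = match.start()
        cut_index = start + protospacer_len - 3
        sites.append((start, match.group(1), match.group(2), cut_index))
    return sites


if __name__ == "__main__":
    for site in find_spcas9_sites("A" * 20 + "TGG" + "CCCC"):
        print(site)
```

A real guide design would also scan the reverse complement and score off-target matches; both are omitted here to keep the sketch short.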

In some embodiments, the target chromosome comprises, from 5' to 3', the sequence of the 5' homology arm of the first nucleic acid molecule, the target sequence, and the sequence of the 3' homology arm of the second nucleic acid molecule. In some embodiments, the template chromosome comprises, from 5' to 3', the sequence of the 3' homology arm of the first nucleic acid molecule, the template sequence, and the sequence of the 5' homology arm of the second nucleic acid molecule.

In some embodiments, the target sequence comprises at least 1 gene, at least 2 genes, at least 3 genes, at least 5 genes, at least 10 genes, at least 20 genes, at least 30 genes, at least 40 genes, at least 50 genes, at least 100 genes, or at least 200 genes. In some embodiments, the target sequence comprises one or more genes that are homologous to one or more genes of the template sequence.

In some embodiments, the template sequence comprises a naturally occurring sequence. In some embodiments, the template sequence comprises at least 1 gene, at least 2 genes, at least 3 genes, at least 5 genes, at least 10 genes, at least 20 genes, at least 30 genes, at least 40 genes, at least 50 genes, at least 100 genes, or at least 200 genes. In some embodiments, the template sequence comprises one or more modifications to a naturally occurring sequence. In some embodiments, the template sequence comprises an artificial sequence. In some embodiments, the artificial sequence comprises a sequence encoding one or more antibodies or antigen-binding fragments thereof. In some embodiments, the one or more antibodies or antigen-binding fragments thereof comprise an scFv, a bispecific antibody, or a multispecific antibody.

In some embodiments, the target sequence is deleted by insertion of the template sequence. In some embodiments, (a) the target chromosome comprises, from 5' to 3', the sequence of the 5' homology arm of the first nucleic acid molecule, a first sgRNA target sequence, the target sequence, a second sgRNA target sequence, and the sequence of the 3' homology arm of the second nucleic acid molecule; and (b) the template chromosome comprises, from 5' to 3', a third sgRNA target sequence, the sequence of the 3' homology arm of the first nucleic acid molecule, the template sequence, the sequence of the 5' homology arm of the second nucleic acid molecule, and a fourth sgRNA target sequence. In some embodiments, generating the double-strand breaks comprises contacting the cell with a CRISPR/Cas endonuclease and first, second, third, and fourth sgRNAs. In some embodiments, the first, second, third, and fourth sgRNAs comprise targeting sequences specific for the first, second, third, and fourth sgRNA target sequences, respectively.

In some embodiments, contacting the cell with the CRISPR/Cas endonuclease and the sgRNAs comprises transfecting the cell with one or more nucleic acid molecules encoding the CRISPR/Cas endonuclease and the sgRNAs.

In some embodiments, inserting the template sequence comprises deleting little or no sequence of the target sequence. In some embodiments, insertion of the template sequence disrupts one or more functions of the target sequence. In some embodiments, insertion of the template sequence disrupts a gene in the target sequence. In some embodiments, (a) the target chromosome comprises, from 5' to 3', the sequence of the 5' homology arm of the first nucleic acid molecule, a first sgRNA target sequence, and the sequence of the 3' homology arm of the second nucleic acid molecule; and (b) the template chromosome comprises, from 5' to 3', a second sgRNA target sequence, the sequence of the 3' homology arm of the first nucleic acid molecule, the template sequence, the sequence of the 5' homology arm of the second nucleic acid molecule, and a third sgRNA target sequence. In some embodiments, generating the double-strand breaks comprises contacting the cell with a CRISPR/Cas endonuclease and first, second, and third sgRNAs. In some embodiments, the first, second, and third sgRNAs comprise targeting sequences specific for the first, second, and third sgRNA target sequences, respectively. In some embodiments, contacting the cell with the CRISPR/Cas endonuclease and the sgRNAs comprises transfecting the cell with one or more nucleic acid molecules encoding the CRISPR/Cas endonuclease and the sgRNAs.
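The two layouts described in the preceding paragraphs (replacement with four sgRNA target sequences, and insertion with three) can be compared at the level of ordered features. The sketch below is illustrative only: feature labels such as `arm5_donor1` and `sgRNA1_site`, and the function names, are hypothetical, and it assumes that whatever lies between the two target-side arms (including the sgRNA target sequences) is lost on repair.

```python
def engineered_layout(target_layout, template_layout):
    """Return the expected feature order of the engineered chromosome."""
    # The fragment taken from the template chromosome is everything between
    # the two sgRNA target sequences that flank it.
    cuts = [i for i, feature in enumerate(template_layout) if feature.startswith("sgRNA")]
    if len(cuts) != 2:
        raise ValueError("expected exactly two sgRNA sites on the template chromosome")
    fragment = template_layout[cuts[0] + 1 : cuts[1]]

    left = target_layout.index("arm5_donor1")
    right = target_layout.index("arm3_donor2")
    # Markers flank the fragment; features between the target-side arms are dropped.
    return target_layout[: left + 1] + ["marker1"] + fragment + ["marker2"] + target_layout[right:]


def guides_needed(target_layout, template_layout):
    """Count the sgRNA target sequences across both chromosomes."""
    return sum(feature.startswith("sgRNA") for feature in target_layout + template_layout)


REPLACEMENT_TARGET = ["arm5_donor1", "sgRNA1_site", "target", "sgRNA2_site", "arm3_donor2"]
REPLACEMENT_TEMPLATE = ["sgRNA3_site", "arm3_donor1", "template", "arm5_donor2", "sgRNA4_site"]
INSERTION_TARGET = ["arm5_donor1", "sgRNA1_site", "arm3_donor2"]
INSERTION_TEMPLATE = ["sgRNA2_site", "arm3_donor1", "template", "arm5_donor2", "sgRNA3_site"]

if __name__ == "__main__":
    print(guides_needed(REPLACEMENT_TARGET, REPLACEMENT_TEMPLATE),
          engineered_layout(REPLACEMENT_TARGET, REPLACEMENT_TEMPLATE))
    print(guides_needed(INSERTION_TARGET, INSERTION_TEMPLATE),
          engineered_layout(INSERTION_TARGET, INSERTION_TEMPLATE))
```

Under these assumptions both modes yield the same junction structure; they differ in whether a target sequence is removed and in the number of guides required.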

In some embodiments, the first or second marker comprises a fluorescent protein operably linked to a promoter capable of expressing the fluorescent protein in the cell. In some embodiments, the fluorescent protein comprises green fluorescent protein (GFP), yellow fluorescent protein (YFP), red fluorescent protein (RFP), cyan fluorescent protein (CFP), blue fluorescent protein (BFP), dsRed, mCherry, or tdTomato. In some embodiments, the fluorescent protein comprises GFP. In some embodiments, the first marker further comprises a selectable marker. In some embodiments, the second marker further comprises a selectable marker. In some embodiments, the selectable marker is selected from the group consisting of dihydrofolate reductase (DHFR), glutamine synthetase (GS), puromycin acetyltransferase, blasticidin deaminase, histidinol dehydrogenase, hygromycin phosphotransferase (hph), a bleomycin resistance gene, and aminoglycoside phosphotransferase (neomycin resistance). In some embodiments, the first and second markers are not the same selectable marker. In some embodiments, the first marker comprises GFP operably linked to a promoter capable of expressing GFP in the cell, together with puromycin acetyltransferase, and the second marker comprises hygromycin phosphotransferase.
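The reason for using two different markers can be shown with a small filter: only cells carrying both junction markers pass selection. This is a minimal sketch based on the GFP/puromycin and hygromycin embodiment above; the `Cell` record, its field names, and the example clones are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """Hypothetical readout for one clone after transfection."""
    name: str
    gfp: bool                    # fluorescent part of the first marker
    puromycin_resistant: bool    # selectable part of the first marker
    hygromycin_resistant: bool   # second marker


def select_double_positive(cells):
    """Keep clones that express the first marker and the second marker."""
    return [
        cell for cell in cells
        if cell.gfp and cell.puromycin_resistant and cell.hygromycin_resistant
    ]


if __name__ == "__main__":
    clones = [
        Cell("clone_a", gfp=True, puromycin_resistant=True, hygromycin_resistant=True),
        Cell("clone_b", gfp=True, puromycin_resistant=True, hygromycin_resistant=False),
        Cell("clone_c", gfp=False, puromycin_resistant=False, hygromycin_resistant=True),
    ]
    print([cell.name for cell in select_double_positive(clones)])
```

Clones that integrated only one donor, such as `clone_b` and `clone_c` in this example, are excluded.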

In some embodiments, the method further comprises (e) deleting all or part of the first or second marker after step (d). In some embodiments, deleting the first or second marker comprises inducing the deletion with a CRISPR/Cas endonuclease and a gNA comprising a targeting sequence specific for the sequence encoding the marker.

In some embodiments, the cell comprises a hybrid cell, an embryonic hybrid stem (EHS) cell, or a zygote. In some embodiments, the EHS cell is generated by fusing ES cells from any two species selected from the group consisting of mouse, rat, rabbit, guinea pig, hamster, sheep, goat, donkey, cow, horse, camel, chicken, and monkey. In some embodiments, the EHS cell is generated by fusing a human embryonic stem cell with an embryonic stem cell from a non-human species. In some embodiments, the non-human species is mouse, rat, rabbit, guinea pig, hamster, sheep, goat, donkey, cow, horse, camel, chicken, or monkey. In some embodiments, the EHS cell is generated by fusing ES cells from any two different species selected from the group consisting of mouse, rat, rabbit, guinea pig, hamster, sheep, goat, donkey, cow, horse, camel, chicken, and monkey. In some embodiments, the fusing comprises electrofusion, virus-induced fusion, or chemically induced fusion.

In some embodiments, the cell comprises a hybrid cell. In some embodiments, generating the hybrid cell comprises: (a) generating a micronucleated human cell; and (b) fusing the micronucleated human cell with a cell from a non-human species, thereby generating the hybrid cell. In some embodiments, the micronucleated human cell is generated by exposing a human cell to colcemid under conditions sufficient to induce micronucleation and collecting the micronucleated cells by centrifugation. In some embodiments, the non-human species is mouse, rat, rabbit, guinea pig, hamster, sheep, goat, donkey, cow, horse, camel, chicken, or monkey. In some embodiments, the cell from the non-human species is an ES cell, and the hybrid cell is an EHS cell.

In some embodiments, the target sequence comprises a gene encoding an immunoglobulin or a T cell receptor subunit. In some embodiments, the target chromosome comprises mouse chromosome 12 and the template chromosome comprises human chromosome 14. In some embodiments, the target sequence comprises a mouse Igh variable region sequence. In some embodiments, the mouse Igh variable region sequence comprises sequences encoding the mouse VH, DH, and JH1-6 gene segments and intervening non-coding sequences. In some embodiments, the template sequence comprises a human IGH variable region sequence. In some embodiments, the human IGH variable region sequence comprises sequences encoding the human VH, DH, and JH1-6 gene segments and intervening non-coding sequences. In some embodiments, the target sequence comprises a mouse Igl variable region sequence. In some embodiments, the target sequence comprises a mouse Igk variable region sequence. In some embodiments, the template sequence comprises a human IGL variable region sequence. In some embodiments, the template sequence comprises a human IGK variable region sequence. In some embodiments, the mouse Igk variable region sequence comprises sequences encoding the mouse Vκ and Jκ1-5 gene segments and intervening non-coding sequences. In some embodiments, the human IGK variable region sequence comprises sequences encoding the human Vκ and Jκ1-5 gene segments and intervening non-coding sequences.

In some embodiments, the method further comprises recovering the engineered chromosome from the cells selected in step (d). In some embodiments, recovering the engineered chromosome comprises exposing the cells to colcemid under conditions sufficient to induce micronucleation and collecting the micronucleated cells by centrifugation.

In some embodiments, the first and second nucleic acid molecules are plasmids.

This disclosure provides engineered chromosomes produced by the methods of this disclosure.

In some embodiments, the engineered chromosome is mouse chromosome 12 comprising a sequence of a human IGH variable region in place of the mouse Igh variable region. In some embodiments, the mouse Igh variable region comprises the VH, DH, and JH1-6 gene segments and intervening non-coding sequences. In some embodiments, the human IGH variable region comprises the VH, DH, and JH1-6 gene segments and intervening non-coding sequences. In some embodiments, the engineered chromosome is mouse chromosome 6 comprising a sequence of a human IGK variable region in place of the mouse Igk variable region. In some embodiments, the mouse Igk variable region sequence comprises sequences encoding the mouse Vκ and Jκ1-5 gene segments and intervening non-coding sequences. In some embodiments, the template sequence comprises a human IGK variable region sequence. In some embodiments, the human IGK variable region sequence comprises sequences encoding the human Vκ and Jκ1-5 gene segments and intervening non-coding sequences.

This disclosure provides cells comprising the engineered chromosomes of this disclosure.

In some embodiments, the cell is capable of hybridizing with a mouse ES cell. In some embodiments, the cell is an embryonic stem (ES) cell, an embryonic hybrid stem (EHS) cell, or a zygote. In some embodiments, the EHS cell is a hybrid of a human ES cell and a mouse ES cell. In some embodiments, the ES cell is a mouse ES cell. In some embodiments, the cell is a micronucleated cell.

This disclosure provides a method of generating a mouse embryonic stem cell, comprising: (a) fusing micronucleated cells comprising an engineered chromosome produced by any one of the methods of this disclosure with mouse ES cells, wherein: (i) the mouse ES cells comprise a chromosome homologous to the engineered chromosome, the homologous chromosome comprising a first fluorescent protein operably linked to a promoter capable of expressing the fluorescent protein in the ES cells, and (ii) at least a subpopulation of the micronucleated cells comprises the engineered chromosome, wherein the engineered chromosome comprises a second fluorescent protein that is different from the first fluorescent protein and is operably linked to a promoter capable of expressing the fluorescent protein in the ES cells; (b) selecting ES cells that express the first and second fluorescent proteins; (c) culturing the ES cells selected in step (b) until at least a subpopulation of the ES cells loses the homologous chromosome; and (d) selecting ES cells that express the second fluorescent protein but not the first fluorescent protein.

In some embodiments, culturing the cells in step (c) comprises culturing the cells for at least 5 days, at least 7 days, at least 10 days, or at least 14 days. In some embodiments, selecting the cells in steps (b) and (d) comprises fluorescence-activated cell sorting (FACS).
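The two selection steps of the method above can be expressed as two gates on fluorescence. The sketch below is only an illustration: the intensity threshold, the field names `fp1` and `fp2` (first and second fluorescent protein), and the example events are assumptions, not values from the disclosure.

```python
THRESHOLD = 1_000.0  # arbitrary illustrative cutoff for calling an event positive


def is_positive(intensity: float, threshold: float = THRESHOLD) -> bool:
    """Call a fluorescence intensity positive if it reaches the threshold."""
    return intensity >= threshold


def gate_after_fusion(events):
    """Step (b): keep cells positive for both fluorescent proteins."""
    return [e for e in events if is_positive(e["fp1"]) and is_positive(e["fp2"])]


def gate_after_culture(events):
    """Step (d): keep cells positive for the second protein and negative for the first."""
    return [e for e in events if not is_positive(e["fp1"]) and is_positive(e["fp2"])]


if __name__ == "__main__":
    day0 = [
        {"id": 1, "fp1": 5200.0, "fp2": 4100.0},  # fused cell, carries both chromosomes
        {"id": 2, "fp1": 4800.0, "fp2": 30.0},    # unfused ES cell
    ]
    day10 = [
        {"id": 1, "fp1": 40.0, "fp2": 3900.0},    # lost the homologous chromosome
        {"id": 3, "fp1": 5100.0, "fp2": 4000.0},  # still carries both
    ]
    print([e["id"] for e in gate_after_fusion(day0)])
    print([e["id"] for e in gate_after_culture(day10)])
```

The first gate confirms that the engineered chromosome was transferred; the second identifies cells in which the engineered chromosome has taken the place of its endogenous homolog.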

This disclosure provides mouse ES cells produced by the methods of this disclosure.

This disclosure provides transgenic mice generated from the mouse ES cells of this disclosure.

In some embodiments, generating the transgenic mouse comprises injecting the ES cells into a diploid blastocyst, nuclear transfer from the ES cells into an enucleated mouse embryo, or tetraploid embryo complementation. In some embodiments, mouse chromosome 12 comprises a sequence of a human IGH variable region in place of the mouse Igh variable region. In some embodiments, the mouse Igh variable region comprises the VH, DH, and JH1-6 gene segments and intervening non-coding sequences. In some embodiments, the human IGH variable region comprises the VH, DH, and JH1-6 gene segments and intervening non-coding sequences. In some embodiments, mouse chromosome 6 comprises a sequence of a human IGK variable region in place of the mouse Igk variable region. In some embodiments, the mouse Igk variable region sequence comprises sequences encoding the mouse Vκ and Jκ1-5 gene segments and intervening non-coding sequences. In some embodiments, the template sequence comprises a human IGK variable region sequence. In some embodiments, the human IGK variable region sequence comprises sequences encoding the human Vκ and Jκ1-5 gene segments and intervening non-coding sequences.

This disclosure provides a method of producing an antibody, comprising: (a) challenging a transgenic mouse of this disclosure with an antigen, whereby the transgenic mouse produces a plurality of antibodies comprising human V, D, and J segments from the human IGH variable region; and (b) isolating an antibody specific for the antigen.

This disclosure provides a method of producing an antibody, comprising: (a) challenging a transgenic mouse of the present invention with an antigen, whereby the transgenic mouse produces a plurality of antibodies comprising human V and J segments from the human IGK or IGL variable region; and (b) isolating an antibody specific for the antigen.

This disclosure provides antibodies derived from antibodies produced by the transgenic mice of this disclosure. In some embodiments, the antibody comprises a single-chain variable fragment (scFv), a bispecific antibody, or a multispecific antibody.

This disclosure provides a method of generating a chromosomal rearrangement, comprising: (a) providing a cell comprising a target chromosome containing a target location and a template chromosome containing a template sequence; (b) contacting the cell with a nucleic acid molecule comprising, from 5' to 3', a 5' homology arm and a 3' homology arm, wherein the 5' homology arm contains a nucleotide sequence upstream of the 5' end of the target location and the 3' homology arm contains a nucleotide sequence upstream of the 5' end of the template sequence; (c) generating double-strand breaks at the target location and at the 5' end of the template sequence, thereby inserting a marker into the target chromosome 3' of the 5' homology arm sequence, followed by the template sequence, thereby generating the chromosomal rearrangement; and (d) selecting one or more cells that express the marker.
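The rearrangement described above can be sketched with the same kind of toy string model. The sketch rests on several assumptions that are not stated in the text: the donor is assumed to carry the marker between its two arms, the template chromosome is assumed to be broken only at the 5' end of the template so that everything 3' of that break joins the target chromosome, and the function name and sequences are hypothetical.

```python
def rearranged_chromosome(target_chrom, template_chrom, arm5, marker, arm3):
    """Return the expected derivative chromosome in a toy string model.

    The 5' portion of the target chromosome (through the 5' homology arm) is
    joined, via the marker, to the template chromosome from the 3' homology
    arm onward.
    """
    target_pos = target_chrom.find(arm5)
    template_pos = template_chrom.find(arm3)
    if target_pos < 0 or template_pos < 0:
        raise ValueError("homology arm not found on the expected chromosome")
    return target_chrom[: target_pos + len(arm5)] + marker + template_chrom[template_pos:]


if __name__ == "__main__":
    print(rearranged_chromosome("ccAAACttttt", "ggCACAGATTACAgg", "AAAC", "[M]", "CACA"))
```

Only one junction is formed in this model, which is consistent with the method using a single donor molecule and a single marker for selection.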

In some embodiments, the 5' and 3' homology arms of the nucleic acid molecule are between about 20 bp and 2,000 bp, between about 50 bp and 1,500 bp, between about 100 bp and 1,400 bp, between about 150 bp and 1,300 bp, between about 200 bp and 1,200 bp, between about 300 bp and 1,100 bp, between about 400 bp and 1,000 bp, between about 500 bp and 900 bp, or between about 600 bp and 800 bp in length. In some embodiments, the 5' and 3' homology arms of the nucleic acid molecule are between about 400 bp and 1,500 bp, between about 500 bp and 1,300 bp, or between about 600 bp and 1,000 bp in length. In some embodiments, the 5' and 3' homology arms of the nucleic acid molecule are between about 600 bp and 1,000 bp in length.

In some embodiments, generating the double-strand breaks in (c) comprises inducing the double-strand breaks using a CRISPR/Cas endonuclease and at least one sgRNA, one or more zinc finger nucleases, one or more transcription activator-like effector nucleases (TALENs), or one or more CRE recombinases. In some embodiments, the CRISPR/Cas endonuclease comprises Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas6, Cas7, Cas8, Cas9, Cas10, CasX, CasY, Cas12a (Cpf1), Cas12b, Cas13a, Csy1, Csy2, Csy3, Cse1, Cse2, Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX, Csx3, Csx1, Csx15, Csf1, Csf2, Csf3, Csf4, Cms1, C2c1, C2c2, or C2c3, or a homolog, ortholog, or modified version thereof. In some embodiments, the CRISPR/Cas endonuclease comprises Cas9, Cpf1, CasX, CasY, C2c1, or C2c3, or a homolog, ortholog, or modified version thereof. In some embodiments, the CRISPR/Cas endonuclease comprises Cas9. In some embodiments, generating the double-strand breaks comprises contacting the cell with a CRISPR/Cas endonuclease and at least a first gNA and a second gNA, the first gNA comprising a targeting sequence specific for the target location such that the CRISPR/Cas endonuclease cleaves the target location, and the second gNA comprising a targeting sequence specific for the 5' end of the template sequence. In some embodiments, contacting the cell with the CRISPR/Cas endonuclease and the sgRNA comprises transfecting the cell with one or more nucleic acid molecules encoding the CRISPR/Cas endonuclease and the sgRNA. In some embodiments, the one or more nucleic acid molecules are plasmids.

In some embodiments, the marker comprises a fluorescent protein operably linked to a promoter capable of driving expression of the fluorescent protein in the cell. In some embodiments, the fluorescent protein includes GFP, YFP, RFP, CFP, BFP, dsRed, mCherry, or tdTomato. In some embodiments, the marker further includes a selection marker. In some embodiments, the selection marker is selected from the group consisting of dihydrofolate reductase (DHFR), glutamine synthetase (GS), puromycin acetyltransferase, blasticidin deaminase, histidinol dehydrogenase, hygromycin phosphotransferase (hph), a bleomycin resistance gene, and aminoglycoside phosphotransferase (neomycin resistance).

In some embodiments, the cells include embryonic stem (ES) cells.

In some embodiments, the nucleic acid molecule is a plasmid.

This disclosure provides cells comprising chromosomal rearrangements produced by the methods of this disclosure. In some embodiments, the cells are mouse ES cells.

This disclosure provides transgenic mice derived from mouse ES cells produced by the methods of this disclosure.

Brief Description of the Drawings

A better understanding of the features and advantages of this disclosure will be obtained by reference to the following detailed description, which sets forth illustrative embodiments, and to the accompanying drawings, in which:

Figure 1 is a diagram showing, from top to bottom, the mouse immunoglobulin heavy chain complex (Igh), human Igh, and mouse Igh in which the variable domains (V_H, D_H, and J_H1-6) have been humanized. Chro: chromosome.

Figure 2 is a diagram showing the hybridization of engineered mouse and human embryonic stem (ES) cells by electrofusion. The mouse ES cells express a neomycin marker, and the human ES cells express mCherry. The resulting embryonic hybrid stem cells (hybrid cells) are resistant to G418 and positive for mCherry.

Figure 3A is a diagram showing the placement of three pairs of PCR primers (indicated by arrows) in the V_H, D_H, and J_H1-6 regions of the human Igh gene; the primers are used to genotype embryonic hybrid stem (EHS) cells.

Figure 3B is an exemplary gel showing the PCR results for 12 embryonic hybrid stem (EHS) cell clones genotyped with the primers shown in Figure 3A.

Figures 4A-4B are diagrams showing the workflow for establishing an engineered humanized chromosome in EHS cells. (Figure 4A) HDR-mediated chromosomal rearrangement (HMCR); HDR: homology-directed repair. EHS cells were co-transfected with the following plasmids: a 5' HMCR plasmid containing a 5' arm homologous to the 5' end of the mouse Igh gene, a 3' arm homologous to the 5' end of the human Igh gene, and a pCMV-EGFP-polyA-PGK-puromycin-polyA cassette; a 3' HMCR plasmid containing a 5' arm homologous to the 3' end of the human Igh variable locus, a 3' arm homologous to the 3' end of the mouse Igh variable locus, and a PGK-hygromycin-polyA cassette; and four plasmids containing Cas9 and sgRNAs targeting the 5' and 3' ends of the variable domains of mouse Igh and human Igh, as shown. Alternatively, (Figure 4B) CRE-Loxp-mediated chromosomal rearrangement (CMCR): four plasmids were designed to mediate the CMCR process. The mouse Igh 5' (pCMV-GFP-BGH PolyA-Loxp) and 3' (BGH polyA-Loxp-511-hygromycin-BGH polyA-PGK-BSD-BGHPolyA) plasmids were designed to insert at the 5' and 3' ends of the mouse Igh variable locus, respectively. Likewise, the human IGH 5' (BGHpolyA-Loxp-Puro-BGH PolyA-PGK-neomycin-BGH polyA) and 3' (pCMV-BGP-BGH polyA-PGK-Loxp-511) plasmids were designed to insert at the 5' and 3' ends of the human IGH variable locus, respectively. Cre was then transfected into successfully integrated EHS cells for CMCR.

Figure 5A is a diagram showing the placement of the PCR primers (indicated by arrows) used to verify the engineered human chromosome.

Figure 5B shows the PCR results obtained with the 4 primer pairs shown in Figure 5A. Results for 192 single clones are shown.

Figure 6 is a diagram showing the replacement of a mouse chromosome with an engineered human chromosome in mouse ES cells. EHS cells carrying the GFP-labeled engineered human chromosome were micronucleated by exposure to colcemid; the microcells were collected by centrifugation and electrofused with mouse ES cells in which the corresponding mouse chromosome had been labeled with mCherry. GFP+mCherry+ cells were isolated by fluorescence-activated cell sorting (FACS). The cells were then cultured, and GFP+mCherry- cells, which had lost the mouse chromosome, were isolated by FACS.
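The two sorting steps described for Figure 6 can be sketched as simple gating logic. This is an illustration only: the `Cell` record, the `gate` helper, the intensity values, and the thresholds are all hypothetical and are not taken from the disclosure; in practice FACS gates are set empirically against controls.

```python
from typing import List, NamedTuple


class Cell(NamedTuple):
    cell_id: str
    gfp: float      # GFP intensity (arbitrary units, hypothetical)
    mcherry: float  # mCherry intensity (arbitrary units, hypothetical)


def gate(cells: List[Cell], gfp_min: float, mcherry_min: float,
         mcherry_positive: bool) -> List[Cell]:
    """Keep GFP+ cells whose mCherry status matches `mcherry_positive`."""
    return [
        c for c in cells
        if c.gfp >= gfp_min and (c.mcherry >= mcherry_min) == mcherry_positive
    ]


# Hypothetical thresholds separating positive from negative events.
GFP_MIN, MCHERRY_MIN = 1000.0, 1000.0

# Sort 1: after microcell fusion, keep GFP+ mCherry+ cells, which carry both
# the engineered human chromosome and the labeled mouse chromosome.
after_fusion = [Cell("a", 5000, 4000), Cell("b", 50, 4200), Cell("c", 5200, 30)]
double_positive = gate(after_fusion, GFP_MIN, MCHERRY_MIN, mcherry_positive=True)

# Sort 2: after culture, keep GFP+ mCherry- cells, which have lost the
# mCherry-labeled mouse chromosome.
after_culture = [Cell("a1", 5100, 3900), Cell("a2", 4800, 20)]
replaced = gate(after_culture, GFP_MIN, MCHERRY_MIN, mcherry_positive=False)
```

Using the same helper for both sorts makes explicit that only the mCherry criterion changes between the first and second selection.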

Figure 7A shows the placement of the PCR primers (indicated by arrows) used to validate Igh humanized mice.

Figure 7B shows the PCR results for an exemplary Igh humanized mouse obtained with the 7 primer pairs shown in Figure 7A.

Figure 8A shows the fluorescence in situ hybridization (FISH) results for Igh humanized mice.

Figure 8B shows the G-banding karyotype analysis of Igh humanized mice.

Figure 9A shows the whole-genome sequencing (WGS) analysis of IGH-V in Igh humanized mice. The copy number in the WGS sequence is shown for each variable (V) gene segment located in the V_H region of human Igh.

Figure 9B shows the WGS analysis of IGH-D and IGH-J in Igh humanized mice. The copy number in the WGS sequence is shown for each diversity (D) gene segment and for the 6 joining (J) segments located in the D_H and J_H1-6 regions of human Igh.

Figure 10 shows the humanization of the variable domain of the mouse Igk gene.

Figures 11A-11B show the PCR validation results for Igk humanized mice. Figure 11A shows the positions of the primers designed for the PCR experiments. Figure 11B shows the PCR results for Igk humanized mice obtained with the 5 primer pairs shown in Figure 11A.

Figure 12 shows the WGS analysis results for Igk humanized mice. The copy number in the WGS sequence is shown for each antibody gene located in the V_K and J_K segments of the human IGK gene.

Detailed Description

This disclosure provides methods for engineering chromosomes that include transferring large sequence fragments between chromosomes. Using the methods disclosed herein, at least 5 megabase pairs (MB) of sequence can be transferred from a chromosomal template to a target chromosome. The methods disclosed herein can also be used to generate chromosomal rearrangements, such as inversions and translocations. Also provided herein are engineered chromosomes produced by the methods of this disclosure, cells and animals comprising these engineered chromosomes, and methods of using them.

Manipulating large segments of genes or chromosomes holds great promise for basic and translational research and for the development of therapies. Genetic humanization, in which genes of a model organism such as the mouse are replaced with their human counterparts, is one of the most popular applications. For example, mice carrying humanized Ig genes provide a powerful platform for generating human antibodies in a mouse background. However, large-fragment manipulation remains one of the most significant challenges in the field of gene editing, because no delivery vector is available that can carry chromosome fragments of up to megabase pair (MB) size. The payload of conventional delivery vectors, such as adeno-associated virus vectors or other viral vectors, is limited by the size of the viral genome from which the vector is derived.

The methods disclosed herein allow efficient in situ replacement of large sequences between chromosomes. These methods, termed Massive fragment Across Species In situ Replacement Technology (MASIRT), can be used to replace large portions of a chromosome in a single editing step, in some cases sequences on the scale of megabase pairs (MB). The methods can be used to transfer large sequences efficiently between species or between chromosomes of a single species. In one example, MASIRT was used to obtain mice humanized for the variable domains of the mouse Igh gene. Humans and mice show a high degree of similarity in the arrangement and expression of antibody genes, and the genomic organization of the heavy chain is also similar between these species. Accordingly, MASIRT was used to replace approximately 3 MB of mouse genomic sequence containing all V_H, D_H, and J_H gene segments with approximately 1 MB of contiguous human genomic sequence containing the equivalent human gene segments, yielding a humanized mouse Igh gene.

Unlike other methods that work only in embryonic stem cells, the methods of this disclosure can advantageously be used to replace large sequences in fertilized eggs. Embryonic stem cell lines are generally not available for species other than the mouse. In contrast, fertilized eggs can be obtained from many mammals, so the methods of this disclosure can be used to obtain animals, such as rabbits or cattle, with humanized genes or gene fragments. In addition, the methods disclosed herein can be used to replace large sequence fragments in a single step, for example sequences of up to at least 5 MB, which is approximately five times the size handled by other methods known in the art. This increases efficiency and reduces the time and cost required to produce animals with humanized genes. For example, Igh humanized mice can be produced with only 3 rounds of replacement. Another advantage is that, when used in mice, each replacement takes only 1-3 months, which is only one-half or one-third of the time required by other methods known in the art.

Definitions

A chromosome is a long DNA molecule that contains all or part of the genetic material of an organism. Most eukaryotic chromosomes include packaging proteins called histones which, aided by chaperone proteins, bind to and condense the DNA molecule to maintain its integrity. Eukaryotic chromosomes consist of a long linear DNA molecule associated with proteins, forming a compact complex of proteins and DNA called chromatin. Each chromosome has one centromere, from which one or two arms project. The arms of a chromosome terminate in telomeres, regions of repetitive nucleotide sequence associated with specialized proteins that protect the terminal regions of chromosomal DNA from progressive degradation and ensure the integrity of linear chromosomes by preventing DNA repair systems from mistaking the very ends of the DNA strand for a double-strand break.

A “gene” includes a DNA region encoding a gene product (e.g., a protein or a non-coding RNA), as well as all DNA regions that regulate the production of the gene product, whether or not such regulatory sequences are adjacent to coding and/or transcribed sequences. Accordingly, a gene may include regulatory element sequences including, but not necessarily limited to, promoter sequences, terminators, translational regulatory sequences such as ribosome binding sites and internal ribosome entry sites, enhancers, silencers, insulators, boundary elements, replication origins, matrix attachment sites, and locus control regions. A coding sequence encodes a gene product upon transcription, or upon transcription and translation. The coding sequences of this disclosure may comprise fragments and need not contain a full-length open reading frame. A gene may include the strand that is transcribed as well as the complementary strand containing the anticodons. A gene may also include exons (which may include protein-coding sequences and untranslated regions) and introns (which are removed from the final RNA product by splicing).

As used herein, the term “promoter” can refer to a DNA sequence located adjacent to a DNA sequence that encodes a recombinant product. A promoter is preferably operatively linked to the adjacent DNA sequence. A promoter typically increases the amount of protein or RNA product expressed from the DNA sequence compared with the amount expressed when no promoter is present. A promoter from one organism can be used to enhance protein expression from a DNA sequence that originates from another organism. For example, a vertebrate promoter can be used to express jellyfish GFP in vertebrates. In addition, one promoter element can increase the amount of recombinant product expressed from multiple DNA sequences attached in tandem. Hence, one promoter element can enhance the expression of one or more recombinant products. Multiple promoter elements are well known to persons of ordinary skill in the art.

As used herein, the term “enhancer” can refer to a DNA sequence located adjacent to, or distal to, a DNA sequence that encodes a protein or RNA product. Enhancer elements are typically located upstream of a promoter element, but can also be located downstream of or within a coding DNA sequence, for example within an intron. In some cases, an enhancer can be located thousands of bases, or even tens or hundreds of kilobases, from the gene whose expression it regulates. Enhancer elements can increase the amount of protein or RNA product expressed from a DNA sequence above the increased expression afforded by a promoter element. Multiple enhancer elements are readily available to persons of ordinary skill in the art.

As used herein, the term “exogenous chromosome” or “exogenous sequence” refers to a chromosome or sequence that is foreign relative to the genome of the animal. For example, in a mouse cell in which all chromosomes except one human chromosome are mouse chromosomes, the human chromosome is an exogenous chromosome. Similarly, in a mouse chromosome in which a portion of the mouse sequence has been replaced with human sequence, the human sequence is referred to as an exogenous sequence. Conversely, “endogenous” refers to a chromosome or sequence that originates from the organism, such as the mouse chromosomes or sequences described above.

As used herein, the term “homologous recombination” refers to a type of genetic recombination in which nucleotide sequences are exchanged between two similar or identical molecules of DNA, termed homologous sequences or homologous arms. Homologous recombination generally involves the following basic steps. After a double-strand break (DSB) occurs on both strands of DNA, sections of DNA around the 5' ends of the DSB are cut away in a process called resection. In the strand invasion step that follows, an overhanging 3' end of the broken DNA molecule “invades” a similar or identical (homologous) DNA molecule that is not broken, e.g., a homologous arm. After strand invasion, the further sequence of events may follow either of two pathways: the DSBR (double-strand break repair) pathway or the SDSA (synthesis-dependent strand annealing) pathway.

As used herein, “DNA repair pathway” refers to a cellular mechanism that allows a cell to maintain genome integrity and function in response to the detection of DNA damage, such as single-strand or double-strand breaks in the DNA. Depending on the type and extent of the DNA damage and on the cell cycle stage, DNA repair pathways may include, but are not limited to, resection, canonical homology-directed repair (canonical HDR), homologous recombination (HR), alternative homology-directed repair (alt-HDR), double-strand break repair (DSBR), single-strand annealing (SSA), synthesis-dependent strand annealing (SDSA), break-induced replication (BIR), alternative end joining (alt-EJ), microhomology-mediated end joining (MMEJ), DNA synthesis-dependent microhomology-mediated end joining (SD-MMEJ), non-homologous end joining (NHEJ) pathways such as canonical non-homologous end joining (C-NHEJ) repair and the alternative non-homologous end joining (A-NHEJ) pathway, translesion DNA synthesis (TLS) repair, base excision repair (BER), nucleotide excision repair (NER), mismatch repair (MMR), DNA damage response (DDR), blunt-end joining, single-strand break repair (SSBR), interstrand crosslink (ICL) repair, and the Fanconi anemia (FA) pathway.

As used herein, homology-directed repair (HDR) refers to the process of repairing DNA damage using a homologous nucleic acid (e.g., a sister chromatid or an exogenous nucleic acid). In a normal cell, HDR typically involves a series of steps such as recognition of the break, stabilization of the break, resection, stabilization of single-stranded DNA, formation of a DNA crossover intermediate, resolution of the crossover intermediate, and ligation.

As used herein, a “homolog” means a protein in a group of proteins that perform the same biological function, e.g., proteins that belong to the same protein family and that provide a common trait or perform the same or a similar biological function. Homologs are expressed by homologous genes. A homologous gene is a gene that encodes a protein with the same or a similar biological function as the protein encoded by a second gene. Homologous genes may arise through a speciation event (orthologs) or through a gene duplication event (paralogs). “Orthologs” refers to a set of homologous genes in different species that evolved from a common ancestral gene by speciation. Normally, orthologs retain the same function in the course of evolution. “Paralogs” refers to a set of homologous genes in the same species that have diverged from each other as a consequence of gene duplication. Homologous genes can therefore come from the same or from different organisms. Homologous genes include naturally occurring alleles and artificially created variants. The percent identity between homologous proteins will depend on the origin of the proteins and on the degree of divergence of the species from which they are derived. Homologous proteins from more closely related species (e.g., two mammals, such as human and mouse) will typically be more similar than proteins from more distantly related species (e.g., chicken and mouse). When optimally aligned, homologous proteins typically have at least about 40% identity, about 50% identity, or about 60% identity over the full length of the protein, in some cases at least about 70%, for example about 80%, and even at least about 90% identity. In other cases, for example when comparing proteins from highly divergent species, homologous proteins will have at least about 40% identity, about 50% identity, about 60% identity, about 70% identity, about 80% identity, or about 90% identity over the length of a conserved protein domain, such as a DNA-binding domain.

Homologous genes or proteins are identified by comparison of DNA or amino acid sequences, e.g., manually or by using computer-based tools that employ known homology-based search algorithms, such as those commonly known and referred to as BLAST, FASTA, and Smith-Waterman. A local sequence alignment program (e.g., BLAST) can be used to search a database of sequences for similar sequences, with the summary Expectation value (E-value) used to measure the level of sequence similarity. Because the protein hit with the best E-value for a particular organism may not necessarily be an ortholog (i.e., have the same function) or be the only ortholog, a reciprocal query can be used to filter hit sequences with significant E-values for ortholog identification. The reciprocal query entails searching the significant hits against a database of amino acid sequences from the base organism for sequences similar to that of the query protein. A hit can be identified as an ortholog when the best hit of the reciprocal query is the query protein itself or a protein encoded by a gene duplicated after speciation.
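The reciprocal-query filter described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a definitive implementation: the search results are assumed to have already been parsed into dictionaries mapping each query to a list of `(subject, E-value)` pairs, the names `Hits`, `best_hit`, and `reciprocal_best_hits` and the `1e-5` cutoff are hypothetical, and only the simple case is handled, in which the best reciprocal hit is the query protein itself (not the case of a gene duplicated after speciation).

```python
from typing import Dict, List, Optional, Tuple

# query id -> list of (subject id, E-value); assumed to be pre-parsed search output
Hits = Dict[str, List[Tuple[str, float]]]


def best_hit(hits: Hits, query: str, max_evalue: float = 1e-5) -> Optional[str]:
    """Return the subject with the lowest significant E-value, or None."""
    candidates = [(e, s) for s, e in hits.get(query, []) if e <= max_evalue]
    return min(candidates)[1] if candidates else None


def reciprocal_best_hits(forward: Hits, reverse: Hits,
                         max_evalue: float = 1e-5) -> Dict[str, str]:
    """Keep query/subject pairs that are each other's best significant hit."""
    pairs: Dict[str, str] = {}
    for query in forward:
        subject = best_hit(forward, query, max_evalue)
        if subject is not None and best_hit(reverse, subject, max_evalue) == query:
            pairs[query] = subject
    return pairs


# Toy data with hypothetical identifiers.
forward = {"humA": [("musA", 1e-50), ("musB", 1e-10)], "humB": [("musA", 1e-20)]}
reverse = {"musA": [("humA", 1e-48), ("humB", 1e-18)], "musB": [("humA", 1e-9)]}
orthologs = reciprocal_best_hits(forward, reverse)
```

In this toy data, `humB` is discarded because the best reciprocal hit of `musA` is `humA`, which is the kind of non-orthologous hit the reciprocal query is meant to filter out.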

As used herein, “percent identity” means the extent to which two optimally aligned DNA or protein segments are invariant throughout a window of alignment of components, e.g., nucleotide sequence or amino acid sequence. An “identity fraction” for aligned segments of a test sequence and a reference sequence is the number of identical components shared by the sequences of the two aligned segments divided by the total number of sequence components in the reference segment over a window of alignment, which is the smaller of the full test sequence or the full reference sequence. “Percent identity” (“% identity”) is the identity fraction times 100. Such optimal alignment is understood to be deemed a local alignment of DNA sequences. For protein alignment, a local alignment of protein sequences should allow the introduction of gaps to achieve optimal alignment. Percent identity can be calculated over the aligned length not including the gaps introduced by the alignment itself.
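As a worked illustration of this definition, the sketch below computes percent identity for two sequences that have already been aligned (for example, by a local alignment tool). It assumes equal-length aligned strings with `-` as the gap character, and it takes the option of excluding gapped columns from the denominator; the function name and signature are hypothetical.

```python
def percent_identity(aligned_test: str, aligned_ref: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned, equal-length sequences.

    Columns with a gap in either sequence are skipped, so identity is
    computed over the aligned length excluding alignment-introduced gaps.
    """
    if len(aligned_test) != len(aligned_ref):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_test.upper(), aligned_ref.upper()):
        if a == gap or b == gap:
            continue  # gapped column: not counted
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    identity_fraction = matches / compared
    return 100.0 * identity_fraction


# Three of four aligned positions match: identity fraction 0.75.
example = percent_identity("ACGA", "ACGT")
```

The identity fraction is kept as a named intermediate so that the two quantities defined in the text map directly onto the code.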

As used herein, “specific to”, when used in reference to a nucleotide sequence such as a homologous arm or the targeting sequence of a guide RNA, refers to a sequence that is identical or substantially identical to another nucleotide sequence or to the reverse complement of that other sequence. A sequence that is “specific to” another sequence is capable of hybridizing, through Watson-Crick base pairing, to the other sequence or to its reverse complement. Accordingly, persons skilled in the art will understand that a sequence specific to another sequence is highly similar to the other sequence or its reverse complement, but need not be identical to it. For example, a sequence with at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, or at least 99% identity to another sequence is still specific to that sequence if it is capable of hybridizing to it. As another example, depending on the position of the mismatches within the targeting sequence, a guide nucleic acid targeting sequence may contain 1, 2, 3, or more mismatches relative to the target sequence and still be specific to the target sequence if it is capable of targeting a ribonucleoprotein complex comprising the gNA and an endonuclease to the target sequence.
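The sequence-level part of this definition (identity to another sequence or to its reverse complement, with a tolerated number of mismatches) can be sketched as follows. This is a rough heuristic only: it compares equal-length DNA strings and uses an identity cutoff as a stand-in for the ability to hybridize, which in practice also depends on mismatch position and experimental conditions. The helper names and the default 80% cutoff are assumptions chosen for illustration.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(1 for x, y in zip(a.upper(), b.upper()) if x != y)


def is_specific(query: str, target: str, min_identity: float = 0.80) -> bool:
    """True if query matches target, or its reverse complement, at or above
    the identity cutoff (a proxy for hybridization, not a guarantee of it)."""
    fewest = min(
        count_mismatches(query, target),
        count_mismatches(query, reverse_complement(target)),
    )
    return (len(query) - fewest) / len(query) >= min_identity


guide = "ATGCATGGAT"
same_strand = is_specific(guide, "ATGCATGGAT")
opposite_strand = is_specific(guide, reverse_complement(guide))
unrelated = is_specific("AAAAAAAAAA", "CCCCCCCCCC")
```

Checking both the sequence and its reverse complement mirrors the definition, under which a sequence can be specific to either strand of its target.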

As used herein, “selecting” refers to separating two populations of different products using any method known in the art. When applied to cells, chromosomes, or sequences, selection can be based on a marker, such as a selection marker. Selecting cells that express a selection marker includes culturing a mixed population of cells, comprising cells that express the marker and cells that do not, in a selective medium, thereby killing or inhibiting the growth of the cells that do not express the marker. Sequences or chromosomes comprising a marker can be selected similarly, by placing them in cells and applying a selection scheme. Selection can likewise be based on a detectable marker, such as a fluorescent protein. Cells expressing the detectable marker can be physically separated from a mixed population of cells on the basis of the marker using methods known in the art, such as fluorescence-activated cell sorting (FACS). Alternatively, or in addition, the mixed population of cells can be diluted so that single cells can be isolated and cultured, and clones derived from the isolated cells can be assayed for one or more traits, such as the presence of a marker.

As used herein, “derived from” refers to the source or origin of a molecular entity, e.g., a nucleic acid or a protein. The source of a molecular entity may be a naturally occurring, recombinant, unpurified, or purified molecular entity. For example, a polypeptide derived from a second polypeptide may comprise an amino acid sequence that is identical or substantially similar to the amino acid sequence of the second polypeptide, e.g., having more than 50% homology to it. The derived molecular entity, e.g., a nucleic acid or protein, may comprise one or more modifications, for example one or more amino acid or nucleotide changes.

“Isolated from” refers to a molecular entity that has been purified, removed, or separated from its source or origin.

A “naturally occurring” sequence is a sequence found in at least one species that exists in nature.

An “artificial sequence” refers to a sequence that does not occur in nature. An artificial sequence may resemble a natural sequence but contain one or more changes relative to the naturally occurring sequence. Alternatively, an artificial sequence may have little or no similarity to any naturally occurring sequence. Chimeric or recombinant sequences are a class of artificial sequence in which two sequences from different sources, or two sequences that are never found adjacent to each other, are operably linked together.

“Operatively linked” or “operably linked” refers to a juxtaposition of genetic elements in which the elements are in a relationship permitting them to operate in the intended manner. For example, a promoter is operatively linked to a coding region if the promoter helps initiate transcription of the coding sequence. There may be intervening residues between the promoter and the coding region so long as this functional relationship is maintained.

The following classification is used herein to refer to stem cells. In terms of developmental stage, the earliest and most pluripotent are "embryonic stem (ES) cells" or "ES cells." ES cells can be freshly derived primary cells or can come from an ES cell line. All other stem cells derived from somatic tissues (every tissue other than germ cell tissue) are broadly defined as "somatic stem cells," but may commonly be referred to by any or all of the following terms: "adult stem cells," "mature stem cells," "progenitor cells," "progenitor stem cells," "precursor cells," and "precursor stem cells." Another class of non-embryonic stem cells is defined as "germline stem cells." Finally, non-stem cells are described herein as "mature cells," but are also referred to as "differentiated cells," "mature differentiated cells," "terminally differentiated cells," and "somatic cells." Mature cells can also be primary isolated cells derived from tissue, from an immortalized cell line, or from a tumor-derived cell line. The invention also encompasses "precursor forms of mature cells," which include all cells that do not meet the commonly used scientific definition of either a stem cell or a mature cell. ES cells can be cultured in vitro for extended periods before being inserted or injected into the cavity of a normal blastocyst, where they are induced to resume the normal program of embryonic development and differentiate into all cell types of the adult animal, including germ cells.

As used herein, a "hybrid cell" refers to a cell containing elements from two genomes. Those skilled in the art will understand that a hybrid cell may contain two complete or nearly complete genomes from different sources. Alternatively, a hybrid cell may contain a complete genome from one source and only a few chromosomes, one chromosome, or a portion of one chromosome from a second source. A cell containing any mixture of elements from two genomes that falls between these two extremes is still considered a hybrid cell. The two genomes in a hybrid may come from different individuals, different strains of the same species, or different species. Hybrid cells can be produced by any method known in the art. These techniques include, but are not limited to, cell fusion and microcell-mediated chromosome transfer (MMCT), in which a small number of chromosomes is transferred from one cell to another.

As used herein, "hybrid embryonic stem (EHS)" cells are hybrid cells that have the characteristics of embryonic stem cells. EHS cells can be generated by fusing ES cells from two different species, or by MMCT-mediated transfer of chromosomes from a cell of one species into a stem cell of another species.

As used herein, "cancer" refers to a disease, disorder, trait, genotype, or phenotype characterized by unregulated cell growth or replication, as known in the art. Cancers include solid tumors and liquid tumors. Exemplary cancers include, but are not limited to, leukemia, breast cancer, bone cancer, brain cancer, head and neck cancer, retinal cancer, esophageal cancer, gastric cancer, multiple myeloma, ovarian cancer, uterine cancer, thyroid cancer, testicular cancer, endometrial cancer, melanoma, colorectal cancer, lung cancer (including both small cell and non-small cell lung cancer), bladder cancer, prostate cancer, pancreatic cancer, sarcoma, cervical cancer, and skin cancer.

All publications, patents, and patent applications mentioned in this specification are incorporated herein by reference to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be incorporated by reference.

Methods of Engineering Chromosomes

The disclosure provides methods of engineering chromosomes using a template chromosome, a target chromosome, one or more nucleic acid molecules such as vectors or plasmids, and homology-directed repair (HDR). Nucleases are used to generate double-strand breaks flanking the template sequence in the template chromosome, and either flanking the target sequence or at a target location in the target chromosome. One or more nucleic acid molecules comprising a marker and homologous arms, which contain sequences of the target and template chromosomes, are used to direct replacement of the target sequence with the template sequence, insertion of the template sequence at the target location, or generation of a chromosomal rearrangement by joining the target and template sequences at the sites of the double-strand breaks.

In some embodiments, the method comprises replacing the target sequence with the template sequence, i.e., the target sequence is deleted as the template sequence is inserted.

In some embodiments, the method comprises replacing the target sequence with the template sequence. Any suitable template sequence and any suitable target sequence can be used in the methods described herein. For example, the method can be used to replace part of a chromosome of a model organism with the homologous human sequence, thereby humanizing that part of the model organism's genome. Alternatively, a large sequence can be inserted at a target location with little or no deletion of target sequence.

In some embodiments, the disclosure provides methods of generating an engineered chromosome, comprising: (a) providing a cell comprising a target chromosome containing a target sequence and a template chromosome containing a template sequence; (b) contacting the cell with (i) a first nucleic acid molecule and (ii) a second nucleic acid molecule, the first nucleic acid molecule comprising, from 5' to 3', a 5' homologous arm, at least a first marker, and a 3' homologous arm, the 5' homologous arm containing a nucleotide sequence upstream of the 5' end of the target sequence, and the 3' homologous arm containing a nucleotide sequence upstream of the 5' end of the template sequence; the second nucleic acid molecule comprising, from 5' to 3', a 5' homologous arm, at least a second marker, and a 3' homologous arm, the 5' homologous arm containing a nucleotide sequence downstream of the 3' end of the template sequence, and the 3' homologous arm containing a nucleotide sequence downstream of the 3' end of the target sequence; (c) generating double-strand breaks on either side or both sides of the target sequence and at the 5' and 3' ends of the template sequence, thereby inserting the template sequence and the first and second markers into the target chromosome; and (d) selecting one or more cells expressing the first and second markers. In some embodiments, the first and/or second nucleic acid molecule is a plasmid. For some embodiments of the methods described herein, the arrangement of the template sequence, the target sequence, and the homologous arms of the first and second nucleic acid molecules is shown in Figures 4A-4B. In some embodiments, after insertion of the template sequence, the first marker is located at the 5' end of the template sequence and the second marker is located at the 3' end of the template sequence. For example, after insertion of the template sequence and deletion of the target sequence, an engineered chromosome produced by the methods described herein comprises, from 5' to 3', the target chromosome sequence upstream of the target sequence, the first marker, the template sequence, the second marker, and the target chromosome sequence downstream of the target sequence.
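The element order described above can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions: chromosomes are reduced to labeled segments rather than real sequence, and the names `Donor` and `engineer_chromosome` are hypothetical and do not come from the disclosure.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Donor:
    """Hypothetical donor nucleic acid molecule: 5' arm, marker, 3' arm (labels only)."""
    arm5: str
    marker: str
    arm3: str


def engineer_chromosome(
    target: Tuple[str, str, str],
    template: Tuple[str, str, str],
    first: Donor,
    second: Donor,
) -> List[str]:
    """Return the 5'->3' order of elements on the engineered chromosome.

    `target` and `template` are (upstream flank, sequence, downstream flank).
    """
    target_up, _target_seq, target_down = target
    template_up, template_seq, template_down = template

    # The first donor bridges target-upstream sequence to template-upstream sequence.
    if (first.arm5, first.arm3) != (target_up, template_up):
        raise ValueError("first donor arms do not match the expected flanks")
    # The second donor bridges template-downstream sequence to target-downstream sequence.
    if (second.arm5, second.arm3) != (template_down, target_down):
        raise ValueError("second donor arms do not match the expected flanks")

    # The target sequence is dropped; the two markers flank the template sequence.
    return [target_up, first.marker, template_seq, second.marker, target_down]
```

Calling the function with donors whose arms match the flanks returns the upstream target flank, first marker, template sequence, second marker, and downstream target flank, in that order; swapping the two donors raises an error because the arms no longer bridge the correct junctions.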

The skilled artisan will appreciate that template sequences of many lengths are suitable for the methods described herein. A suitable template sequence can be as small as a few hundred base pairs, or can comprise most of a chromosome and thus be hundreds of megabase pairs in length. In some embodiments of the methods described herein, the template sequence is at least 25 KB, at least 50 KB, at least 100 KB, at least 200 KB, at least 400 KB, at least 500 KB, at least 600 KB, at least 700 KB, at least 800 KB, at least 900 KB, at least 1 MB, at least 2 MB, at least 3 MB, at least 4 MB, at least 5 MB, at least 10 MB, at least 15 MB, at least 20 MB, at least 50 MB, at least 100 MB, at least 150 MB, at least 200 MB, or at least 250 MB in length. In some embodiments, the template sequence is between 50 KB and 250 MB, between 100 KB and 200 MB, between 200 KB and 50 MB, between 500 KB and 50 MB, between 1 MB and 100 MB, between 1 MB and 10 MB, between 1 MB and 5 MB, between 1 MB and 3 MB, between 5 MB and 50 MB, between 5 MB and 10 MB, or between 3 MB and 10 MB in length.
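For readers who want to check a candidate sequence against the length thresholds and ranges listed above, the unit arithmetic can be sketched as follows. The helper names `to_bp` and `in_range` are hypothetical, and the sketch assumes that KB and MB denote kilobases (1,000 bp) and megabases (1,000,000 bp), as in the Background.

```python
# Base pairs per unit; KB and MB are read as kilobases and megabases.
UNITS = {"BP": 1, "KB": 1_000, "MB": 1_000_000}


def to_bp(text: str) -> int:
    """Convert a length such as '25 KB' or '2.5MB' to base pairs."""
    cleaned = text.strip().upper().replace(" ", "")
    for unit, factor in UNITS.items():
        if cleaned.endswith(unit):
            return int(float(cleaned[: -len(unit)]) * factor)
    raise ValueError(f"unrecognized length: {text!r}")


def in_range(length_bp: int, low: str, high: str) -> bool:
    """True if length_bp lies between the two bounds, inclusive."""
    return to_bp(low) <= length_bp <= to_bp(high)
```

For instance, a 2,000,000 bp template falls within the "between 1 MB and 3 MB" range, whereas a 40,000 bp template falls below the "between 50 KB and 250 MB" range.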

In some embodiments of the methods described herein, the template chromosome comprises, from 5' to 3', the 3' homologous arm sequence of the first nucleic acid molecule, the template sequence, and the 5' homologous arm sequence of the second nucleic acid molecule. In some embodiments, the template chromosome comprises, from 5' to 3', the 3' homologous arm sequence of the first nucleic acid molecule, a third endonuclease site, the template sequence, a fourth endonuclease site, and the 5' homologous arm sequence of the second nucleic acid molecule.

The skilled artisan will appreciate that target sequences of many lengths are suitable for the methods described herein. A suitable target sequence can be as small as the endonuclease site used to generate the double-strand break (the target location), or can comprise most of a chromosome and thus be hundreds of megabase pairs in length. In some embodiments of the methods described herein, the target sequence is at least 25 KB, at least 50 KB, at least 100 KB, at least 200 KB, at least 400 KB, at least 500 KB, at least 600 KB, at least 700 KB, at least 800 KB, at least 900 KB, at least 1 MB, at least 2 MB, at least 3 MB, at least 4 MB, at least 5 MB, at least 10 MB, at least 15 MB, at least 20 MB, at least 50 MB, at least 100 MB, at least 150 MB, at least 200 MB, or at least 250 MB in length. In some embodiments, the target sequence is between 50 KB and 250 MB, between 100 KB and 200 MB, between 200 KB and 50 MB, between 500 KB and 50 MB, between 1 MB and 100 MB, between 1 MB and 10 MB, between 1 MB and 5 MB, between 1 MB and 3 MB, between 5 MB and 50 MB, between 5 MB and 10 MB, or between 3 MB and 10 MB in length.

In some embodiments of the methods described herein, the target chromosome comprises, from 5' to 3', the 5' homologous arm sequence of the first nucleic acid molecule, the target sequence, and the 3' homologous arm sequence of the second nucleic acid molecule. In some embodiments, the target chromosome comprises, from 5' to 3', the 5' homologous arm sequence of the first nucleic acid molecule, a first endonuclease site, the target sequence, a second endonuclease site, and the 3' homologous arm sequence of the second nucleic acid molecule.

In some embodiments, the nucleic acid molecules used in the methods described herein are DNA molecules. In some embodiments, the nucleic acid molecules used in the methods described herein are circular, for example plasmids. Alternatively, additional endonuclease sites can be used to linearize the nucleic acid molecules of the disclosure. Exemplary endonuclease sites include, but are not limited to, sites recognized by restriction endonucleases and by the CRISPR/Cas endonucleases, ZFNs, and TALENs described herein. The skilled artisan will be able to incorporate suitable endonuclease sites into a nucleic acid molecule, for example adjacent to or near either or both of its homologous arms. The skilled artisan will likewise be able to incorporate suitable Cre recombinase sites into a nucleic acid molecule.

In some embodiments, the target sequence is deleted by insertion of the template sequence, and the template and target chromosomes are cleaved on either side of the template and target sequences by CRISPR/Cas ribonucleoproteins. In some embodiments, (a) the target chromosome comprises, from 5' to 3', the 5' homologous arm sequence of the first nucleic acid molecule, a first sgRNA target sequence, the target sequence, a second sgRNA target sequence, and the 3' homologous arm sequence of the second nucleic acid molecule; and (b) the template chromosome comprises, from 5' to 3', a third sgRNA target sequence, the 3' homologous arm sequence of the first nucleic acid molecule, the template sequence, the 5' homologous arm sequence of the second nucleic acid molecule, and a fourth sgRNA target sequence. In some embodiments, the first, second, third, and fourth sgRNAs comprise different targeting sequences. For example, the first sgRNA comprises a targeting sequence specific for the first sgRNA target sequence on the target chromosome, the second sgRNA comprises a targeting sequence specific for the second sgRNA target sequence on the target chromosome, the third sgRNA comprises a targeting sequence specific for the third sgRNA target sequence on the template chromosome, and the fourth sgRNA comprises a targeting sequence specific for the fourth sgRNA target sequence on the template chromosome. Alternatively, one or more of the sgRNA target sequences, and the corresponding sgRNA targeting sequences, may be the same sequence.
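As an illustration of how candidate sgRNA target sequences flanking a region might be located, the sketch below scans one strand of a DNA string for 20-nucleotide protospacers followed by an NGG protospacer-adjacent motif (PAM). The disclosure does not specify a PAM or protospacer length; the NGG PAM, the 20-nt length, and the cut position 3 bp upstream of the PAM are assumptions based on the commonly used Streptococcus pyogenes Cas9, and `find_spcas9_sites` is a hypothetical helper. A practical design tool would also scan the reverse strand and score off-target matches.

```python
import re
from typing import Dict, List


def find_spcas9_sites(seq: str, protospacer_len: int = 20) -> List[Dict[str, object]]:
    """List top-strand protospacers that are followed by an NGG PAM."""
    seq = seq.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % protospacer_len)
    sites = []
    for match in pattern.finditer(seq):
        start = match.start()
        sites.append(
            {
                "start": start,                  # 0-based start of the protospacer
                "protospacer": match.group(1),   # sequence matched by the guide
                "pam": match.group(2),           # NGG motif 3' of the protospacer
                # Assumed blunt cut 3 bp upstream of the PAM.
                "cut_index": start + protospacer_len - 3,
            }
        )
    return sites
```

Running the scan separately on sequence just outside each boundary of the target and template sequences would yield candidates for the first through fourth sgRNA target sequences in the layout described above.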

In some embodiments, inserting the template sequence involves deleting very little of the target sequence, or none of it. Those of ordinary skill in the art will understand that many mechanisms of double-strand break repair involve resection of the broken ends, and will therefore produce deletions around the endonuclease sites described herein. For example, the methods described herein may produce deletions of about 5 bp, 10 bp, 15 bp, 20 bp, 25 bp, 30 bp, 35 bp, 40 bp, 45 bp, or 50 bp around the target location or around the endonuclease sites flanking the target sequence.
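The effect of such a small deletion on the sequence around a cut site can be illustrated with a toy model. The sketch assumes, purely for illustration, that the deleted bases are split as evenly as possible between the two sides of the break; actual end resection is variable and often asymmetric. The name `repair_with_deletion` is hypothetical.

```python
def repair_with_deletion(seq: str, cut_index: int, deleted_bp: int) -> str:
    """Rejoin a sequence cut at cut_index after removing deleted_bp bases around the cut."""
    if not 0 <= cut_index <= len(seq):
        raise ValueError("cut_index lies outside the sequence")
    if deleted_bp < 0:
        raise ValueError("deleted_bp must be non-negative")
    left = deleted_bp // 2        # bases lost 5' of the break (assumed split)
    right = deleted_bp - left     # bases lost 3' of the break
    start = max(0, cut_index - left)
    end = min(len(seq), cut_index + right)
    return seq[:start] + seq[end:]
```

With a 10-base input cut in the middle, a 4 bp deletion removes two bases on each side of the break, while a deletion size of zero returns the sequence unchanged.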

In some embodiments (for example, those in which little or none of the target sequence is deleted by the methods described herein), (a) the target chromosome comprises, from 5' to 3', the 5' homologous arm sequence of the first nucleic acid molecule, a first sgRNA target sequence, and the 3' homologous arm sequence of the second nucleic acid molecule; and (b) the template chromosome comprises, from 5' to 3', a second sgRNA target sequence, the 3' homologous arm sequence of the first nucleic acid molecule, the template sequence, the 5' homologous arm sequence of the second nucleic acid molecule, and a third sgRNA target sequence. In some embodiments, the first, second, and third sgRNAs comprise different targeting sequences. For example, the first sgRNA comprises a targeting sequence specific for the first sgRNA target sequence on the target chromosome, the second sgRNA comprises a targeting sequence specific for the second sgRNA target sequence on the template chromosome, and the third sgRNA comprises a targeting sequence specific for the third sgRNA target sequence on the template chromosome.

In some embodiments, insertion of the template sequence disrupts one or more functions of the target sequence. For example, inserting the template sequence into the coding sequence of a gene can prevent expression of the correct gene product by creating a premature stop codon, mutations in the protein-coding sequence, aberrant splicing products, and the like. Similarly, inserting the template sequence into a regulatory sequence of a gene, such as an enhancer or promoter, can prevent expression of the gene.

In some embodiments, the methods of the disclosure comprise deleting the first and/or second marker after insertion of the template sequence. The markers can be deleted by any suitable method known in the art. For example, a cell comprising the engineered chromosome can be contacted with a CRISPR/Cas ribonucleoprotein comprising a gNA targeting sequence specific for the sequence encoding the marker, thereby inducing deletion of all or part of the marker sequence.

The methods of the disclosure can be used to generate chromosomal rearrangements, such as inversions and translocations. Many chromosomal rearrangements play a role in human diseases or disorders such as cancer. Recreating such rearrangements in a model organism, such as the mouse, can facilitate the study of these diseases or disorders. The chromosomal aberrations involved are known to those skilled in the art and are described in the Mitelman database, available at mitelmandatabase.isb-cgc.org/. Further information on chromosomal aberrations associated with human disease is available at rarediseases.info.nih.gov/diseases/diseases-by-category/36/chromosome-disorders.

Accordingly, the disclosure provides methods of generating a chromosomal rearrangement, comprising: (a) providing a cell comprising a target chromosome containing a target location and a template chromosome containing a template sequence; (b) contacting the cell with a nucleic acid molecule comprising, from 5' to 3', a 5' homologous arm, a marker, and a 3' homologous arm, the 5' homologous arm comprising a nucleotide sequence upstream of the 5' end of the target location, and the 3' homologous arm comprising a nucleotide sequence upstream of the 5' end of the template sequence; (c) generating double-strand breaks at the target location and at the 5' end of the template sequence, thereby inserting the marker into the target chromosome 3' of the 5' homologous arm sequence, followed by insertion of the template sequence, thereby generating the chromosomal rearrangement; and (d) selecting one or more cells expressing the marker. Alternatively, the method comprises: (a) providing a cell comprising a target chromosome containing a target location and a template chromosome containing a template sequence; (b) contacting the cell with a nucleic acid molecule comprising, from 5' to 3', a 5' homologous arm, a marker, and a 3' homologous arm, the 5' homologous arm comprising a nucleotide sequence downstream of the 3' end of the template sequence, and the 3' homologous arm comprising a nucleotide sequence downstream of the 3' end of the target sequence; (c) generating double-strand breaks at the target location and at the 3' end of the template sequence, thereby inserting the marker into the target chromosome 3' of the 5' homologous arm sequence, followed by insertion of the template sequence, thereby generating the chromosomal rearrangement; and (d) selecting one or more cells expressing the marker. In some embodiments, generating the double-strand breaks comprises contacting the cell with a CRISPR/Cas endonuclease and at least a first gNA and a second gNA, the first gNA comprising a targeting sequence specific for the target location, such that the CRISPR/Cas endonuclease cleaves the target location, and the second gNA comprising a targeting sequence specific for the 5' end of the template sequence. In some embodiments, generating the double-strand breaks comprises contacting the cell with a CRISPR/Cas endonuclease and at least a first gNA and a second gNA, the first gNA comprising a targeting sequence specific for the target location, such that the CRISPR/Cas endonuclease cleaves the target location, and the second gNA comprising a targeting sequence specific for the 3' end of the template sequence. In some embodiments, the nucleic acid molecule comprises DNA. In some embodiments, the nucleic acid molecule comprises a plasmid.

Suitable methods known in the art can be used to generate double-strand breaks in the target and template chromosomes. This can be achieved, inter alia, by selecting the homologous arm sequences of the nucleic acid molecules (e.g., plasmids) used to direct the HDR-mediated chromosomal rearrangement so that those molecules overlap with or contain the endonuclease sites on the target and template chromosomes. In some embodiments, generating the double-strand breaks in (c) comprises inducing the double-strand breaks using a CRISPR/Cas endonuclease and one or more guide nucleic acids (gNAs), one or more zinc finger nucleases (ZFNs), one or more transcription activator-like effector nucleases (TALENs), or one or more Cre recombinases. For example, Cre recombinase induces inversion of the chromosomal region between two loxP sites, whereby the template sequence and the first and second markers are inserted into the target chromosome. In some embodiments, the CRISPR/Cas endonuclease comprises Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas6, Cas7, Cas8, Cas9, Cas10, CasX, CasY, Cas12a (Cpf1), Cas13a, Csy1, Csy2, Csy3, Cse1, Cse2, Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX, Csx3, Csx1, Csx15, Csf1, Csf2, Csf3, Csf4, Cms1, C2c1, C2c2, or C2c3, or a homolog, ortholog, or modified version thereof. In some embodiments, the CRISPR/Cas endonuclease comprises Cas9, Cas12a (Cpf1), Cas13a, CasX, CasY, C2c1, or C2c3. In some embodiments, the CRISPR/Cas endonuclease comprises Cas9. In some embodiments, the gNA comprises a single guide RNA (sgRNA).
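The Cre-mediated inversion mentioned above can be modeled on a plain DNA string. The sketch relies on two well-established facts that the text does not spell out: the 34 bp loxP site consists of two 13 bp inverted repeats around an asymmetric 8 bp spacer, and inversion (rather than excision) occurs when the two loxP sites are in opposite orientation. The function `cre_invert` is a hypothetical helper that handles only the simplest case of one forward site followed by one inverted site.

```python
# Canonical 34 bp loxP site: 13 bp repeat, 8 bp spacer, 13 bp inverted repeat.
LOXP = "ATAACTTCGTATAATGTATGCTATACGAAGTTAT"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def cre_invert(seq: str) -> str:
    """Invert the segment between a loxP site and a downstream inverted loxP site."""
    seq = seq.upper()
    first = seq.find(LOXP)
    if first == -1:
        raise ValueError("no forward loxP site found")
    inner_start = first + len(LOXP)
    second = seq.find(revcomp(LOXP), inner_start)
    if second == -1:
        raise ValueError("no inverted loxP site found downstream")
    # Only the intervening segment changes; both sites keep their positions.
    inner = seq[inner_start:second]
    return seq[:inner_start] + revcomp(inner) + seq[second:]
```

Applying `cre_invert` twice restores the original string, which mirrors the reversibility of Cre-mediated inversion between two such sites.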

Any suitable method known in the art can be used to contact a cell with the endonucleases described herein. For example, a nucleic acid molecule (e.g., a plasmid) comprising sequences encoding the endonuclease and, for CRISPR/Cas endonucleases, the gRNA can be used to transfect the cell. Alternatively, the endonuclease, or a nucleic acid molecule encoding it, can be introduced into the cell by electroporation, lipofection, transduction, and the like.

The cells used to practice the methods described herein can be any suitable cells known in the art. In some embodiments, the cells comprise embryonic stem (ES) cells. In some embodiments, the cells comprise embryonic hybrid stem (EHS) cells. EHS cells can be generated by fusing ES cells from two different species (e.g., human and mouse, human and rat, or mouse and monkey). All fusion methods known in the art are contemplated as within the scope of the disclosure, including but not limited to electrofusion, virus-induced fusion, and chemically induced fusion. In some embodiments, the method comprises fusing a human ES cell with an ES cell from a species selected from the group consisting of mouse, rat, rabbit, guinea pig, hamster, sheep, goat, donkey, cow, horse, camel, chicken, and monkey. In some embodiments, the method comprises fusing ES cells from any two different species selected from the group consisting of mouse, rat, rabbit, guinea pig, hamster, sheep, goat, donkey, cow, horse, camel, chicken, and monkey.

In some embodiments, the cell comprises a zygote. As used herein, the term "zygote" refers to a eukaryotic cell formed by a fertilization event between two gametes (e.g., a mammalian egg and sperm). Zygotes at the one-cell, 2-cell, 4-cell, 8-cell, or later stages may be suitable for the methods described herein.

After an engineered chromosome has been generated as described herein, any suitable method can be used to recover it. In some embodiments, recovering an engineered chromosome of the disclosure comprises microcell-mediated chromosome transfer (MMCT). The recovered chromosome is transferred into any cell type suitable for downstream applications by fusing micronucleated cells comprising the engineered chromosome with target cells, such as ES cells. These methods are described in more detail below.

Template Chromosome

The disclosure provides template chromosomes comprising template sequences for use in the methods described herein.

As used herein, a "template chromosome" refers to a chromosome containing a "template sequence." A template sequence is a sequence that is introduced into a target chromosome, or at a target location, using the methods of the disclosure.

The template chromosome can be isolated or obtained from any suitable source. In some embodiments, the template chromosome is from a eukaryote. In some embodiments, the eukaryote is a vertebrate, such as a bird, reptile, or mammal. In some embodiments, the template chromosome is from a mouse, rat, rabbit, guinea pig, hamster, sheep, goat, donkey, cow, horse, camel, monkey, or chicken. In some embodiments, the template chromosome is from a human.

在一些实施方案中，模板染色体是外源染色体，模板序列是外源序列。例如，靶染色体是小鼠染色体，模板染色体和相应的模板序列来自非小鼠物种，诸如人。In some implementations, the template chromosome is a foreign chromosome, and the template sequence is a foreign sequence. For example, the target chromosome is a mouse chromosome, and the template chromosome and the corresponding template sequence are derived from a non-mouse species, such as humans.

在一些实施方案中，模板染色体是内源染色体，模板序列是内源序列。例如，模板染色体是小鼠染色体，而靶染色体是第二不同的小鼠染色体。In some implementations, the template chromosome is an endogenous chromosome, and the template sequence is an endogenous sequence. For example, the template chromosome is a mouse chromosome, while the target chromosome is a second, different mouse chromosome.

在一些实施方案中，模板染色体是人工染色体。In some implementations, the template chromosome is an artificial chromosome.

在一些实施方案中，模板染色体是天然存在的染色体。In some implementations, the template chromosome is a naturally occurring chromosome.

在一些实施方案中，模板染色体包含对天然存在的染色体的一个或多个修饰。修饰尤其包括序列的插入、缺失和重排。插入模板染色体的序列的实例尤其包括标记、启动子、cDNA序列、非编码序列等。In some implementations, the template chromosome contains one or more modifications to a naturally occurring chromosome. Modifications include, in particular, sequence insertions, deletions, and rearrangements. Examples of sequences inserted into the template chromosome include, in particular, markers, promoters, cDNA sequences, non-coding sequences, etc.

在一些实施方案中，模板染色体包含位于模板序列5’的核酸内切酶位点。在一些实施方案中，模板染色体包含位于模板序列3’的核酸内切酶位点。在一些实施方案中，核酸内切酶位点紧邻模板序列。在一些实施方案中，核酸内切酶位点位于模板序列附近。In some embodiments, the template chromosome contains an endonuclease site located 5' of the template sequence. In some embodiments, the template chromosome contains an endonuclease site located 3' of the template sequence. In some embodiments, the endonuclease site is immediately adjacent to the template sequence. In some embodiments, the endonuclease site is located near the template sequence.

在一些实施方案中，模板染色体在模板序列的任一侧包含核酸内切酶位点。例如，模板染色体包含位于模板序列5’的第一核酸内切酶位点和位于模板序列3’的第二核酸内切酶位点。在一些实施方案中，第一和第二核酸内切酶位点都被同一核酸内切酶识别和切割。例如，第一和第二核酸内切酶位点均包含相同的DNA序列，其被同一核酸内切酶识别。在一些实施方案中，第一核酸内切酶位点被第一核酸内切酶切割，第二核酸内切酶位点被第二核酸内切酶切割。例如，第一和第二内切核酸酶位点包含由两种不同的锌指核酸酶(ZFN)识别的不同DNA序列，或由包含含有不同靶向序列的引导核酸(gNA)的CRISPR/Cas核糖核蛋白复合物识别的两种不同的CRISPR/Cas靶序列。在一些实施方案中，第一和/或第二核酸内切酶位点紧邻模板序列。在一些实施方案中，第一和/或第二核酸内切酶位点位于模板序列附近。In some embodiments, the template chromosome contains endonuclease sites on both sides of the template sequence. For example, the template chromosome contains a first endonuclease site 5' of the template sequence and a second endonuclease site 3' of the template sequence. In some embodiments, both the first and second endonuclease sites are recognized and cleaved by the same endonuclease. For example, the first and second endonuclease sites both contain the same DNA sequence, which is recognized by the same endonuclease. In some embodiments, the first endonuclease site is cleaved by a first endonuclease, and the second endonuclease site is cleaved by a second endonuclease. For example, the first and second endonuclease sites contain different DNA sequences recognized by two different zinc finger nucleases (ZFNs), or two different CRISPR/Cas target sequences recognized by CRISPR/Cas ribonucleoprotein complexes comprising guide nucleic acids (gNAs) with different targeting sequences. In some embodiments, the first and/or second endonuclease sites are immediately adjacent to the template sequence. In some embodiments, the first and/or second endonuclease sites are located near the template sequence.

在模板序列的5个碱基对(bp)内、10bp内、15bp内、20bp内、30bp内、40bp内、50bp内、70bp内、80bp内、90bp内、100bp内、120bp内、140bp内、160bp内、180bp内、200bp内、250bp内、300bp内、400bp内或500bp内的序列可被认为靠近模板序列。Sequences within 5 base pairs (bp) of the template sequence, or within 10 bp, 15 bp, 20 bp, 30 bp, 40 bp, 50 bp, 70 bp, 80 bp, 90 bp, 100 bp, 120 bp, 140 bp, 160 bp, 180 bp, 200 bp, 250 bp, 300 bp, 400 bp, or 500 bp, can be considered close to the template sequence.
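For illustration only, the proximity criterion above can be expressed as a simple distance check. The following Python sketch is not part of the disclosed methods: the function names `gap_bp` and `is_near_template`, the half-open coordinate convention, and the example coordinates are all assumptions introduced here.

```python
# Illustrative sketch only; names, coordinates, and conventions are assumptions.
# Intervals are half-open (start, end) base-pair coordinates on one chromosome.

def gap_bp(feature, template):
    """Return the number of base pairs separating a feature from the template.

    Returns 0 when the feature abuts or overlaps the template sequence.
    """
    f_start, f_end = feature
    t_start, t_end = template
    if f_end <= t_start:      # feature lies on the low-coordinate side
        return t_start - f_end
    if f_start >= t_end:      # feature lies on the high-coordinate side
        return f_start - t_end
    return 0                  # overlapping intervals


def is_near_template(feature, template, max_gap_bp=500):
    """True if the feature lies within max_gap_bp of the template sequence."""
    return gap_bp(feature, template) <= max_gap_bp


template = (10_000, 1_010_000)   # hypothetical 1 MB template sequence
site = (9_930, 9_953)            # hypothetical site ending 47 bp before it
print(gap_bp(site, template))                  # 47
print(is_near_template(site, template, 50))    # True
print(is_near_template(site, template, 40))    # False
```

The default threshold of 500 bp corresponds to the largest distance listed above; any of the smaller listed distances can be passed as `max_gap_bp`.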

在一些实施方案中，模板染色体包含用于促进同源定向修复的核酸分子的同源臂的一个或多个序列。在一些实施方案中，模板染色体包含位于模板序列5’末端或模板序列5’末端附近的同源臂序列。在一些实施方案中，同源臂位于模板序列的上游，即模板序列的5’。在一些实施方案中，模板染色体从5’至3’包含核酸内切酶位点、同源臂序列和模板序列。在一些实施方案中，模板染色体包含位于模板序列3’末端或模板序列3’末端附近的同源臂序列。在一些实施方案中，同源臂位于模板序列的下游，即模板序列的3’。在一些实施方案中，模板染色体从5’至3’包含模板序列、同源臂序列和核酸内切酶位点。在一些实施方案中，同源臂序列位于核酸内切酶位点与模板序列之间。In some embodiments, the template chromosome includes one or more sequences of the homologous arms of the nucleic acid molecules used to promote homology-directed repair. In some embodiments, the template chromosome includes a homologous arm sequence located at or near the 5' end of the template sequence. In some embodiments, the homologous arm is located upstream of the template sequence, i.e., 5' of the template sequence. In some embodiments, the template chromosome includes, from 5' to 3', an endonuclease site, a homologous arm sequence, and the template sequence. In some embodiments, the template chromosome includes a homologous arm sequence located at or near the 3' end of the template sequence. In some embodiments, the homologous arm is located downstream of the template sequence, i.e., 3' of the template sequence. In some embodiments, the template chromosome includes, from 5' to 3', the template sequence, a homologous arm sequence, and an endonuclease site. In some embodiments, the homologous arm sequence is located between the endonuclease site and the template sequence.

在一些实施方案中，模板染色体包含位于模板序列5’或其附近的第一同源臂序列，和位于模板序列3’或其附近的第二同源臂序列，即，模板染色体包含模板序列上游和下游的同源臂。在一些实施方案中，第一同源臂是第一核酸分子的3’同源臂，所述第一核酸分子从5’至3’包含含有靶序列的5'末端上游的核苷酸序列的5’同源臂、至少第一标记的序列和第一同源臂序列。在一些实施方案中，第二同源臂是第二核酸分子的5’同源臂，所述第二核酸分子从5’至3’包含第二同源臂序列、至少第二标记的序列和包含靶序列3’末端下游的核苷酸序列的3’同源臂。在一些实施方案中，模板染色体从5’至3’包含第一核酸内切酶位点、第一同源臂序列、模板序列、第二同源臂序列和第二核酸内切酶位点。In some embodiments, the template chromosome includes a first homologous arm sequence located at or near the 5' end of the template sequence and a second homologous arm sequence located at or near the 3' end of the template sequence; that is, the template chromosome includes homologous arms upstream and downstream of the template sequence. In some embodiments, the first homologous arm is the 3' homologous arm of a first nucleic acid molecule, which includes, from 5' to 3', a 5' homologous arm containing a nucleotide sequence upstream of the 5' end of the target sequence, the sequence of at least a first marker, and the first homologous arm sequence. In some embodiments, the second homologous arm is the 5' homologous arm of a second nucleic acid molecule, which includes, from 5' to 3', the second homologous arm sequence, the sequence of at least a second marker, and a 3' homologous arm containing a nucleotide sequence downstream of the 3' end of the target sequence. In some embodiments, the template chromosome includes, from 5' to 3', a first endonuclease site, a first homologous arm sequence, the template sequence, a second homologous arm sequence, and a second endonuclease site.

在一些实施方案中，第一和/或第二同源臂序列紧邻第一和/或第二核酸内切酶位点。在一些实施方案中，第一同源臂序列紧邻第一核酸内切酶位点，第二同源臂序列紧邻第二核酸内切酶位点，其中第一同源臂位于第一核酸内切酶位点与模板序列之间，第二同源臂位于模板序列与第二核酸内切酶位点之间。在一些实施方案中，第一同源臂位于第一核酸内切酶位点与模板序列之间，第二同源臂位于模板序列与第二核酸内切酶位点之间。In some embodiments, the first and/or second homologous arm sequences are immediately adjacent to the first and/or second endonuclease sites. In some embodiments, the first homologous arm sequence is immediately adjacent to the first endonuclease site, and the second homologous arm sequence is immediately adjacent to the second endonuclease site, wherein the first homologous arm is located between the first endonuclease site and the template sequence, and the second homologous arm is located between the template sequence and the second endonuclease site. In some embodiments, the first homologous arm is located between the first endonuclease site and the template sequence, and the second homologous arm is located between the template sequence and the second endonuclease site.

在一些实施方案中，第一和/或第二同源臂序列位于模板序列附近。在模板序列的0bp内、5个碱基对(bp)内、10bp内、15bp内、20bp内、30bp内、40bp内、50bp内、70bp内、80bp内、90bp内、100bp内、120bp内、140bp内、160bp内、180bp内、200bp内或250bp内的同源臂可被认为靠近模板序列。In some embodiments, the first and/or second homologous arm sequences are located near the template sequence. Homologous arms within 0 bp, 5 base pairs (bp), 10 bp, 15 bp, 20 bp, 30 bp, 40 bp, 50 bp, 70 bp, 80 bp, 90 bp, 100 bp, 120 bp, 140 bp, 160 bp, 180 bp, 200 bp, or 250 bp of the template sequence can be considered close to the template sequence.

在一些实施方案中，模板染色体从5’至3’包含第一核酸内切酶位点、第一同源臂、模板序列、第二同源臂和第二核酸内切酶位点。In some embodiments, the template chromosome includes, from 5' to 3', a first endonuclease site, a first homologous arm, the template sequence, a second homologous arm, and a second endonuclease site.
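The 5'-to-3' arrangement described above can be summarized as an ordering constraint on five features. The sketch below is a hypothetical illustration, not part of the disclosure: the feature names, the dictionary representation, and the example coordinates are assumptions chosen only to make the ordering explicit.

```python
# Illustrative sketch only; feature names and coordinates are hypothetical.
# Each feature is a half-open (start, end) interval, with lower coordinates
# taken to be 5' of higher coordinates.

EXPECTED_ORDER = [
    "endonuclease_site_1",
    "homology_arm_1",
    "template_sequence",
    "homology_arm_2",
    "endonuclease_site_2",
]


def has_expected_layout(features):
    """Check that the five features occur 5'->3' in order without overlap."""
    if set(features) != set(EXPECTED_ORDER):
        return False
    intervals = [features[name] for name in EXPECTED_ORDER]
    # Every interval must be non-empty.
    if any(start >= end for start, end in intervals):
        return False
    # Each feature must end at or before the start of the next feature.
    return all(a[1] <= b[0] for a, b in zip(intervals, intervals[1:]))


example = {
    "endonuclease_site_1": (1_000, 1_023),
    "homology_arm_1": (1_023, 1_823),          # 800 bp arm
    "template_sequence": (1_823, 3_001_823),   # 3 MB template
    "homology_arm_2": (3_001_823, 3_002_623),  # 800 bp arm
    "endonuclease_site_2": (3_002_623, 3_002_646),
}
print(has_expected_layout(example))  # True
```

In this hypothetical example each homologous arm lies between an endonuclease site and the template sequence, so cleavage at both sites would release a fragment carrying both arms.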

在一些实施方案中，模板染色体的第一和/或第二同源序列的长度介于约20bp与2,000bp之间、介于约50bp与1,500bp之间、介于约100bp与1,400bp之间、介于约150bp与1,300bp之间、介于约200bp与1,200bp之间、介于约300bp与1,100bp之间、介于约400bp与1,000bp之间或介于约500bp与900bp之间，或介于约600bp与1,200bp之间。在一些实施方案中，模板染色体的同源序列长度介于约400bp与1,500bp之间。在一些实施方案中，模板染色体的同源序列长度介于约500bp与1,300bp之间。在一些实施方案中，模板染色体的同源序列长度介于约600bp与1,000bp之间。In some embodiments, the length of the first and/or second homologous sequences of the template chromosome is between about 20 bp and 2,000 bp, between about 50 bp and 1,500 bp, between about 100 bp and 1,400 bp, between about 150 bp and 1,300 bp, between about 200 bp and 1,200 bp, between about 300 bp and 1,100 bp, between about 400 bp and 1,000 bp, between about 500 bp and 900 bp, or between about 600 bp and 1,200 bp. In some embodiments, the length of the homologous sequences of the template chromosome is between about 400 bp and 1,500 bp. In some embodiments, the length of the homologous sequences of the template chromosome is between about 500 bp and 1,300 bp. In some embodiments, the length of the homologous sequences of the template chromosome is between about 600 bp and 1,000 bp.

模板序列Template Sequence

模板染色体包含模板序列，并且在本文所述的工程化的染色体和方法中充当模板序列的来源。模板序列可位于模板染色体上任何合适的位置。例如，不希望受理论所束缚，模板序列可位于模板染色体上以常染色质为特征的区域。The template chromosome contains the template sequence and serves as the source of the template sequence in the engineered chromosomes and methods described herein. The template sequence can be located at any suitable location on the template chromosome. For example, without wishing to be bound by theory, the template sequence can be located in a region of the template chromosome characterized by euchromatin.

可从任何合适的来源分离或衍生模板序列。在一些实施方案中，模板序列包含内源序列，例如对于模板染色体是内源的序列，或对于产生靶染色体的物种是内源的序列。在一些实施方案中，模板序列是外源序列。例如，模板序列来自对于产生靶染色体的物种是外源的序列。在一些实施方案中，模板序列包含天然存在的序列。在一些实施方案中，模板序列包含对天然存在的序列的一个或多个修饰。修饰尤其包括序列诸如人工序列或标记的插入、缺失和重排。在一些实施方案中，模板序列包含人工序列。在一些实施方案中，模板序列包括天然存在的序列和人工序列。示例性人工序列尤其包括标记、cDNA序列、启动子和重组序列。示例性标记包括但不限于下表3中公开的选择标记，以及可检测的标记，诸如绿色荧光蛋白(GFP)、mCherry等。The template sequence can be isolated or derived from any suitable source. In some embodiments, the template sequence comprises an endogenous sequence, such as a sequence that is endogenous to the template chromosome or to the species from which the target chromosome is derived. In some embodiments, the template sequence is an exogenous sequence. For example, the template sequence is derived from a sequence that is exogenous to the species from which the target chromosome is derived. In some embodiments, the template sequence comprises a naturally occurring sequence. In some embodiments, the template sequence comprises one or more modifications to a naturally occurring sequence. Modifications include, among others, insertions, deletions, and rearrangements of sequences, such as artificial sequences or markers. In some embodiments, the template sequence comprises an artificial sequence. In some embodiments, the template sequence comprises both naturally occurring and artificial sequences. Exemplary artificial sequences include, among others, markers, cDNA sequences, promoters, and recombination sequences. Exemplary markers include, but are not limited to, the selection markers disclosed in Table 3 below, as well as detectable markers such as green fluorescent protein (GFP) and mCherry.

在一些实施方案中，模板序列来自真核生物。在一些实施方案中，真核生物是脊椎动物，诸如鸟类、爬行动物或哺乳动物。在一些实施方案中，模板序列包含小鼠、大鼠、兔、豚鼠、仓鼠、绵羊、山羊、驴、牛、马、骆驼、猴或鸡序列。在一些实施方案中，模板序列包含人序列。In some embodiments, the template sequence is derived from a eukaryote. In some embodiments, the eukaryote is a vertebrate, such as a bird, reptile, or mammal. In some embodiments, the template sequence comprises a mouse, rat, rabbit, guinea pig, hamster, sheep, goat, donkey, cow, horse, camel, monkey, or chicken sequence. In some embodiments, the template sequence comprises a human sequence.

在一些实施方案中，模板序列的长度为至少25KB、至少50KB、至少100KB、至少200KB、至少400KB、至少500KB、至少600KB、至少700KB、至少800KB、至少900KB、至少1MB、至少2MB、至少3MB、至少4MB、至少5MB、至少6MB、至少7MB、至少8MB、至少9MB、至少10MB、至少15MB、至少20MB、至少25MB、至少30MB、至少40MB、至少50MB、至少60MB、至少70MB、至少80MB、至少90MB、至少100MB、至少120MB、至少140MB、至少160MB、至少180MB、至少200MB、至少220MB或至少250MB。在一些实施方案中，模板序列的长度为至少50KB、至少100KB、至少200KB、至少500KB、至少700KB、至少1MB、至少2MB、至少3MB、至少4MB、至少5MB、至少6MB、至少7MB、至少8MB、至少9MB、至少10MB、至少20MB、至少30MB、至少40MB或至少50MB。在一些实施方案中，模板序列的长度至少为1MB。在一些实施方案中，模板序列的长度至少为2MB。在一些实施方案中，模板序列的长度至少为3MB。在一些实施方案中，模板序列的长度至少为4MB。在一些实施方案中，模板序列的长度至少为5MB。在一些实施方案中，模板序列的长度至少为10MB。在一些实施方案中，模板序列的长度至少为20MB。In some embodiments, the length of the template sequence is at least 25 KB, at least 50 KB, at least 100 KB, at least 200 KB, at least 400 KB, at least 500 KB, at least 600 KB, at least 700 KB, at least 800 KB, at least 900 KB, at least 1 MB, at least 2 MB, at least 3 MB, at least 4 MB, at least 5 MB, at least 6 MB, at least 7 MB, at least 8 MB, at least 9 MB, at least 10 MB, at least 15 MB, at least 20 MB, at least 25 MB, at least 30 MB, at least 40 MB, at least 50 MB, at least 60 MB, at least 70 MB, at least 80 MB, at least 90 MB, at least 100 MB, at least 120 MB, at least 140 MB, at least 160 MB, at least 180 MB, at least 200 MB, at least 220 MB, or at least 250 MB. In some embodiments, the length of the template sequence is at least 50 KB, at least 100 KB, at least 200 KB, at least 500 KB, at least 700 KB, at least 1 MB, at least 2 MB, at least 3 MB, at least 4 MB, at least 5 MB, at least 6 MB, at least 7 MB, at least 8 MB, at least 9 MB, at least 10 MB, at least 20 MB, at least 30 MB, at least 40 MB, or at least 50 MB. In some embodiments, the length of the template sequence is at least 1 MB. In some embodiments, the length of the template sequence is at least 2 MB. In some embodiments, the length of the template sequence is at least 3 MB. In some embodiments, the length of the template sequence is at least 4 MB. In some embodiments, the length of the template sequence is at least 5 MB. In some embodiments, the length of the template sequence is at least 10 MB. In some embodiments, the length of the template sequence is at least 20 MB.

在一些实施方案中，模板序列的长度介于50KB与250MB之间、介于50KB与100MB之间、介于50KB与50MB之间、介于50KB与20MB之间、介于50KB与10MB之间、介于50KB与5MB之间、介于50KB与3MB之间、介于50KB与2MB之间、介于50KB与1MB之间、介于100KB与200MB之间、介于100KB与100MB之间、介于100KB与50MB之间、介于100KB与20MB之间、介于100KB与10MB之间、介于100KB与5MB之间、介于100KB与3MB之间、介于100KB与2MB之间、介于100KB与1MB之间、介于100KB与500KB之间、介于200KB与100MB之间、介于200KB与50MB之间、介于200KB与20MB之间、介于200KB与10MB之间、介于200KB与5MB之间、介于200KB与3MB之间、介于200KB与2MB之间、介于200KB与1MB之间、介于200KB与500KB之间、介于500KB与100MB之间、介于500KB与50MB之间、介于500KB与20MB之间、介于500KB与10MB之间、介于500KB与5MB之间、介于500KB与3MB之间、介于500KB与2MB之间、介于500KB与1MB之间、介于1MB与100MB之间、介于1MB与50MB之间、介于1MB与20MB之间、介于1MB与10MB之间、介于1MB与5MB之间、介于1MB与3MB之间、介于1MB与2MB之间、介于3MB与100MB之间、介于3MB与50MB之间、介于3MB与20MB之间、介于3MB与10MB之间、介于3MB与5MB之间、介于5MB与100MB之间、介于5MB与50MB之间、介于5MB与20MB之间、介于5MB与10MB之间、介于10MB与100MB之间、介于10MB与50MB之间或介于10MB与20MB之间。在一些实施方案中，模板序列的长度介于50KB与250MB之间。在一些实施方案中，模板序列的长度介于500KB与200MB之间。在一些实施方案中，模板序列的长度介于200KB与50MB之间、介于1MB与20MB之间、介于1MB与10MB之间、介于1MB与5MB之间、介于1MB与3MB之间、介于3MB与20MB之间、介于3MB与10MB之间、介于3MB与7MB之间或介于3MB与5MB之间。在一些实施方案中，模板序列的长度介于1MB与10MB之间。在一些实施方案中，模板序列的长度介于1MB与5MB之间。在一些实施方案中，模板序列的长度介于3MB与5MB之间。In some embodiments, the length of the template sequence is between 50 KB and 250 MB, between 50 KB and 100 MB, between 50 KB and 50 MB, between 50 KB and 20 MB, between 50 KB and 10 MB, between 50 KB and 5 MB, between 50 KB and 3 MB, between 50 KB and 2 MB, between 50 KB and 1 MB, between 100 KB and 200 MB, between 100 KB and 100 MB, between 100 KB and 50 MB, between 100 KB and 20 MB, between 100 KB and 10 MB, between 100 KB and 5 MB, between 100 KB and 3 MB, between 100 KB and 2 MB, between 100 KB and 1 MB, between 100 KB and 500 KB, between 200 KB and 100 MB, between 200 KB and 50 MB, between 200 KB and 20 MB, between 200 KB and 10 MB, between 200 KB and 5 MB, between 200 KB and 3 MB, between 200 KB and 2 MB, between 200 KB and 1 MB, between 200 KB and 500 KB, between 500 KB and 100 MB, between 500 KB and 50 MB, between 500 KB and 20 MB, between 500 KB and 10 MB, between 500 KB and 5 MB, between 500 KB and 3 MB, between 500 KB and 2 MB, between 500 KB and 1 MB, between 1 MB and 100 MB, between 1 MB and 50 MB, between 1 MB and 20 MB, between 1 MB and 10 MB, between 1 MB and 5 MB, between 1 MB and 3 MB, between 1 MB and 2 MB, between 3 MB and 100 MB, between 3 MB and 50 MB, between 3 MB and 20 MB, between 3 MB and 10 MB, between 3 MB and 5 MB, between 5 MB and 100 MB, between 5 MB and 50 MB, between 5 MB and 20 MB, between 5 MB and 10 MB, between 10 MB and 100 MB, between 10 MB and 50 MB, or between 10 MB and 20 MB. In some embodiments, the length of the template sequence is between 50 KB and 250 MB. In some embodiments, the length of the template sequence is between 500 KB and 200 MB. In some embodiments, the length of the template sequence is between 200 KB and 50 MB, between 1 MB and 20 MB, between 1 MB and 10 MB, between 1 MB and 5 MB, between 1 MB and 3 MB, between 3 MB and 20 MB, between 3 MB and 10 MB, between 3 MB and 7 MB, or between 3 MB and 5 MB. In some embodiments, the length of the template sequence is between 1 MB and 10 MB. In some embodiments, the length of the template sequence is between 1 MB and 5 MB. In some embodiments, the length of the template sequence is between 3 MB and 5 MB.

在一些实施方案中，模板序列包含一个或多个基因的序列。在一些实施方案中，模板序列包含多个基因的序列。在一些实施方案中，模板序列包含至少2个、3个、4个、5个、6个、7个、8个、9个、10个、15个、20个、25个、30个、35个、40个、45个、50个、60个、70个、80个、90个、100个、150个、200个、250个、300个、350个、400个、450个、500个、600个、700个、800个、900个、1000个、1500个或2000个基因的序列。In some embodiments, the template sequence comprises the sequences of one or more genes. In some embodiments, the template sequence comprises the sequences of multiple genes. In some embodiments, the template sequence comprises the sequences of at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400, 450, 500, 600, 700, 800, 900, 1000, 1500, or 2000 genes.

在一些实施方案中，模板序列包含人序列，诸如一个或多个人基因的序列。在一些实施方案中，模板序列包含人基因的子序列。在一些实施方案中，模板序列包含人基因的子序列和人工序列，诸如标记或融合蛋白。在一些实施方案中，模板序列包含一个或多个人基因的序列和人工序列。In some embodiments, the template sequence comprises a human sequence, such as the sequence of one or more human genes. In some embodiments, the template sequence comprises a subsequence of a human gene. In some embodiments, the template sequence comprises a subsequence of a human gene and an artificial sequence, such as a marker or fusion protein. In some embodiments, the template sequence comprises the sequence of one or more human genes and an artificial sequence.

在一些实施方案中，模板序列包含人基因的序列。设想所有人基因都在本公开的范围内。不希望受理论所束缚，将参与疾病发病机理的或作为潜在治疗靶标的人基因转移到模式生物诸如小鼠中，可以促进对疾病的研究和合适疗法的开发。In some embodiments, the template sequence comprises the sequence of a human gene. All human genes are contemplated as being within the scope of this disclosure. Without wishing to be bound by theory, transferring human genes that are involved in disease pathogenesis or that are potential therapeutic targets into model organisms such as mice can facilitate the study of disease and the development of suitable therapies.

包含在模板序列中的示例性基因包括但不限于免疫球蛋白基因、T细胞受体(TCR)基因、免疫检验点基因、细胞因子、趋化因子、受体、转录因子、细胞骨架基因、细胞周期检查基因、癌基因以及与发育、免疫学或神经生物学相关的基因。示例性免疫检查点基因包括BTLA、CTLA-4、TIM-3、PD-1和PD-L1。示例性细胞因子包括白细胞介素(CTNF、IL-16、IL-1B、IL-6、IL-12、IL-17F、IL-2、IL-3、IL-9、IL-12B、IL18BP、IL-21、IL33、瘦素、IL-13、IL1A、IL-23、IL-4)、干扰素(IFNA10、IFN-α7、IFNa4Fc、IFNβ、IFNα4、IFNγ、IFNα5、IFNω)和肿瘤坏死因子(TNFs,例如BAFF,TNFβ、CD30配体、TNFα、CD40配体、TNFSF10、CD27配体)。示例性趋化因子包括CXC、CC CX3C和C家族趋化因子。示例性受体包括G蛋白偶联受体、配体门控离子通道(离子型受体)、激酶连接的受体和相关受体以及核受体。示例性转录因子包括但不限于螺旋-转角-螺旋转录因子(例如Oct-1)、螺旋-环-螺旋转录因子(例如E2A)、锌指转录因子(例如糖皮质激素受体、GATA蛋白)、碱性蛋白-亮氨酸拉链转录因子(例如环AMP应答元件结合因子(CREB)和激活蛋白-1(AP-1))和β-折叠基序转录因子(例如核因子-κB(NF-κB))。示例性细胞周期调节基因包括但不限于细胞周期蛋白、细胞周期蛋白依赖性激酶和细胞周期检查点基因。Exemplary genes for inclusion in the template sequence include, but are not limited to, immunoglobulin genes, T cell receptor (TCR) genes, immune checkpoint genes, cytokines, chemokines, receptors, transcription factors, cytoskeletal genes, cell cycle checkpoint genes, oncogenes, and genes related to development, immunology, or neurobiology. Exemplary immune checkpoint genes include BTLA, CTLA-4, TIM-3, PD-1, and PD-L1. Exemplary cytokines include interleukins (CTNF, IL-16, IL-1B, IL-6, IL-12, IL-17F, IL-2, IL-3, IL-9, IL-12B, IL18BP, IL-21, IL33, leptin, IL-13, IL1A, IL-23, IL-4), interferons (IFNA10, IFN-α7, IFNa4Fc, IFNβ, IFNα4, IFNγ, IFNα5, IFNω), and tumor necrosis factors (TNFs, for example BAFF, TNFβ, CD30 ligand, TNFα, CD40 ligand, TNFSF10, CD27 ligand). Exemplary chemokines include CXC, CC, CX3C, and C family chemokines. Exemplary receptors include G protein-coupled receptors, ligand-gated ion channels (ionotropic receptors), kinase-linked and related receptors, and nuclear receptors. Exemplary transcription factors include, but are not limited to, helix-turn-helix transcription factors (e.g., Oct-1), helix-loop-helix transcription factors (e.g., E2A), zinc finger transcription factors (e.g., glucocorticoid receptor, GATA proteins), basic protein-leucine zipper transcription factors (e.g., cyclic AMP response element-binding factor (CREB) and activator protein-1 (AP-1)), and β-sheet motif transcription factors (e.g., nuclear factor-κB (NF-κB)). Exemplary cell cycle regulatory genes include, but are not limited to, cyclins, cyclin-dependent kinases, and cell cycle checkpoint genes.

在一些实施方案中，模板序列包含癌基因或肿瘤抑制基因。适合包含在模板序列中的示例性癌基因和肿瘤抑制基因列于下表1中。In some embodiments, the template sequence comprises an oncogene or a tumor suppressor gene. Exemplary oncogenes and tumor suppressor genes suitable for inclusion in the template sequence are listed in Table 1 below.

表1.癌基因和肿瘤抑制因子Table 1. Oncogenes and tumor suppressors

在一些实施方案中，模板序列包含与遗传疾病或病症相关的人基因的序列。在一些实施方案中，模板序列包含与遗传疾病或病症相关的人染色体区域的序列。与疾病或病症相关的基因和染色体区域的非限制性实例示于下表2中。In some embodiments, the template sequence comprises sequences of human genes associated with a genetic disease or condition. In some embodiments, the template sequence comprises sequences of human chromosomal regions associated with a genetic disease or condition. Non-limiting examples of genes and chromosomal regions associated with diseases or conditions are shown in Table 2 below.

表2.遗传疾病或病症，以及相关的基因或基因组区域Table 2. Genetic diseases or conditions, and related genes or genomic regions.

在一些实施方案中，模板序列包含免疫球蛋白序列。表面免疫球蛋白和分泌型免疫球蛋白都被认为在本发明的范围内。免疫球蛋白识别外来抗原并启动免疫反应。在人中，每个免疫球蛋白分子由两条相同的重链和两条相同的轻链组成，所述重链由14号染色体上的IGH基因座编码，所述轻链由2号染色体上的免疫球蛋白κ基因座(IGK)和22号染色体上的免疫球蛋白λ基因座(IGL)编码。IGH基因座包括V(可变)区、D(多样性)区、J(连接)区和C(恒定)区。V、D和J区各自含有多个不同的基因区段，在本文中统称为IGH可变区。在B细胞发育期间，DNA水平上的重组事件将单个D区段与J区段连接；然后将这个部分重排的D-J区的融合D-J外显子与V区段连接。然后转录包含融合的V-D-J外显子的重排的V-D-J区，并通过RNA剪接将其与恒定区融合。该转录物编码μ重链。在发育晚期，B细胞产生V-D-J-Cμ-Cδ前信使RNA，其被选择性剪接成编码μ或δ重链。淋巴结中的成熟B细胞经历转换重组(switch recombination)，使得融合的V-D-J基因区段接近IGHG、IGHA或IGHE基因区段之一，并且每个细胞表达γ、α或ε重链。许多不同的V区段与几个J区段的潜在重组提供了广泛的抗原识别。额外的多样性是通过连接多样性获得的，连接多样性是由末端脱氧核糖核苷转移酶随机添加核苷酸和体细胞超突变产生的。每个轻链由两个串联的免疫球蛋白结构域、恒定结构域(C_L)和可变结构域(V_L)组成。对于轻链，V结构域由两个独立的DNA区段编码。第一区段被称为V基因区段，因为其编码大部分V结构域。第二区段编码V结构域的剩余部分，并被称为连接或J基因区段。像重链一样，轻链经过重排将V区段连接到J基因区段，并使V基因靠近恒定区序列，然后仅由内含子分开。IGHV、IGHD、IGHJ、IGHG或IGHA中任一种的IGH序列，或其任意组合，被认为是在本公开的模板序列的范围内。IGK或IGL或其组合的轻链序列被认为在本公开的模板序列的范围内。In some embodiments, the template sequence comprises an immunoglobulin sequence. Both surface immunoglobulins and secreted immunoglobulins are considered to be within the scope of this invention. Immunoglobulins recognize foreign antigens and initiate an immune response. In humans, each immunoglobulin molecule consists of two identical heavy chains and two identical light chains, the heavy chains being encoded by the IGH locus on chromosome 14, and the light chains being encoded by the immunoglobulin κ locus (IGK) on chromosome 2 and the immunoglobulin λ locus (IGL) on chromosome 22. The IGH locus includes a V (variable) region, a D (diversity) region, a J (joining) region, and a C (constant) region. The V, D, and J regions each contain multiple distinct gene segments and are collectively referred to herein as the IGH variable region. During B cell development, a recombination event at the DNA level joins a single D segment to a J segment; the fused D-J exon of this partially rearranged D-J region is then joined to a V segment. The rearranged V-D-J region, containing the fused V-D-J exon, is then transcribed and fused to the constant region by RNA splicing. This transcript encodes the μ heavy chain. Later in development, B cells produce a V-D-J-Cμ-Cδ pre-messenger RNA, which is alternatively spliced to encode either the μ or the δ heavy chain. Mature B cells in the lymph nodes undergo switch recombination, which brings the fused V-D-J gene segment into proximity with one of the IGHG, IGHA, or IGHE gene segments, so that each cell expresses a γ, α, or ε heavy chain. The potential recombination of many different V segments with several J segments provides a wide range of antigen recognition. Additional diversity is obtained through junctional diversity, which results from the random addition of nucleotides by terminal deoxynucleotidyl transferase, and through somatic hypermutation. Each light chain consists of two tandem immunoglobulin domains: a constant domain (C_L) and a variable domain (V_L). For light chains, the V domain is encoded by two separate DNA segments. The first segment is called the V gene segment because it encodes most of the V domain. The second segment encodes the remainder of the V domain and is called the joining, or J, gene segment. Like the heavy chain, the light chain undergoes rearrangement that joins a V segment to a J gene segment and brings the V gene close to the constant region sequence, from which it is then separated only by an intron. IGH sequences of any one of IGHV, IGHD, IGHJ, IGHG, or IGHA, or any combination thereof, are considered to be within the scope of the template sequences of this disclosure. Light chain sequences of IGK or IGL, or combinations thereof, are considered to be within the scope of the template sequences of this disclosure.

在一些实施方案中，工程化的染色体包括其中一个或多个非编码序列可能已被引入所述染色体的小鼠染色体。例如，一个或多个能够调节抗体产生、成熟和/或多样化的非编码序列可能已被引入所述染色体中。例如，一个或多个能够调节抗体多样化的非编码序列可能已被引入所述染色体中。例如，一个或多个能够调节抗体类别转换的非编码序列可能已被引入所述染色体。例如，转换区内的一个或多个非编码序列可能已被引入所述染色体中。例如，当一个或多个非编码序列已被引入所述染色体时，类别转换重组、体细胞超突变和/或激活诱导的胞苷脱氨酶可被调节。例如，当一个或多个非编码序列被引入所述染色体时，Ig序列库的多样性可被调节。例如，重链、κ轻链和λ轻链基因座上含有重排基因的约2kb的可变区，和/或重链基因座上含有大量富含G:C的DNA区段的约4kb的转换区可能已被引入所述染色体中。In some embodiments, the engineered chromosome includes a mouse chromosome into which one or more non-coding sequences may have been introduced. For example, one or more non-coding sequences capable of regulating antibody production, maturation, and/or diversification may have been introduced into the chromosome. For example, one or more non-coding sequences capable of regulating antibody diversification may have been introduced into the chromosome. For example, one or more non-coding sequences capable of regulating antibody class switching may have been introduced into the chromosome. For example, one or more non-coding sequences within a switch region may have been introduced into the chromosome. For example, when one or more non-coding sequences have been introduced into the chromosome, class switch recombination, somatic hypermutation, and/or activation-induced cytidine deaminase can be regulated. For example, when one or more non-coding sequences have been introduced into the chromosome, the diversity of the Ig sequence repertoire can be regulated. For example, variable regions of approximately 2 kb containing rearranged genes at the heavy chain, κ light chain, and λ light chain loci, and/or switch regions of approximately 4 kb containing a large number of G:C-rich DNA segments at the heavy chain locus, may have been introduced into the chromosome.

在一些实施方案中，模板序列包含人IGH序列。人IGH跨越人基因组的GRCh38.p13装配体的14号染色体的核苷酸位置105,586,437至106,879,844。本领域技术人员将会理解，具有5’和3’边界的人IGH序列是合适的模板序列，所述边界偏离上文所述的那些例如至少100bp、500bp、1,000bp、2,000bp、5,000bp、10,000bp或更多。In some embodiments, the template sequence comprises a human IGH sequence. Human IGH spans nucleotide positions 105,586,437 to 106,879,844 on chromosome 14 of the GRCh38.p13 assembly of the human genome. Those skilled in the art will understand that human IGH sequences whose 5' and 3' boundaries deviate from those stated above, for example by at least 100 bp, 500 bp, 1,000 bp, 2,000 bp, 5,000 bp, 10,000 bp, or more, are also suitable template sequences.

在一些实施方案中，模板序列包含人IGH可变区序列。在一些实施方案中，人IGH可变区序列包含编码人V_H、D_H和J_H1-6基因区段的序列和间插非编码序列。在一些实施方案中，人IGH可变区序列包含人基因组的GRCh38.p13装配体的14号染色体的核苷酸位置105,862,994至106,811,028。在一些实施方案中，人IGH可变区序列包含人基因组的GRCh38.p13装配体的14号染色体的核苷酸位置105,862,994至106,811,028，从5’末端、3’末端或两端减去至少约50bp、100bp、500bp、1,000bp、2,000bp、5,000bp、7,000bp、10,000bp、15,000bp、20,000bp或50,000bp。在一些实施方案中，人IGH可变区序列包含人基因组的GRCh38.p13组装体的14号染色体的核苷酸位置105,862,994至106,811,028，以及在5’末端、3’末端或两端的至少约50bp、100bp、500bp、1,000bp、2,000bp、5,000bp、7,000bp、10,000bp、15,000bp、20,000bp或50,000bp的额外侧翼序列。在一些实施方案中，人IGH可变区序列包含人基因组的GRCh38.p13装配体的14号染色体的核苷酸位置105,862,994至106,811,028，以及对其的一个或多个修饰。示例性修饰包括但不限于缺失(诸如一个或多个V、D或J区段的缺失)、插入(诸如标记的插入)、重排或其组合。In some embodiments, the template sequence comprises a human IGH variable region sequence. In some embodiments, the human IGH variable region sequence comprises sequences encoding the human V_H, D_H, and J_H1-6 gene segments and the intervening non-coding sequences. In some embodiments, the human IGH variable region sequence comprises nucleotide positions 105,862,994 to 106,811,028 on chromosome 14 of the GRCh38.p13 assembly of the human genome. In some embodiments, the human IGH variable region sequence comprises nucleotide positions 105,862,994 to 106,811,028 on chromosome 14 of the GRCh38.p13 assembly of the human genome, minus at least about 50 bp, 100 bp, 500 bp, 1,000 bp, 2,000 bp, 5,000 bp, 7,000 bp, 10,000 bp, 15,000 bp, 20,000 bp, or 50,000 bp from the 5' end, the 3' end, or both ends. In some embodiments, the human IGH variable region sequence comprises nucleotide positions 105,862,994 to 106,811,028 on chromosome 14 of the GRCh38.p13 assembly of the human genome, plus additional flanking sequence of at least about 50 bp, 100 bp, 500 bp, 1,000 bp, 2,000 bp, 5,000 bp, 7,000 bp, 10,000 bp, 15,000 bp, 20,000 bp, or 50,000 bp at the 5' end, the 3' end, or both ends. In some embodiments, the human IGH variable region sequence comprises nucleotide positions 105,862,994 to 106,811,028 on chromosome 14 of the GRCh38.p13 assembly of the human genome, with one or more modifications thereto. Exemplary modifications include, but are not limited to, deletions (such as deletion of one or more V, D, or J segments), insertions (such as insertion of a marker), rearrangements, or combinations thereof.
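As a worked illustration of the coordinate ranges given above, the following Python sketch computes the span of the stated intervals and the intervals obtained by trimming or extending their boundaries. It assumes that the stated positions are 1-based and inclusive, which the text does not specify; the function names are hypothetical. Because the text does not state which genomic end corresponds to the 5' end, the sketch adjusts the low-coordinate and high-coordinate ends without assigning either to 5' or 3'.

```python
# Illustrative sketch only. Assumes 1-based, inclusive coordinates on
# chromosome 14 of GRCh38.p13; function names are hypothetical.

IGH_LOCUS = (105_586_437, 106_879_844)      # human IGH, as stated above
IGH_VARIABLE = (105_862_994, 106_811_028)   # human IGH variable region


def span_bp(interval):
    """Length in bp of a 1-based inclusive interval."""
    start, end = interval
    return end - start + 1


def adjust(interval, low_delta=0, high_delta=0):
    """Move each boundary outward (positive delta) or inward (negative delta)."""
    start, end = interval
    new_start, new_end = start - low_delta, end + high_delta
    if new_start < 1 or new_start > new_end:
        raise ValueError("adjusted interval is empty or out of range")
    return (new_start, new_end)


print(span_bp(IGH_LOCUS))                             # 1293408
print(span_bp(IGH_VARIABLE))                          # 948035
print(adjust(IGH_VARIABLE, 10_000, 10_000))           # (105852994, 106821028)
print(span_bp(adjust(IGH_VARIABLE, -5_000, -5_000)))  # 938035
```

Under these assumptions the full IGH interval is about 1.29 MB and the variable region interval is about 0.95 MB, so adding or removing up to 50,000 bp of flanking sequence at each end changes the template length by roughly 10% at most.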

在一些实施方案中，模板序列包含T细胞受体亚单位(TCR)的序列。T细胞受体(TCR)是在T细胞或T淋巴细胞表面发现的蛋白质复合物，其负责将抗原片段识别为与主要组织相容性复合物(MHC)分子结合的肽。TCR包含二硫键连接的膜结合异二聚体蛋白，在大多数情况下其由高度可变的α和β链组成，所述α和β链作为与不变CD3链分子(CD3δ、CD3ε、CD3γ和CD3ζ)的复合物的一部分表达。表达这两条链的T细胞被称为α:β(或αβ)T细胞。少数T细胞表达由可变γ和δ链形成的替代受体，称为γδT细胞。TCR发育通过淋巴细胞特异性基因重组过程发生，所述过程从大量潜在区段组装成最终序列，这通过胸腺中的T细胞中的TCR基因区段的重组发生。TCRα基因座包含可变(V)和连接(J)基因区段(Vα和Jα)，而TCRβ基因座除了Vβ和Jβ区段之外还包含D基因区段。因此，α链由VJ重组产生，β链参与VDJ重组。这与γδTCR的开发类似，其中TCRγ链参与VJ重组，TCRδ基因由VDJ重组产生。TCRα链基因座由46个可变区段、8个连接区段和恒定区组成。TCRβ链基因座由48个可变区段、继之以两个多样性区段、12个连接区段和两个恒定区组成。包含本文所述的任何TCR亚单位的序列、其子序列或其组合的模板序列被认为在本公开的范围内。在一些实施方案中，模板序列包含TCRα链可变区序列(由T细胞受体α基因座或TRA编码)、TCRβ链可变区序列(由T细胞受体β基因座或TRB编码)、TCRγ可变区序列(由T细胞受体γ基因座或TRG编码)或TCRδ可变区序列(由T细胞受体δ基因座或TRD编码)。In some embodiments, the template sequence comprises the sequence of a T cell receptor (TCR) subunit. The TCR is a protein complex found on the surface of T cells, or T lymphocytes, that is responsible for recognizing fragments of antigen as peptides bound to major histocompatibility complex (MHC) molecules. The TCR comprises a disulfide-linked, membrane-bound heterodimeric protein that in most cases consists of highly variable α and β chains, expressed as part of a complex with the invariant CD3 chain molecules (CD3δ, CD3ε, CD3γ, and CD3ζ). T cells expressing these two chains are called α:β (or αβ) T cells. A minority of T cells express an alternative receptor formed by variable γ and δ chains and are called γδ T cells. TCR development occurs through a lymphocyte-specific process of gene recombination, which assembles a final sequence from a large number of potential segments through recombination of TCR gene segments in T cells in the thymus. The TCRα locus contains variable (V) and joining (J) gene segments (Vα and Jα), while the TCRβ locus contains D gene segments in addition to Vβ and Jβ segments. Accordingly, the α chain is generated by VJ recombination, and the β chain by VDJ recombination. The γδ TCR develops similarly: the TCRγ chain is generated by VJ recombination, and the TCRδ chain by VDJ recombination. The TCRα chain locus consists of 46 variable segments, 8 joining segments, and a constant region. The TCRβ chain locus consists of 48 variable segments followed by two diversity segments, 12 joining segments, and two constant regions. Template sequences comprising the sequence of any TCR subunit described herein, a subsequence thereof, or a combination thereof are considered to be within the scope of this disclosure. In some embodiments, the template sequence comprises a TCRα chain variable region sequence (encoded by the T cell receptor α locus, or TRA), a TCRβ chain variable region sequence (encoded by the T cell receptor β locus, or TRB), a TCRγ variable region sequence (encoded by the T cell receptor γ locus, or TRG), or a TCRδ variable region sequence (encoded by the T cell receptor δ locus, or TRD).

In some embodiments, the template sequence comprises a sequence encoding an antibody or antigen-binding fragment.

As used herein, the term "antibody" refers to an immunoglobulin molecule that specifically binds to, or is immunologically reactive with, a particular antigen. The term includes polyclonal, monoclonal, genetically engineered, and otherwise modified forms of antibodies, including but not limited to chimeric antibodies, humanized antibodies, heteroconjugate antibodies (e.g., bi-, tri-, and tetra-specific antibodies, diabodies, triabodies, and tetrabodies), and antigen-binding fragments of antibodies, including, for example, Fab′, F(ab′)₂, Fab, Fv, rIgG, and scFv fragments. Unless otherwise indicated, the term "monoclonal antibody" (mAb) is meant to include both intact molecules and antibody fragments (including, for example, Fab and F(ab′)₂ fragments) that are capable of specifically binding to a target protein. As used herein, Fab and F(ab′)₂ fragments refer to antibody fragments that lack the Fc fragment of an intact antibody. Examples of these antibody fragments are described herein.

As used herein, the term "antigen-binding fragment" refers to one or more fragments of an antibody that retain the ability to specifically bind to a target antigen. The antigen-binding function of an antibody can be performed by fragments of a full-length antibody. The antibody fragment can be, for example, a Fab, F(ab′)₂, scFv, diabody, triabody, affibody, nanobody, aptamer, or domain antibody. Examples of binding fragments encompassed by the term "antigen-binding fragment" of an antibody include, but are not limited to: (i) a Fab fragment, a monovalent fragment consisting of the VL, VH, CL, and CH1 domains; (ii) a F(ab′)₂ fragment, a bivalent fragment comprising two Fab fragments linked by a disulfide bridge at the hinge region; (iii) an Fd fragment consisting of the VH and CH1 domains; (iv) an Fv fragment consisting of the VL and VH domains of a single arm of an antibody; (v) a dAb including VH and VL domains; (vi) a dAb fragment consisting of a VH domain (see, for example, Ward et al., Nature 341:544-546, 1989); (vii) a dAb consisting of a VH or a VL domain; (viii) an isolated complementarity-determining region (CDR); and (ix) a combination of two or more (e.g., two, three, four, five, or six) isolated CDRs, which may optionally be joined by a synthetic linker. Furthermore, although the two domains of the Fv fragment, VL and VH, are encoded by separate genes, they can be joined, using recombinant methods, by a linker that enables them to be made as a single protein chain in which the VL and VH regions pair to form a monovalent molecule (known as a single-chain Fv (scFv); see, for example, Bird et al., Science 242:423-426, 1988, and Huston et al., Proc. Natl. Acad. Sci. USA 85:5879-5883, 1988). These antibody fragments can be obtained using conventional techniques known to those skilled in the art, and the fragments can be screened for utility in the same manner as intact antibodies. Antigen-binding fragments can be produced by recombinant DNA techniques, by enzymatic or chemical cleavage of intact immunoglobulins, or, in some cases, by chemical peptide synthesis procedures known in the art.

As used herein, the term "complementarity-determining region" (CDR) refers to a hypervariable region found in both the light chain and the heavy chain variable domains of an antibody. The more highly conserved portions of the variable domains are called the framework regions (FRs). The amino acid positions that delineate a hypervariable region of an antibody can vary depending on the context and on the various definitions known in the art. Some positions within a variable domain may be viewed as hybrid hypervariable positions, in that they can be deemed to be within a hypervariable region under one set of criteria and outside a hypervariable region under a different set of criteria. One or more of these positions can also be found in extended hypervariable regions. The antibodies described herein may contain modifications at these hybrid hypervariable positions. The variable domains of native heavy and light chains each comprise four framework regions that primarily adopt a β-sheet configuration, connected by three CDRs, which form loops that connect, and in some cases form part of, the β-sheet structure. The CDRs in each chain are held together in close proximity by the framework regions in the order FR1-CDR1-FR2-CDR2-FR3-CDR3-FR4 and, together with the CDRs from the other antibody chain, contribute to the formation of the target binding site of the antibody (see Kabat et al., Sequences of Proteins of Immunological Interest, National Institute of Health, Bethesda, Md., 1987). As used herein, unless otherwise indicated, immunoglobulin amino acid residues are numbered according to the immunoglobulin amino acid residue numbering system of Kabat et al.

In some embodiments, the antibody or antigen-binding fragment comprises a human antibody or antigen-binding fragment. In some embodiments, the antibody or antigen-binding fragment is humanized.

Those of ordinary skill in the art will understand that the template sequence may also include sequences necessary for expressing a gene (such as an antibody gene) in a particular tissue, cell type, or organism. Such sequences include, but are not limited to, promoters, enhancers, untranslated sequences such as the 5' and 3' untranslated regions of messenger RNA (mRNA), polyadenylation (polyA) sequences, introns, internal ribosome entry sites (IRES), and the like. The selection of suitable sequences will be apparent to those of ordinary skill in the art.

In some embodiments, the template sequence comprises a promoter. In some embodiments, the promoter comprises an endogenous promoter, i.e., the promoter that is normally associated with a gene contained in the template sequence. In some embodiments, the promoter is not an endogenous promoter; for example, it is a promoter isolated or derived from another gene or organism, rather than from the gene to which the promoter is operably linked in the template sequence. For example, the template sequence comprises a sequence encoding an antibody or antigen-binding fragment that is operably linked to a promoter that is not an immunoglobulin promoter. In some embodiments, the promoter is a constitutive promoter, an inducible promoter, or a tissue-specific promoter. In some embodiments, the promoter is isolated or derived from a mammalian gene, for example a gene expressed in lymphocytes.

Exemplary promoters that can be used to express genes of the template sequence include, but are not limited to, the SV40 early promoter region, the promoter contained in the 3' long terminal repeat of Rous sarcoma virus, the regulatory sequences of the metallothionein gene, the tetracycline (Tet) promoter, promoter elements from yeast or other fungi such as the Gal promoter, the ADC (alcohol dehydrogenase) promoter, the PGK (phosphoglycerol kinase) promoter, and the alkaline phosphatase promoter, as well as the following animal transcriptional control regions, which exhibit tissue specificity and have been used in transgenic animals: the elastase I gene control region, which is active in pancreatic acinar cells; the insulin gene control region, which is active in pancreatic β cells; the immunoglobulin gene control region, which is active in lymphoid cells; the mouse mammary tumor virus control region, which is active in testicular, mammary, lymphoid, and mast cells; the albumin gene control region, which is active in the liver; the alpha-fetoprotein gene control region, which is active in the liver; the α1-antitrypsin gene control region, which is active in the liver; the β-globin gene control region, which is active in myeloid cells; the myelin basic protein gene control region, which is active in oligodendrocytes in the brain; the myosin light chain-2 gene control region, which is active in skeletal muscle; the neuron-specific enolase (NSE) control region, which is active in neuronal cells; the brain-derived neurotrophic factor (BDNF) gene control region, which is active in neuronal cells; the glial fibrillary acidic protein (GFAP) promoter, which is active in astrocytes; and the gonadotropin-releasing hormone gene control region, which is active in the hypothalamus.

Target Chromosome

This disclosure provides target chromosomes comprising target sequences for use in the methods described herein.

As used herein, a "target chromosome" refers to a chromosome that contains a "target sequence" or, where insertion of the template sequence does not appreciably delete any target sequence, a "target location." The target sequence is the sequence of the target chromosome that is deleted when the template sequence is inserted using the methods described herein. The target location is the position in the target chromosome at which the template sequence is inserted (for insertions) or to which it is joined (for chromosomal translocations or rearrangements).

The target chromosome can be isolated or derived from any suitable source. In some embodiments, the target chromosome is from a eukaryote. In some embodiments, the eukaryote is a vertebrate, such as a bird, reptile, or mammal. In some embodiments, the target chromosome is from a mouse, rat, rabbit, guinea pig, hamster, sheep, goat, donkey, cow, horse, camel, monkey, or chicken. In some embodiments, the target chromosome is from a mouse. In some embodiments, the target chromosome is from a rat. In some embodiments, the target chromosome is from a monkey.

In some embodiments, the template chromosome and the target chromosome are from different species. For example, the template chromosome is from a human and the target chromosome is from a mouse. In some embodiments, the template chromosome and the target chromosome are from the same species.

In some embodiments, the target chromosome is an artificial chromosome.

In some embodiments, the target chromosome is a naturally occurring chromosome.

In some embodiments, the target chromosome comprises one or more modifications to a naturally occurring chromosome. Modifications include, among others, insertions, deletions, and rearrangements of sequence. Examples of sequences inserted into the target chromosome include, among others, markers, promoters, cDNA sequences, and non-coding sequences. Suitable markers include selectable markers, such as those disclosed in Table 3, and detectable markers, such as GFP and mCherry.

In some embodiments, the target chromosome comprises an endonuclease site located 5' of the target sequence. In some embodiments, the target chromosome comprises an endonuclease site located 3' of the target sequence. In some embodiments, the endonuclease site is immediately adjacent to the target sequence. In some embodiments, the endonuclease site is located near the target sequence.

In some embodiments, the target chromosome comprises endonuclease sites on either side of the target sequence. For example, the target chromosome comprises a first endonuclease site located 5' of the target sequence and a second endonuclease site located 3' of the target sequence. In some embodiments, the first and second endonuclease sites are both recognized and cleaved by the same endonuclease. For example, the first and second endonuclease sites both comprise the same DNA sequence, which is recognized by the same endonuclease. In some embodiments, the first endonuclease site is cleaved by a first endonuclease and the second endonuclease site is cleaved by a second endonuclease. For example, the first and second endonuclease sites comprise different DNA sequences recognized by two different zinc finger nucleases (ZFNs), or two different CRISPR/Cas target sequences recognized by CRISPR/Cas ribonucleoprotein complexes comprising guide nucleic acids (gNAs) with different targeting sequences. In some embodiments, the first and/or second endonuclease sites are immediately adjacent to the target sequence. In some embodiments, the first and/or second endonuclease sites are located near the target sequence.

An endonuclease site within 5 base pairs (bp), 10 bp, 15 bp, 20 bp, 30 bp, 40 bp, 50 bp, 70 bp, 80 bp, 90 bp, 100 bp, 120 bp, 140 bp, 160 bp, 180 bp, 200 bp, 250 bp, 300 bp, 400 bp, or 500 bp of the target sequence is considered to be near the target sequence.
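The proximity criterion above can be illustrated as a simple coordinate check. The following Python sketch is illustrative only: the `Interval` type, the half-open coordinate convention, the function names, the example coordinates, and the default threshold of 500 bp (the largest value recited above) are assumptions made for the example and are not part of the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Half-open interval [start, end) on one chromosome, in bp (assumed convention)."""
    start: int
    end: int


def gap_bp(site: Interval, target: Interval) -> int:
    """Number of bases separating two intervals; 0 if they abut or overlap."""
    if site.end <= target.start:      # site lies 5' of the target sequence
        return target.start - site.end
    if target.end <= site.start:      # site lies 3' of the target sequence
        return site.start - target.end
    return 0                          # intervals overlap


def is_near(site: Interval, target: Interval, max_gap_bp: int = 500) -> bool:
    """True if the site lies within max_gap_bp of the target sequence."""
    return gap_bp(site, target) <= max_gap_bp


target = Interval(10_000, 3_010_000)       # hypothetical 3MB target sequence
upstream_site = Interval(9_957, 9_980)     # 23 bp site ending 20 bp before the target
far_site = Interval(9_000, 9_023)          # site ending 977 bp before the target

print(gap_bp(upstream_site, target))       # 20
print(is_near(upstream_site, target, 20))  # True
print(is_near(far_site, target))           # False
```

A site with a gap of 0 corresponds to a site immediately adjacent to (or overlapping) the target sequence under this convention.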

In some embodiments, the target chromosome comprises one or more sequences of the homologous arms of the nucleic acid molecules used to promote homology-directed repair. In some embodiments, the target chromosome comprises a homologous arm sequence located 5' of the target sequence. In some embodiments, the target chromosome comprises, from 5' to 3', a homologous arm sequence, an endonuclease site, and the target sequence. In some embodiments, the target chromosome comprises a homologous arm sequence located 3' of the target sequence. In some embodiments, the target chromosome comprises, from 5' to 3', the target sequence, an endonuclease site, and a homologous arm sequence. In some embodiments, the endonuclease site is located between the homologous arm sequence and the target sequence.

In some embodiments, the target chromosome comprises a first homologous arm sequence 5' of the target sequence and a second homologous arm sequence 3' of the target sequence. That is, the target chromosome comprises homologous arms both upstream and downstream of the target sequence. In some embodiments, the first homologous arm is the 5' homologous arm of a first nucleic acid molecule that comprises, from 5' to 3', the first homologous arm, a sequence of at least a first marker, and a 3' homologous arm comprising a nucleotide sequence upstream of the 5' end of the template sequence. In some embodiments, the second homologous arm is the 3' homologous arm of a second nucleic acid molecule that comprises, from 5' to 3', a 5' homologous arm comprising a nucleotide sequence downstream of the 3' end of the template sequence, a sequence of at least a second marker, and the second homologous arm. In some embodiments, the target chromosome comprises, from 5' to 3', the first homologous arm sequence, a first endonuclease site, the target sequence, a second endonuclease site, and the second homologous arm sequence.
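The arrangement of arms in the first and second nucleic acid molecules can be illustrated with a toy string model. The Python sketch below is a simplified illustration, not the disclosed procedure: the function name, the half-open coordinates, the single fixed arm length, the placeholder marker strings, and the toy sequences are all assumptions, and strand orientation and endonuclease sites are ignored.

```python
from typing import Tuple


def build_donor_molecules(
    target_chrom: str,
    target_span: Tuple[int, int],
    template_chrom: str,
    template_span: Tuple[int, int],
    first_marker: str,
    second_marker: str,
    arm_len: int,
) -> Tuple[str, str]:
    """Return the (first, second) nucleic acid molecules, each written 5' to 3'.

    Spans are half-open (start, end) offsets into the chromosome strings.
    """
    t_start, t_end = target_span
    m_start, m_end = template_span
    if min(t_start, m_start) < arm_len:
        raise ValueError("not enough upstream sequence for the requested arm length")
    if t_end + arm_len > len(target_chrom) or m_end + arm_len > len(template_chrom):
        raise ValueError("not enough downstream sequence for the requested arm length")

    # First molecule: target-upstream arm, first marker, template-upstream arm.
    first = (
        target_chrom[t_start - arm_len:t_start]
        + first_marker
        + template_chrom[m_start - arm_len:m_start]
    )
    # Second molecule: template-downstream arm, second marker, target-downstream arm.
    second = (
        template_chrom[m_end:m_end + arm_len]
        + second_marker
        + target_chrom[t_end:t_end + arm_len]
    )
    return first, second


# Toy sequences: upper case = target chromosome, lower case = template chromosome.
first, second = build_donor_molecules(
    target_chrom="AAAACCCCCCCCGGGG", target_span=(4, 12),
    template_chrom="ttgaacgtacgtcatg", template_span=(4, 12),
    first_marker="[M1]", second_marker="[M2]", arm_len=4,
)
print(first)   # AAAA[M1]ttga
print(second)  # catg[M2]GGGG
```

In this toy model, each molecule bridges the target chromosome and the template chromosome, which is why the target chromosome carries the 5' arm of the first molecule and the 3' arm of the second.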

In some embodiments, the first and/or second homologous arm sequences of the target chromosome are immediately adjacent to the first and/or second endonuclease sites. In some embodiments, the first homologous arm sequence is immediately adjacent to the first endonuclease site and the second homologous arm sequence is immediately adjacent to the second endonuclease site, wherein the first endonuclease site is located between the first homologous arm and the target sequence, and the second endonuclease site is located between the target sequence and the second homologous arm.

In some embodiments, the first and/or second homologous arm sequences are located near the target sequence. An endonuclease site located within 5 bp, 10 bp, 15 bp, 20 bp, 30 bp, 40 bp, 50 bp, 70 bp, 80 bp, 90 bp, 100 bp, 120 bp, 140 bp, 160 bp, 180 bp, 200 bp, or 250 bp of the target sequence can be considered to be near the target sequence.

In some embodiments, the target chromosome comprises, from 5' to 3', a first homologous arm, a first endonuclease site, the target sequence, a second endonuclease site, and a second homologous arm.
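The 5'-to-3' arrangement described above can be expressed as an ordering check on feature coordinates. The Python sketch below is a minimal illustration under assumed conventions: the feature labels, the half-open (start, end) coordinates, and the example positions are hypothetical and chosen only for the example.

```python
from typing import Dict, Tuple

# Expected 5'-to-3' order of features on the target chromosome (labels are illustrative).
EXPECTED_ORDER = (
    "first_homologous_arm",
    "first_endonuclease_site",
    "target_sequence",
    "second_endonuclease_site",
    "second_homologous_arm",
)


def has_expected_layout(features: Dict[str, Tuple[int, int]]) -> bool:
    """Check that all five features are present, non-empty, and ordered 5' to 3' without overlap."""
    if any(name not in features for name in EXPECTED_ORDER):
        return False
    spans = [features[name] for name in EXPECTED_ORDER]
    if any(start >= end for start, end in spans):
        return False
    # Each feature must end at or before the start of the next one.
    return all(prev[1] <= nxt[0] for prev, nxt in zip(spans, spans[1:]))


layout = {
    "first_homologous_arm": (1_000, 1_800),            # 800 bp arm
    "first_endonuclease_site": (1_800, 1_823),
    "target_sequence": (1_823, 3_001_823),              # 3MB target sequence
    "second_endonuclease_site": (3_001_823, 3_001_846),
    "second_homologous_arm": (3_001_846, 3_002_646),    # 800 bp arm
}
misordered = dict(layout, first_endonuclease_site=(900, 923))

print(has_expected_layout(layout))      # True
print(has_expected_layout(misordered))  # False
```

In this example each arm abuts its endonuclease site, which corresponds to the "immediately adjacent" arrangement; gaps between features are also accepted by the check.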

In some embodiments, little or no target chromosome sequence is deleted when the template sequence is inserted, and the target sequence is referred to interchangeably herein as a "target site" or "target location." Those of ordinary skill in the art will understand that, in these cases, the arrangement of homologous arms and endonuclease sites is similar to the arrangements described above, except that the homologous arms flank an endonuclease site at the target location, rather than endonuclease sites flanking the target sequence itself. In some embodiments, the target chromosome comprises, from 5' to 3', the sequence of a first homologous arm, an endonuclease site, and the sequence of a second homologous arm. In some embodiments, the first homologous arm is the 5' homologous arm of a first nucleic acid molecule that comprises, from 5' to 3', the first homologous arm, a sequence of at least a first marker, and a 3' homologous arm comprising a nucleotide sequence upstream of the 5' end of the template sequence. In some embodiments, the second homologous arm is the 3' homologous arm of a second nucleic acid molecule that comprises, from 5' to 3', a 5' homologous arm comprising a nucleotide sequence downstream of the 3' end of the template sequence, a sequence of at least a second marker, and the second homologous arm.

In some embodiments, joining the template sequence to the target sequence produces a chromosomal rearrangement or translocation. In some embodiments, the target chromosome comprises, from 5' to 3', a target chromosome homologous arm sequence and an endonuclease site. In some embodiments, the target chromosome homologous arm comprises the 5' homologous arm of a nucleic acid molecule that comprises, from 5' to 3', the target sequence homologous arm, at least one marker, and a 3' homologous arm comprising a nucleotide sequence upstream of the 5' end of the template sequence. In some embodiments, the target chromosome comprises, from 5' to 3', an endonuclease site and a target chromosome homologous arm sequence. In some embodiments, the target chromosome homologous arm comprises the 3' homologous arm of a nucleic acid molecule that comprises, from 5' to 3', a 5' homologous arm comprising a nucleotide sequence downstream of the 3' end of the template sequence, at least a first marker, and the target sequence homologous arm.

In some embodiments, the first and/or second homologous arm sequences of the target chromosome are between about 20 bp and 2,000 bp, between about 50 bp and 1,500 bp, between about 100 bp and 1,400 bp, between about 150 bp and 1,300 bp, between about 200 bp and 1,200 bp, between about 300 bp and 1,100 bp, between about 400 bp and 1,000 bp, between about 500 bp and 900 bp, or between about 600 bp and 800 bp in length. In some embodiments, the homologous sequences of the target chromosome are between about 400 bp and 1,500 bp in length. In some embodiments, the homologous sequences of the target chromosome are between about 500 bp and 1,300 bp in length. In some embodiments, the homologous sequences of the target chromosome are between about 600 bp and 1,000 bp in length.

Target Sequence or Target Location

The target chromosome comprises a target sequence or location into which the template sequence is inserted, or to which the template sequence is joined, by the methods described herein. The target sequence can be located at any suitable position on the target chromosome.

The target sequence can be isolated or derived from any suitable source. In some embodiments, the target sequence and the template sequence are from different species. For example, the template sequence is from a human and the target sequence is from a mouse. In some embodiments, the target sequence and the template sequence are from the same species.

In some embodiments, the target sequence comprises a naturally occurring sequence. In some embodiments, the target sequence comprises one or more modifications to a naturally occurring sequence. Modifications include, among others, insertions, deletions, and rearrangements of sequences such as artificial sequences or markers. In some embodiments, the target sequence comprises an artificial sequence. In some embodiments, the target sequence comprises both naturally occurring and artificial sequences. Exemplary artificial sequences include, among others, markers, cDNA sequences, promoters, and recombinant sequences. Exemplary markers include, but are not limited to, the selectable markers disclosed in Table 3 below, as well as detectable markers such as green fluorescent protein (GFP) and mCherry.

In some embodiments, the target sequence is from a eukaryote. In some embodiments, the eukaryote is a vertebrate, such as a bird, reptile, or mammal. In some embodiments, the target sequence comprises a mouse, rat, rabbit, guinea pig, hamster, sheep, goat, donkey, cow, horse, camel, monkey, or chicken sequence. In some embodiments, the target sequence comprises a mouse sequence. In some embodiments, the target sequence comprises a rat sequence. In some embodiments, the target sequence comprises a monkey sequence.

In some embodiments, the target sequence is at least 25KB, at least 50KB, at least 100KB, at least 200KB, at least 400KB, at least 500KB, at least 600KB, at least 700KB, at least 800KB, at least 900KB, at least 1MB, at least 2MB, at least 3MB, at least 4MB, at least 5MB, at least 6MB, at least 7MB, at least 8MB, at least 9MB, at least 10MB, at least 15MB, at least 20MB, at least 25MB, at least 30MB, at least 40MB, at least 50MB, at least 60MB, at least 70MB, at least 80MB, at least 90MB, at least 100MB, at least 120MB, at least 140MB, at least 160MB, at least 180MB, at least 200MB, at least 220MB, or at least 250MB in length. In some embodiments, the target sequence is at least 50KB, at least 100KB, at least 200KB, at least 500KB, at least 700KB, at least 1MB, at least 2MB, at least 3MB, at least 4MB, at least 5MB, at least 6MB, at least 7MB, at least 8MB, at least 9MB, at least 10MB, at least 20MB, at least 30MB, at least 40MB, or at least 50MB in length. In some embodiments, the target sequence is at least 1MB in length. In some embodiments, the target sequence is at least 2MB in length. In some embodiments, the target sequence is at least 3MB in length. In some embodiments, the target sequence is at least 4MB in length. In some embodiments, the target sequence is at least 5MB in length. In some embodiments, the target sequence is at least 10MB in length. In some embodiments, the target sequence is at least 20MB in length.

In some embodiments, the target sequence is between 50KB and 250MB, between 50KB and 100MB, between 50KB and 50MB, between 50KB and 20MB, between 50KB and 10MB, between 50KB and 5MB, between 50KB and 3MB, between 50KB and 2MB, between 50KB and 1MB, between 100KB and 200MB, between 100KB and 100MB, between 100KB and 50MB, between 100KB and 20MB, between 100KB and 10MB, between 100KB and 5MB, between 100KB and 3MB, between 100KB and 2MB, between 100KB and 1MB, between 100KB and 500KB, between 200KB and 100MB, between 200KB and 50MB, between 200KB and 20MB, between 200KB and 10MB, between 200KB and 5MB, between 200KB and 3MB, between 200KB and 2MB, between 200KB and 1MB, between 200KB and 500KB, between 500KB and 100MB, between 500KB and 50MB, between 500KB and 20MB, between 500KB and 10MB, between 500KB and 5MB, between 500KB and 3MB, between 500KB and 2MB, between 500KB and 1MB, between 1MB and 100MB, between 1MB and 50MB, between 1MB and 20MB, between 1MB and 10MB, between 1MB and 5MB, between 1MB and 3MB, between 1MB and 2MB, between 3MB and 100MB, between 3MB and 50MB, between 3MB and 20MB, between 3MB and 10MB, between 3MB and 5MB, between 5MB and 100MB, between 5MB and 50MB, between 5MB and 20MB, between 5MB and 10MB, between 10MB and 100MB, between 10MB and 50MB, or between 10MB and 20MB in length. In some embodiments, the target sequence is between 200KB and 50MB, between 1MB and 20MB, between 1MB and 10MB, between 1MB and 5MB, between 1MB and 3MB, between 3MB and 20MB, between 3MB and 10MB, between 3MB and 7MB, or between 3MB and 5MB in length. In some embodiments, the target sequence is between 1MB and 10MB in length. In some embodiments, the target sequence is between 1MB and 5MB in length. In some embodiments, the target sequence is between 3MB and 5MB in length.
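Because the ranges above mix KB and MB units, comparing a candidate length against them requires a unit conversion. The Python sketch below is illustrative only: it assumes 1KB = 1,000 bp and 1MB = 1,000,000 bp, treats the range endpoints as inclusive, and uses hypothetical function names; none of these choices is specified by the disclosure.

```python
import re

# Assumed unit sizes: 1KB = 1,000 bp and 1MB = 1,000,000 bp.
BP_PER_UNIT = {"BP": 1, "KB": 1_000, "MB": 1_000_000}
_LENGTH_RE = re.compile(r"^\s*([\d,]*\.?\d+)\s*(bp|kb|mb)\s*$", re.IGNORECASE)


def to_bp(text: str) -> int:
    """Convert a length such as '50KB', '3 MB', or '2,300KB' to base pairs."""
    match = _LENGTH_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognized length: {text!r}")
    value = float(match.group(1).replace(",", ""))
    return round(value * BP_PER_UNIT[match.group(2).upper()])


def length_in_range(length_bp: int, low: str, high: str) -> bool:
    """True if length_bp lies between low and high (endpoints assumed inclusive)."""
    return to_bp(low) <= length_bp <= to_bp(high)


print(to_bp("50KB"))                              # 50000
print(to_bp("3 MB"))                              # 3000000
print(length_in_range(4_200_000, "3MB", "5MB"))   # True
print(length_in_range(4_200_000, "1MB", "3MB"))   # False
```

Under these assumptions, a hypothetical 4.2MB target sequence falls within the 3MB to 5MB range and also within the broader 1MB to 10MB range.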

In some embodiments, the target sequence comprises the sequence of one or more genes. In some embodiments, the target sequence comprises the sequences of multiple genes. In some embodiments, the target sequence comprises the sequences of at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400, 450, 500, 600, 700, 800, 900, 1000, 1500, or 2000 genes.

In some embodiments, the target sequence comprises a sequence homologous to the template sequence. For example, the template chromosome is a human chromosome comprising a human template sequence that comprises one or more genes described in Tables 1 and 2 above, while the target chromosome is a mouse chromosome comprising a mouse target sequence, and the mouse target sequence comprises a mouse sequence homologous to the human template sequence. As another example, the template chromosome is a human chromosome comprising a human IGH sequence, while the target chromosome is a mouse chromosome, and the target sequence comprises the homologous mouse Igh sequence. As yet another example, the template chromosome is a human chromosome comprising a human TCR sequence, while the target chromosome is a mouse chromosome, and the target sequence comprises the homologous mouse TCR sequence.

In some embodiments, the target chromosome is from a mouse, rat, rabbit, guinea pig, hamster, sheep, goat, donkey, cow, horse, camel, monkey, or chicken, and the target sequence comprises a mouse, rat, rabbit, guinea pig, hamster, sheep, goat, donkey, cow, horse, camel, monkey, or chicken homolog of the template sequence.

In some embodiments, the target sequence comprises the sequence of a mouse, rat, rabbit, guinea pig, hamster, sheep, goat, donkey, cow, horse, camel, monkey, or chicken gene. All mouse, rat, rabbit, guinea pig, hamster, sheep, goat, donkey, cow, horse, camel, monkey, or chicken genes are considered to be within the scope of this disclosure. Without wishing to be bound by theory, transferring human genes that are involved in disease pathogenesis or that are potential therapeutic targets into model organisms such as mice, rats, rabbits, guinea pigs, hamsters, sheep, goats, donkeys, cows, horses, camels, monkeys, or chickens can facilitate the study of disease and the development of suitable therapies. In some embodiments, the target sequence comprises a mouse sequence homologous to a human template sequence. In some embodiments, the target sequence comprises a rat sequence homologous to a human template sequence. In some embodiments, the target sequence comprises a monkey sequence homologous to a human template sequence.

In some embodiments, the target sequence comprises an immunoglobulin sequence, such as a mouse immunoglobulin sequence. In some embodiments, the target sequence comprises a mouse Igh sequence. Mouse Igh spans nucleotide positions 112,947,269 to 116,248,693 on chromosome 12 of the GRCm39 assembly of the mouse genome. Those skilled in the art will understand that mouse Igh sequences whose 5' and 3' boundaries deviate from those described above, for example by at least 100 bp, 500 bp, 1,000 bp, 2,000 bp, 5,000 bp, 10,000 bp, or more, are suitable template sequences.

In some embodiments, the target sequence comprises a mouse Igh variable region sequence. In some embodiments, the mouse Igh variable region sequence comprises sequences encoding the mouse homologs of the V_H, D_H, and J_H1-6 gene segments and the intervening non-coding sequences. In some embodiments, the mouse Igh variable region sequence comprises nucleotide positions 113,391,842 to 115,973,952 on chromosome 12 of the GRCm39 assembly of the mouse genome. In some embodiments, the mouse Igh variable region sequence comprises nucleotide positions 113,391,842 to 115,973,952 on chromosome 12 of the GRCm39 assembly of the mouse genome, minus at least about 50 bp, 100 bp, 500 bp, 1,000 bp, 2,000 bp, 5,000 bp, 7,000 bp, 10,000 bp, 15,000 bp, 20,000 bp, or 50,000 bp from the 5' end, the 3' end, or both ends. In some embodiments, the mouse Igh variable region sequence comprises nucleotide positions 113,391,842 to 115,973,952 on chromosome 12 of the GRCm39 assembly of the mouse genome, plus at least about 50 bp, 100 bp, 500 bp, 1,000 bp, 2,000 bp, 5,000 bp, 7,000 bp, 10,000 bp, 15,000 bp, 20,000 bp, or 50,000 bp of additional flanking sequence at the 5' end, the 3' end, or both ends. In some embodiments, the mouse Igh variable region sequence comprises nucleotide positions 113,391,842 to 115,973,952 on chromosome 12 of the GRCm39 assembly of the mouse genome, with one or more modifications thereto. Exemplary modifications include, but are not limited to, deletions (such as deletion of one or more V, D, or J segments), insertions (such as insertion of a marker), rearrangements, or combinations thereof. In some embodiments, the target sequence comprises a mouse Igl variable region sequence. In some embodiments, the target sequence comprises a mouse Igk variable region sequence. In some embodiments, the template sequence comprises a human IGL variable region sequence. In some embodiments, the template sequence comprises a human IGK variable region sequence.
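
The coordinate arithmetic behind the trimmed and extended variants of the variable region can be sketched as follows. This is a minimal illustration, not part of the disclosed method: the `Interval` class and its method names are hypothetical, coordinates are assumed to be 1-based and inclusive, and the lower coordinate is treated as the 5' end purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Hypothetical 1-based, inclusive interval on one chromosome."""
    chrom: str
    start: int
    end: int

    def length(self) -> int:
        # Inclusive coordinates, so both endpoints count.
        return self.end - self.start + 1

    def trimmed(self, five_prime: int = 0, three_prime: int = 0) -> "Interval":
        # Remove bases from either end (assumption: start is the 5' end).
        return Interval(self.chrom, self.start + five_prime, self.end - three_prime)

    def extended(self, five_prime: int = 0, three_prime: int = 0) -> "Interval":
        # Add flanking bases at either end, never going below position 1.
        return Interval(self.chrom, max(1, self.start - five_prime), self.end + three_prime)


# Coordinates quoted in the text (GRCm39, chromosome 12).
MOUSE_IGH_VARIABLE = Interval("chr12", 113_391_842, 115_973_952)

if __name__ == "__main__":
    print(MOUSE_IGH_VARIABLE.length())
    print(MOUSE_IGH_VARIABLE.trimmed(50_000, 50_000))
    print(MOUSE_IGH_VARIABLE.extended(10_000, 10_000))
```

Under these assumptions the quoted interval is about 2.58 MB long, which falls within the 1MB to 5MB target sequence length range mentioned earlier.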

在一些实施方案(例如其中通过本文所述方法几乎不删除或不删除靶染色体序列的那些实施方案)中，靶染色体包含靶位置。靶位置是模板序列插入的位置，或者是模板序列与其连接的位置。靶染色体上的任何位置都可以是合适的位置。在一些实施方案中，靶位置包含用于在靶位置产生双链断裂的核酸内切酶位点。In some embodiments (e.g., those in which the method described herein deletes little or no target chromosome sequence), the target chromosome contains a target location. A target location is either the site where the template sequence is inserted or the site where the template sequence is attached. Any location on the target chromosome can be suitable. In some embodiments, the target location contains an endonuclease site for generating a double-strand break at the target location.

Engineered chromosomes

本公开提供了通过本文所述方法产生的工程化的染色体。This disclosure provides engineered chromosomes produced by the methods described herein.

在一些实施方案中，工程化的染色体包括含有一个或多个人源化序列的小鼠染色体。在一些实施方案中，人源化序列包含一个或多个与人的疾病或病症相关的基因，诸如与遗传疾病或病症相关的基因，或癌基因。在一些实施方案中，工程化的染色体包括含有一个或多个人源化序列的大鼠染色体。在一些实施方案中，工程化的染色体包括含有一个或多个人源化序列的猴染色体。In some embodiments, the engineered chromosome comprises a mouse chromosome containing one or more humanized sequences. In some embodiments, the humanized sequences comprise one or more genes associated with human diseases or conditions, such as genes associated with genetic diseases or conditions, or oncogenes. In some embodiments, the engineered chromosome comprises a rat chromosome containing one or more humanized sequences. In some embodiments, the engineered chromosome comprises a monkey chromosome containing one or more humanized sequences.

In some embodiments, the engineered chromosome comprises a mouse chromosome in which one or more immunoglobulin sequences have been humanized. In some embodiments, the immunoglobulin sequence comprises an IGH sequence, such as an IGH variable region. In some embodiments, the engineered chromosome comprises mouse chromosome 12 in which the mouse Igh variable region has been replaced by the human IGH variable region from chromosome 14. In some embodiments, the mouse Igh variable region comprises the V_H, D_H, and J_H1-6 gene segments and the intervening non-coding sequences. In some embodiments, the human IGH variable region comprises the V_H, D_H, and J_H1-6 gene segments and the intervening non-coding sequences. In some embodiments, the engineered chromosome comprises mouse chromosome 12 in which the mouse Igh variable region, approximately comprising the nucleotide sequence from 113,391,842 to 115,973,952 of chromosome 12 of the GRCm39 assembly of the mouse genome, has been replaced by the human IGH variable region, approximately comprising the nucleotide sequence from 105,862,994 to 106,811,028 of chromosome 14 of the GRCh38.p13 assembly of the human genome. In some embodiments, the engineered chromosome is mouse chromosome 6 comprising the sequence of the human IGK variable region in place of the mouse Igk variable region. In some embodiments, the mouse Igk variable region sequence comprises sequences encoding the mouse V_k and J_k1-5 gene segments and the intervening non-coding sequences. In some embodiments, the template sequence comprises a human IGK variable region sequence. In some embodiments, the human IGK variable region sequence comprises sequences encoding the human V_k and J_k1-5 gene segments and the intervening non-coding sequences.

Nucleic acid molecules, plasmids and vectors

This disclosure provides nucleic acid molecules for use in the methods described herein. A nucleic acid molecule, sometimes referred to as a polynucleotide, is a chain of linked nucleotides that makes up a single molecule. The nucleic acid molecules of this disclosure may be deoxyribonucleic acid (DNA) or ribonucleic acid (RNA). Exemplary nucleic acid molecules of this disclosure comprise homologous arms specific for, or adjacent to, the target sequence and the template sequence, in order to facilitate insertion of the template sequence into the target sequence, or joining of the template to the target sequence through double-strand break repair.

本公开提供了包含对靶染色体和模板染色体特异的同源臂的核酸分子，其促进了本文所述的HDR介导的染色体重排。在一些实施方案中，核酸分子从5’至3’包含5’同源臂、至少第一标记和3’同源臂，所述5’同源臂含有靶序列5’末端上游的核苷酸序列，所述3’同源臂含有模板序列5’末端上游的核苷酸序列。在一些实施方案中，核酸分子从5’至3’包含5’同源臂、至少第二标记和3’同源臂，所述5’同源臂含有模板序列3’末端下游的核苷酸序列，所述3’同源臂含有靶序列3’末端下游的核苷酸序列。This disclosure provides nucleic acid molecules comprising homologous arms specific to both the target chromosome and the template chromosome, which facilitate HDR-mediated chromosomal rearrangements as described herein. In some embodiments, the nucleic acid molecule comprises a 5' homologous arm, at least a first marker, and a 3' homologous arm from 5' to 3', the 5' homologous arm containing a nucleotide sequence upstream of the 5' end of the target sequence, and the 3' homologous arm containing a nucleotide sequence upstream of the 5' end of the template sequence. In some embodiments, the nucleic acid molecule comprises a 5' homologous arm, at least a second marker, and a 3' homologous arm from 5' to 3', the 5' homologous arm containing a nucleotide sequence downstream of the 3' end of the template sequence, and the 3' homologous arm containing a nucleotide sequence downstream of the 3' end of the target sequence.

本公开提供了包含本文所述核酸分子的载体。根据本公开，载体是能够转运与其连接的其它核酸的核酸分子。例如，质粒是一种类型的载体。载体序列尤其包括从宿主细胞诸如细菌中产生载体所必需的序列，诸如复制起点和选择标记。This disclosure provides vectors comprising the nucleic acid molecules described herein. According to this disclosure, a vector is a nucleic acid molecule capable of transporting other nucleic acids linked to it. For example, a plasmid is one type of vector. The vector sequence particularly includes sequences necessary for generating the vector from a host cell such as bacteria, such as origin of replication and selection markers.

在一些实施方案中，载体是质粒。在一些实施方案中，质粒从5’至3’包含5’同源臂、至少第一标记和3’同源臂，所述5’同源臂含有靶序列5’末端上游的核苷酸序列，所述3’同源臂包含模板序列5’末端上游的核苷酸序列。在一些实施方案中，质粒从5’至3’包含5’同源臂、至少第二种标记和3’同源臂，所述5’同源臂包含模板序列3’末端下游的核苷酸序列，所述3’同源臂包含靶序列3’末端下游的核苷酸序列。In some embodiments, the vector is a plasmid. In some embodiments, the plasmid comprises a 5' homologous arm, at least a first marker, and a 3' homologous arm from 5' to 3', wherein the 5' homologous arm contains a nucleotide sequence upstream of the 5' end of the target sequence, and the 3' homologous arm contains a nucleotide sequence upstream of the 5' end of the template sequence. In some embodiments, the plasmid comprises a 5' homologous arm, at least a second marker, and a 3' homologous arm from 5' to 3', wherein the 5' homologous arm contains a nucleotide sequence downstream of the 3' end of the template sequence, and the 3' homologous arm contains a nucleotide sequence downstream of the 3' end of the target sequence.

在一些实施方案中，载体包含位于模板序列5’末端或其附近的同源臂序列。在一些实施方案中，同源臂位于模板序列的上游，即模板序列的5’。在一些实施方案中，载体包含位于模板序列3’末端或其附近的同源臂序列。在一些实施方案中，同源臂位于模板序列的下游，即模板序列的3’。在一些实施方案中，载体中模板同源臂的序列与模板序列中同源臂的序列相同或基本相同。In some embodiments, the vector includes a homologous arm sequence located at or near the 5' end of the template sequence. In some embodiments, the homologous arm is located upstream of the template sequence, i.e., at the 5' end. In some embodiments, the vector includes a homologous arm sequence located at or near the 3' end of the template sequence. In some embodiments, the homologous arm is located downstream of the template sequence, i.e., at the 3' end. In some embodiments, the sequence of the template homologous arm in the vector is identical or substantially identical to the sequence of the homologous arm in the template sequence.

In some embodiments, the vector comprises a homologous arm sequence located 5' of the target sequence or target location (i.e., upstream of the target sequence or location). In some embodiments, the vector comprises a homologous arm sequence located 3' of the target sequence or target location (i.e., downstream of the target sequence or location).

The skilled artisan will understand that some degree of mismatch may exist between a homologous arm sequence in the vector and the equivalent sequence in the template chromosome or target chromosome, and that the vector will still promote repair, templated from the vector, of double-strand breaks in the template chromosome or target chromosome. For example, a vector homologous arm sequence having at least 95% identity, at least 96% identity, at least 97% identity, at least 98% identity, or at least 99% identity with the equivalent sequence in the template chromosome will be suitable for the methods of this disclosure.
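
The identity thresholds above can be illustrated with a small calculation. The sketch below assumes the arm and the chromosomal sequence have already been aligned without gaps and have equal length; the function names `percent_identity` and `arm_is_suitable` are hypothetical, and a real comparison would normally use a proper alignment tool.

```python
def percent_identity(arm: str, reference: str) -> float:
    """Ungapped percent identity between two equal-length sequences."""
    if not arm or len(arm) != len(reference):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(arm.upper(), reference.upper()))
    return 100.0 * matches / len(arm)


def arm_is_suitable(arm: str, reference: str, threshold: float = 95.0) -> bool:
    """True if the arm meets the chosen identity threshold (95% by default)."""
    return percent_identity(arm, reference) >= threshold


if __name__ == "__main__":
    reference = "ACGT" * 25            # 100-nt toy chromosomal sequence
    arm = list(reference)
    for pos, base in ((0, "T"), (10, "A"), (20, "T"), (30, "A")):
        arm[pos] = base                # introduce four mismatches
    arm = "".join(arm)
    print(percent_identity(arm, reference), arm_is_suitable(arm, reference))
```

Raising `threshold` to 96, 97, 98, or 99 corresponds to the stricter identity levels listed in the paragraph.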

In some embodiments, the nucleic acid molecules, plasmids, or vectors described herein comprise one or more endonuclease sites.

In some embodiments, this disclosure provides (i) a first nucleic acid molecule comprising, from 5' to 3', a 5' homologous arm, at least a first marker, and a 3' homologous arm, the 5' homologous arm containing a nucleotide sequence upstream of the 5' end of the target sequence, and the 3' homologous arm containing a nucleotide sequence upstream of the 5' end of the template sequence; and (ii) a second nucleic acid molecule comprising, from 5' to 3', a 5' homologous arm, at least a second marker, and a 3' homologous arm, the 5' homologous arm containing a nucleotide sequence downstream of the 3' end of the template sequence, and the 3' homologous arm containing a nucleotide sequence downstream of the 3' end of the target sequence. In some embodiments, the first and second nucleic acid molecules are plasmids. In some embodiments, the first nucleic acid molecule comprises, from 5' to 3', a 5' homologous arm containing a nucleotide sequence upstream of the 5' end of the target sequence, a first endonuclease site, at least a first marker, a second endonuclease site, and a 3' homologous arm containing a nucleotide sequence upstream of the 5' end of the template sequence, wherein the first and second endonuclease sites overlap with the homologous arms, such that the first and second endonuclease sites on the nucleic acid molecule and the corresponding endonuclease sites on the template chromosome and the target chromosome are cleaved by the same endonuclease. In some embodiments, the second nucleic acid molecule comprises, from 5' to 3', a 5' homologous arm containing a nucleotide sequence downstream of the 3' end of the template sequence, a third endonuclease site, at least a second marker, a fourth endonuclease site, and a 3' homologous arm containing a nucleotide sequence downstream of the 3' end of the target sequence, wherein the third and fourth endonuclease sites overlap with the homologous arms, such that the third and fourth endonuclease sites on the nucleic acid molecule and the corresponding endonuclease sites on the template chromosome and the target chromosome are cleaved by the same endonuclease. In some embodiments, the first and second markers are not the same marker. In some embodiments, the first marker on the first nucleic acid molecule comprises a combination of a selection marker and a detectable marker. In some embodiments, the first marker comprises eGFP and puromycin resistance. In some embodiments, the second marker comprises a selection marker. In some embodiments, the second marker comprises hygromycin resistance.
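
The 5' to 3' ordering of elements on the two nucleic acid molecules can be written down as a small data structure. This is only an illustrative sketch: the `DonorMolecule` class, its field names, and the descriptive strings are hypothetical labels rather than actual sequences or a prescribed design.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class DonorMolecule:
    """Hypothetical description of one nucleic acid molecule, read 5' to 3'."""
    five_prime_arm: str                 # what the 5' homologous arm matches
    markers: Tuple[str, ...]            # marker components between the arms
    three_prime_arm: str                # what the 3' homologous arm matches
    sites: Tuple[str, str] = ("", "")   # optional flanking endonuclease sites

    def layout(self) -> List[str]:
        first_site, second_site = self.sites
        parts = [f"5' arm: {self.five_prime_arm}"]
        if first_site:
            parts.append(first_site)
        parts.extend(self.markers)
        if second_site:
            parts.append(second_site)
        parts.append(f"3' arm: {self.three_prime_arm}")
        return parts


FIRST = DonorMolecule(
    five_prime_arm="upstream of target 5' end",
    markers=("eGFP", "puromycin resistance"),
    three_prime_arm="upstream of template 5' end",
    sites=("endonuclease site 1", "endonuclease site 2"),
)
SECOND = DonorMolecule(
    five_prime_arm="downstream of template 3' end",
    markers=("hygromycin resistance",),
    three_prime_arm="downstream of target 3' end",
    sites=("endonuclease site 3", "endonuclease site 4"),
)


def markers_are_distinct(a: DonorMolecule, b: DonorMolecule) -> bool:
    """True if the two molecules share no marker component."""
    return not set(a.markers) & set(b.markers)


if __name__ == "__main__":
    print(" | ".join(FIRST.layout()))
    print(" | ".join(SECOND.layout()))
    print(markers_are_distinct(FIRST, SECOND))
```

Reading the two layouts side by side shows why the first molecule bridges the upstream junction (target, then template) and the second bridges the downstream junction (template, then target).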

In some embodiments, a homologous arm sequence on the nucleic acid molecule corresponds to a sequence located near the template sequence, target sequence, or target location. A homologous arm that is 0 bp from, or within 5 base pairs (bp), 10 bp, 15 bp, 20 bp, 30 bp, 40 bp, 50 bp, 70 bp, 80 bp, 90 bp, 100 bp, 120 bp, 140 bp, 160 bp, 180 bp, 200 bp, or 250 bp of, the template sequence, target sequence, or target location may be considered to be near that sequence.

在一些实施方案中，对应于模板或靶染色体序列的核酸分子同源序列的长度介于约20bp与2,000bp之间、介于约50bp与1,500bp之间、介于约100bp与1,400bp之间、介于约150bp与1,300bp之间、介于约200bp与1,200bp之间、介于约300bp与1,100bp之间、介于约400bp与1,000bp之间，或介于约500bp与900bp之间，或介于在约600bp与800bp之间。在一些实施方案中，核酸分子同源序列的长度介于约400bp与1,500bp之间。在一些实施方案中，核酸分子同源序列的长度介于约500bp与1,300bp之间。在一些实施方案中，核酸分子同源序列的长度介于约600bp与1,000bp之间。In some embodiments, the length of the homologous sequence of the nucleic acid molecule corresponding to the template or target chromosome sequence is between about 20 bp and 2,000 bp, between about 50 bp and 1,500 bp, between about 100 bp and 1,400 bp, between about 150 bp and 1,300 bp, between about 200 bp and 1,200 bp, between about 300 bp and 1,100 bp, between about 400 bp and 1,000 bp, or between about 500 bp and 900 bp, or between about 600 bp and 800 bp. In some embodiments, the length of the homologous sequence of the nucleic acid molecule is between about 400 bp and 1,500 bp. In some embodiments, the length of the homologous sequence of the nucleic acid molecule is between about 500 bp and 1,300 bp. In some embodiments, the length of the homologous sequence of the nucleic acid molecule is between about 600 bp and 1,000 bp.

In some embodiments, the nucleic acid molecule comprises a marker suitable for expression in mammalian cells. In some embodiments, the marker is located between the homologous arms of the nucleic acid molecule, whereby the marker is inserted into the target sequence. In some embodiments, the marker is a selection marker. Suitable selection markers include dihydrofolate reductase (DHFR), glutamine synthetase (GS), puromycin acetyltransferase, blasticidin deaminase, histidinol dehydrogenase, hygromycin phosphotransferase (hph), the bleomycin resistance gene, and aminoglycoside phosphotransferase (neomycin resistance gene), and are described in further detail in Table 3 below.

In some embodiments, the marker comprises a detectable marker (or reporter). Detectable markers include, but are not limited to, enzymes that mediate luminescent reactions (luxA, luxB, luxAB, luc, ruc, nluc), enzymes that mediate colorimetric reactions (lacZ, HRP), and fluorescent proteins such as green fluorescent protein (GFP), eGFP, yellow fluorescent protein (YFP), red fluorescent protein (RFP), cyan fluorescent protein (CFP), blue fluorescent protein (BFP), dsRed, mCherry, tdTomato, near-infrared fluorescent proteins, and the like. The selection of a suitable detectable marker is known to those of ordinary skill in the art.

The marker can be expressed using any suitable promoter known in the art, including, but not limited to, the cytomegalovirus (CMV) early promoter, the PGK promoter, and the EF1a promoter.

Table 3. Selection markers

Selection marker | Selection reagent
Dihydrofolate reductase (DHFR) | Methotrexate (MTX)
Glutamine synthetase (GS) | Methionine sulfoximine (MSX)
Puromycin acetyltransferase | Puromycin
Blasticidin deaminase | Blasticidin
Histidinol dehydrogenase | Histidinol
Hygromycin phosphotransferase (hph) | Hygromycin
Bleomycin resistance gene | Bleomycin
Aminoglycoside phosphotransferase | Neomycin (G418)

In some embodiments (e.g., embodiments of methods that use two nucleic acid molecules, a first nucleic acid molecule having a first marker and a second nucleic acid molecule having a second marker), the first or second marker comprises a fluorescent protein operably linked to a promoter capable of expressing the fluorescent protein in the cell. In some embodiments, the fluorescent protein comprises green fluorescent protein (GFP). In some embodiments, the first marker further comprises a selection marker. In some embodiments, the second marker further comprises a selection marker. In some embodiments, the selection marker is selected from the group consisting of dihydrofolate reductase (DHFR), glutamine synthetase (GS), puromycin acetyltransferase, blasticidin deaminase, histidinol dehydrogenase, hygromycin phosphotransferase (hph), the bleomycin resistance gene, and aminoglycoside phosphotransferase. In some embodiments, the first and second markers are not the same selection marker. In some embodiments, the first marker comprises GFP operably linked to a promoter capable of expressing GFP in the cell, together with puromycin acetyltransferase, and the second marker comprises hygromycin phosphotransferase.
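
The reasoning behind using two different markers can be sketched as a lookup from marker components to the reagent or readout used to pick cells carrying both insertions. The dictionary below copies only a few marker and reagent pairs from Table 3; the function `selection_plan` and the structure of its return value are hypothetical and serve only as an illustration.

```python
from typing import Dict, List, Sequence

# Subset of Table 3 (selection marker -> selection reagent).
SELECTION_REAGENT: Dict[str, str] = {
    "puromycin acetyltransferase": "puromycin",
    "hygromycin phosphotransferase": "hygromycin",
    "blasticidin deaminase": "blasticidin",
    "aminoglycoside phosphotransferase": "neomycin (G418)",
}

# Detectable markers are screened (e.g., by fluorescence), not selected with a reagent.
DETECTABLE_MARKERS = {"GFP", "eGFP"}


def selection_plan(first_marker: Sequence[str],
                   second_marker: Sequence[str]) -> Dict[str, List[str]]:
    """List the reagents and screens needed to recover cells with both markers."""
    plan: Dict[str, List[str]] = {"reagents": [], "screen": []}
    for component in (*first_marker, *second_marker):
        if component in SELECTION_REAGENT:
            reagent = SELECTION_REAGENT[component]
            if reagent in plan["reagents"]:
                # The same reagent cannot tell the two insertions apart.
                raise ValueError("first and second markers share a selection marker")
            plan["reagents"].append(reagent)
        elif component in DETECTABLE_MARKERS:
            plan["screen"].append(component)
        else:
            raise KeyError(f"unknown marker component: {component}")
    return plan


if __name__ == "__main__":
    print(selection_plan(("GFP", "puromycin acetyltransferase"),
                         ("hygromycin phosphotransferase",)))
```

With the example markers named in the paragraph, the plan is double selection with puromycin and hygromycin plus a GFP screen, so only cells carrying both junction insertions are expected to survive and fluoresce.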

Methods of generating double-strand breaks

Provided herein are methods of generating double-strand breaks in the template chromosome and the target chromosome. The methods provided herein use the repair pathways that carry out double-strand break repair in the cellular environment to facilitate the transfer of large sequences between chromosomes.

本领域已知的在DNA序列中产生双链断裂的任何方法，以及修复这些双链断裂的任何修复途径，都被认为在本公开的范围内。Any method known in the art for generating double-strand breaks in DNA sequences, and any repair pathway for repairing such double-strand breaks, is considered to be within the scope of this disclosure.

在一些实施方案中，模板染色体和靶染色体中的双链断裂是使用一种或多种核酸内切酶产生的。在一些实施方案中，核酸内切酶还切割一种或多种包含本文所述方法中使用的同源臂的核酸分子。在一些实施方案中，一种或多种核酸内切酶选自由以下组成的组：CRISPR/Cas核酸内切酶和一种或多种引导核酸(gNA)、一种或多种锌指核酸酶(ZFN)或一种或多种转录激活子样效应因子核酸酶(TALEN)。在一些实施方案中，使用一种或多种CRE重组酶产生模板染色体和靶染色体中的双链断裂，以产生染色体重排。In some embodiments, double-strand breaks in the template and target chromosomes are generated using one or more endonucleases. In some embodiments, the endonucleases also cleave one or more nucleic acid molecules containing homologous arms used in the methods described herein. In some embodiments, one or more endonucleases are selected from the group consisting of CRISPR/Cas endonucleases and one or more guide nucleic acids (gNA), one or more zinc finger nucleases (ZFN), or one or more transcription activator-like effector nucleases (TALEN). In some embodiments, one or more CRE recombinases are used to generate double-strand breaks in the template and target chromosomes to produce chromosomal rearrangements.

Various molecules are capable of introducing double-strand and/or single-strand breaks into genomic nucleic acids. The nucleases of this disclosure include, but are not limited to, homing endonucleases, restriction endonucleases, zinc finger nucleases or zinc finger nickases, meganucleases or meganickases, transcription activator-like effector (TALE) nucleases, and guided nucleases or nickases, in particular nucleic acid-guided nucleases or nickases such as RNA-guided nucleases and DNA-guided nucleases, as well as megaTAL nucleases, BurrH nucleases, modified or chimeric forms or variants thereof, and combinations thereof. The RNA-guided nuclease or RNA-guided nickase is optionally part of a CRISPR-based system.

Nucleases are capable of cleaving the phosphodiester bonds between the monomers of nucleic acids. Many nucleases participate in DNA repair by recognizing sites of damage and cleaving them from the surrounding DNA. These enzymes may be part of a complex. Endonucleases are nucleases that act on internal regions of the target molecule. Deoxyribonucleases act on DNA. Many nucleases involved in DNA repair are not sequence-specific. In this specification, however, sequence-specific nucleases are preferred. In some embodiments, the one or more sequence-specific nucleases are specific for a fairly long stretch of nucleotides in the target genome (such as 10 or more nucleotides, or 15, 20, 25, 30, 35, 40, 45, or even 50 or more nucleotides), with ranges of 5-50, 10-50, 15-50, 15-40, or 15-30 nucleotides as the target sequence in the target genome being preferred. The longer this "recognition sequence" is, the fewer target sites there are in the genome and the more specific the cut made by the nuclease in the genome, so that cleavage becomes site-specific. Site-specific nucleases typically have fewer than 10, 5, 4, 3, or 2 target sites, or only one (1) target site, in the genome. Nucleases that have been engineered to alter one or more genomic nucleic acids (including by cleaving a specific genomic target sequence) are referred to herein as engineered nucleases. CRISPR-based systems are one type of engineered nuclease. However, such engineered nucleases can be based on any of the nucleases described herein.
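
The relationship between recognition sequence length and the number of target sites can be illustrated with a back-of-the-envelope estimate. The sketch below assumes a random genome with equal base frequencies and counts both strands; the function name `expected_sites` and the genome size of 3,000,000,000 bp are illustrative assumptions, and real genomes deviate from this model because of repeats and base composition.

```python
def expected_sites(recognition_length: int, genome_size: int) -> float:
    """Expected occurrences of one specific n-nt site in a random genome.

    Assumes equal base frequencies (probability 1/4 per position) and
    counts candidate positions on both strands.
    """
    if recognition_length < 1 or genome_size < recognition_length:
        raise ValueError("recognition length must be between 1 and genome size")
    positions = 2 * (genome_size - recognition_length + 1)
    return positions / 4 ** recognition_length


if __name__ == "__main__":
    GENOME_SIZE = 3_000_000_000  # assumed, roughly mammalian-scale
    for n in (6, 10, 15, 20, 30):
        print(n, expected_sites(n, GENOME_SIZE))
```

Under this model a 6-nt site is expected more than a million times, whereas a 20-nt site is expected fewer than once, which is the sense in which a longer recognition sequence makes cleavage site-specific.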

Endonucleases that recognize sequences of more than 12 base pairs are referred to as meganucleases. Meganucleases/-nickases are endodeoxyribonucleases characterized by a large recognition site (e.g., a double-stranded DNA sequence of 12 to 40 base pairs, such as 20 to 40 or 30 to 40 base pairs); as a result, this site may occur only once in any given genome.

A "homing endonuclease" is a form of meganuclease and is a double-stranded DNase that has a large, asymmetric recognition site and a coding sequence that is typically embedded in an intron or intein. Homing endonuclease recognition sites are extremely rare in the genome, so that these enzymes cut at very few positions, and sometimes at a single position, in the genome (WO2004067736; see also U.S. Patent No. 8,697,395 B2).

Zinc finger nucleases/-nickases (ZFNs) are artificial restriction enzymes produced by fusing a zinc finger DNA-binding domain to a DNA-cleavage domain. The zinc finger domain can be engineered to target a specific desired DNA sequence.

RNA引导的核酸酶/-切口酶，特别是核酸内切酶包括例如Cas9或Cpf1。已对CRISPR系统进行了详细描述。任何基于CRISPR的系统都是本公开的一部分。在使用另外的一种或多种RNA引导的核酸内切酶的情况下，可使用合适的引导RNA、sgRNA或crRNA或其它合适的RNA序列，其与RNA引导的核酸内切酶相互作用并靶向基因组核酸中的基因组靶位点。RNA-guided nucleases/-nickases, particularly endonucleases, include, for example, Cas9 or Cpf1. CRISPR systems have been described in detail. Any CRISPR-based system is part of this disclosure. When using one or more additional RNA-guided endonucleases, suitable guide RNA, sgRNA, or crRNA, or other suitable RNA sequences, can be used that interact with the RNA-guided endonuclease and target genomic target sites in the genomic nucleic acid.

As used herein, the term "CRISPR-associated protein" or "CRISPR/Cas" protein refers to a nucleic acid-guided DNA endonuclease associated with the CRISPR (clustered regularly interspaced short palindromic repeats) type II adaptive immune system found in certain bacteria, such as Streptococcus pyogenes and other bacteria. CRISPR/Cas proteins, such as Cas9, are not limited to the wild-type (wt) proteins found in bacteria. CRISPR/Cas proteins comprising mutations relative to a wild-type CRISPR/Cas sequence, or derivatives thereof, are considered to be within the scope of this disclosure. The native type II CRISPR system from Streptococcus pyogenes comprises the Cas9 protein and a guide RNA composed of two RNAs: a mature CRISPR RNA (crRNA) and a partially complementary trans-acting RNA (tracrRNA). Cas9 unwinds foreign DNA and checks for sites complementary to the 20-base-pair spacer region of the guide RNA. Cas9 targeting has since been simplified, and most Cas-based systems have been engineered to require only one or two chimeric guide RNAs or a single guide RNA (chiRNA, often also referred to simply as guide RNA, gRNA, or sgRNA), which results from the fusion of the crRNA and the tracrRNA. The spacer region can be engineered as desired.
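
The idea of a 20-nucleotide spacer matching a genomic site can be illustrated by scanning a sequence for candidate sites. The sketch below makes an assumption that is not stated in the text above: that the site must be followed by an NGG protospacer adjacent motif (PAM), as is commonly described for Streptococcus pyogenes Cas9. The function name `find_candidate_sites` is hypothetical, only the forward strand is scanned, and no off-target scoring is attempted.

```python
import re
from typing import List, Tuple


def find_candidate_sites(sequence: str, spacer_len: int = 20) -> List[Tuple[int, str, str]]:
    """Return (0-based start, protospacer, PAM) for sites followed by NGG.

    Assumption: an NGG PAM immediately 3' of the protospacer; forward strand only.
    """
    seq = sequence.upper()
    # A lookahead lets overlapping candidate sites be reported as well.
    pattern = re.compile(rf"(?=([ACGT]{{{spacer_len}}})([ACGT]GG))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


if __name__ == "__main__":
    toy = "A" * 20 + "TGG" + "CCCC"   # toy sequence with a single candidate site
    for start, protospacer, pam in find_candidate_sites(toy):
        print(start, protospacer, pam)
```

A guide RNA spacer would then be designed to match a chosen protospacer, for example one lying at the 5' or 3' boundary of the template or target sequence.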

如本文中所用，术语“Cas9编码序列”是指能够被转录和/或翻译(根据在宿主细胞/宿主哺乳动物中有功能的遗传密码)以产生Cas9蛋白的多核苷酸。Cas9编码序列可以是DNA(诸如质粒)或RNA(诸如mRNA)。As used herein, the term "Cas9 coding sequence" refers to a polynucleotide capable of being transcribed and/or translated (according to the genetic code that functions in the host cell/host mammal) to produce the Cas9 protein. The Cas9 coding sequence can be DNA (such as a plasmid) or RNA (such as mRNA).

如本文中所用，术语CRISPR/Cas核糖核蛋白是指由CRISPR/Cas蛋白和相关引导核酸组成的蛋白质/核酸复合物。例如，Cas9核糖核蛋白是指与其相关引导RNA复合的Cas9。As used herein, the term CRISPR/Cas ribonucleoprotein refers to a protein/nucleic acid complex composed of a CRISPR/Cas protein and an associated guide RNA. For example, Cas9 ribonucleoprotein refers to Cas9 complexed with its associated guide RNA.

在一些实施方案中，核酸酶是RNA引导的核酸酶。用于本公开的RNA引导的核酸酶(包括核酸引导的核酸酶)的非限制性实例包括但不限于CasI、CasIB、Cas2、Cas3、Cas4、Cas5、Cas6、Cas7、Cas8、Cas9、Cas10、CasX、CasY、Cas12a(Cpf1)、Cas12b、Cas13a、CsyI、Csy2、Csy3、CseI、Cse2、CscI、Csc2、Csa5、Csn2、Csm2、Csm3、Csm4、Csm5、Csm6、CmrI、Cmr3、Cmr4、Cmr5、Cmr6、CsbI、Csb2、Csb3、Csx17、Csx14、Csx10、Csx16、CsaX、Csx3、Csx1、Csx15、CsfI、Csf2、Csf3、Csf4、Cms1、C2c1、C2c2、C2c3或其同源物、直系同源物或经修饰的形式。In some embodiments, the nuclease is an RNA-guided nuclease. Non-limiting examples of RNA-guided nucleases (including nucleic acid-guided nucleases) for use in this disclosure include, but are not limited to, CasI, CasIB, Cas2, Cas3, Cas4, Cas5, Cas6, Cas7, Cas8, Cas9, Cas10, CasX, CasY, Cas12a (Cpf1), Cas12b, Cas13a, CsyI, Csy2, Csy3, CseI, Cse2, CscI, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, CmrI, Cmr3, Cmr4, Cmr5, Cmr6, CsbI, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX, Csx3, Csx1, Csx15, CsfI, Csf2, Csf3, Csf4, Cms1, C2c1, C2c2, and C2c3, or homologs, orthologs, or modified forms thereof.

“megaTAL核酸酶/-切口酶”是指包含工程化的TALE DNA结合结构域的工程化的核酸酶和工程化的大范围核酸酶或工程化的归巢核酸内切酶。TALE DNA结合结构域可被设计用于结合基因组中核酸序列的几乎任何基因座处的DNA，并且如果这种DNA结合结构域与工程化的大范围核酸酶融合，则切割靶序列。例如，megaTAL核酸酶的说明性实例和TALE DNA结合结构域的设计由Boissel等人(MegaTALs:a rare-cleaving nuclease architecture for therapeutic genome engineering(2013),Nucleic Acids Research 42(4):2591-2601)和本文引用的参考文献公开，所有这些文献均通过引用以其整体并入本文。megaTAL核酸酶任选地包含一个或多个接头和/或额外的功能结构域，例如C末端结构域(CTD)多肽、N末端结构域(NTD)多肽、展示5-3’核酸外切酶或3-5’核酸外切酶的末端加工酶促结构域、或其它非核酸酶结构域，例如解旋酶结构域。A “megaTAL nuclease/-nickase” refers to an engineered nuclease comprising an engineered TALE DNA-binding domain and an engineered meganuclease or engineered homing endonuclease. TALE DNA-binding domains can be designed to bind DNA at virtually any locus of a nucleic acid sequence in the genome, and when such a DNA-binding domain is fused to an engineered meganuclease, the fusion cleaves the target sequence. Illustrative examples of megaTAL nucleases and of the design of TALE DNA-binding domains are disclosed, for example, by Boissel et al. (MegaTALs: a rare-cleaving nuclease architecture for therapeutic genome engineering (2013), Nucleic Acids Research 42(4):2591-2601) and the references cited therein, all of which are incorporated herein by reference in their entirety. megaTAL nucleases optionally comprise one or more linkers and/or additional functional domains, such as a C-terminal domain (CTD) polypeptide, an N-terminal domain (NTD) polypeptide, an end-processing enzymatic domain exhibiting 5'-3' exonuclease or 3'-5' exonuclease activity, or other non-nuclease domains, such as a helicase domain.

转录激活子样效应因子(TALE)核酸酶/-切口酶是限制性内切酶，其可被工程化以切割特定的DNA序列。转录激活子样效应因子(TALE)可被工程化以与几乎任何所需的DNA序列结合，因此当与DNA切割结构域结合时，DNA可在特定的位置被切割。Transcription activator-like effector (TALE) nucleases/-nickases are restriction enzymes that can be engineered to cleave specific DNA sequences. TALEs can be engineered to bind virtually any desired DNA sequence, so that when a TALE is combined with a DNA cleavage domain, DNA can be cut at a specific location.

“TALE DNA结合结构域”是转录激活子样效应因子(TALE或TAL-效应子)的DNA结合部分，其模拟植物转录激活子来操纵植物转录组。在一些实施方案中考虑的TALE DNA结合结构域是从头工程化的或来自天然存在的TALE，包括但不限于来自野油菜黄单胞菌疮痂致病变种(Xanthomonas campestris pv. vesicatoria)、加得那黄单胞菌(Xanthomonas gardneri)、半透明黄单胞菌(Xanthomonas translucens)、地毯草黄单胞菌(Xanthomonas axonopodis)、穿孔黄单胞菌(Xanthomonas perforans)、苜蓿叶斑病黄单胞菌(Xanthomonas alfalfa)、柑桔溃疡病菌(Xanthomonas citri)、辣椒疮痂病菌(Xanthomonas euvesicatoria)和水稻黄单胞菌(Xanthomonas oryzae)的AvrBs3、以及来自青枯雷尔氏菌(Ralstonia solanacearum)的brg11和hpx17。用于衍生和设计DNA结合结构域的TALE蛋白的说明性实例公开于美国专利第9,017,967号和其中引用的参考文献中，所有这些文献通过引用以其整体并入本文。A “TALE DNA-binding domain” is the DNA-binding portion of a transcription activator-like effector (TALE or TAL effector), which mimics plant transcription activators to manipulate the plant transcriptome. In some embodiments, the TALE DNA-binding domains contemplated are engineered de novo or derived from naturally occurring TALEs, including but not limited to AvrBs3 from Xanthomonas campestris pv. vesicatoria, Xanthomonas gardneri, Xanthomonas translucens, Xanthomonas axonopodis, Xanthomonas perforans, Xanthomonas alfalfa, Xanthomonas citri, Xanthomonas euvesicatoria, and Xanthomonas oryzae, as well as brg11 and hpx17 from Ralstonia solanacearum. Illustrative examples of TALE proteins for deriving and designing DNA-binding domains are disclosed in U.S. Patent No. 9,017,967 and the references cited therein, all of which are incorporated herein by reference in their entirety.

“BurrH-核酸酶”是指具有核酸酶活性的融合蛋白，其包含模块化碱基/碱基特异性核酸结合结构域(MBBBD)。这些结构域源自细菌胞内共生体发根伯克霍尔德菌(Burkholderia rhizoxinica)的蛋白质或从海洋生物中鉴定的其它类似蛋白质。通过将这些结合结构域的不同模块组合在一起，模块化碱基/碱基结合结构域可被工程化为具有与特定核酸序列的结合特性，诸如DNA结合结构域。因此，可将这种工程化的MBBBD与核酸酶催化结构域融合，以在基因组中核酸序列的几乎任何位点切割DNA。在WO 2014/018601和US2015225465 A1以及其中引用的参考文献中公开了BurrH-核酸酶和MBBBD设计的说明性实例，所有这些文献通过引用以其整体并入本文。A "BurrH-nuclease" refers to a fusion protein with nuclease activity that comprises a modular base-per-base specific nucleic acid binding domain (MBBBD). These domains are derived from proteins of the bacterial intracellular symbiont Burkholderia rhizoxinica or from other similar proteins identified in marine organisms. By combining different modules of these binding domains, a modular base-per-base binding domain can be engineered to bind a specific nucleic acid sequence, for example as a DNA-binding domain. Such an engineered MBBBD can therefore be fused to a nuclease catalytic domain to cleave DNA at virtually any site of a nucleic acid sequence in the genome. Illustrative examples of BurrH-nucleases and MBBBD design are disclosed in WO 2014/018601 and US2015225465 A1 and the references cited therein, all of which are incorporated herein by reference in their entirety.

本公开的相关方面提供了适合在细胞中产生CRISPR/Cas介导的双链断裂(DSB)的核酸分子，诸如载体。在一些实施方案中，载体包含编码CRISPR/Cas蛋白例如Cas9的序列和引导核酸(Cas9单一引导RNA，或sgRNA)的序列(其与适合它们在细胞中表达的启动子可操作地连接)以及诸如复制起点和选择标记等其它载体成分。在一些实施方案中，细胞是本文所述的胚胎干细胞或胚胎杂交干细胞。Relevant aspects of this disclosure provide nucleic acid molecules, such as vectors, suitable for generating CRISPR/Cas-mediated double-strand breaks (DSBs) in cells. In some embodiments, the vector comprises a sequence encoding a CRISPR/Cas protein, such as Cas9, and a sequence of a guide nucleic acid (Cas9 single guide RNA, or sgRNA) operatively linked to a promoter suitable for their expression in cells, as well as other vector components such as origin of replication and selection markers. In some embodiments, the cells are embryonic stem cells or embryonic hybrid stem cells as described herein.

根据本公开，通过由核酸内切酶产生的双链断裂(DSB)促进同源重组。在一些实施方案中，核酸内切酶包含CRISPR/Cas9和一种或多种单一指导RNA(简称“sgRNA”或“gRNA”)。本领域技术人员或普通技术人员将能够选择引导RNA，其具有靶位于模板序列和靶序列的侧翼，或位于靶位置上的靶向序列，如上文针对核酸内切酶位点所述。According to this disclosure, homologous recombination is facilitated by double-strand breaks (DSBs) generated by an endonuclease. In some embodiments, the endonuclease comprises CRISPR/Cas9 and one or more single guide RNAs ("sgRNA" or "gRNA" for short). A person of ordinary skill in the art will be able to select guide RNAs whose targeting sequences are directed to sites flanking the template sequence and the target sequence, or to a site at the target location, as described above for endonuclease sites.

在一些实施方案中，可通过引入核酸分子(诸如一种或多种编码CRISPR/Cas蛋白的载体或编码序列)以及一种或多种sgRNA来引入酶。在一些实施方案中，编码CRISPR/Cas蛋白的载体或编码序列是CRISPR/Cas mRNA。在一些实施方案中，编码CRISPR/Cas蛋白的载体或编码序列是载体诸如质粒，其包含编码CRISPR/Cas蛋白和gRNA的DNA序列。在一些实施方案中，CRISPR/Cas蛋白是Cas9。In some embodiments, the enzyme can be introduced by introducing a nucleic acid molecule (such as one or more vectors or coding sequences encoding CRISPR/Cas proteins) and one or more sgRNAs. In some embodiments, the vector or coding sequence encoding the CRISPR/Cas protein is CRISPR/Cas mRNA. In some embodiments, the vector or coding sequence encoding the CRISPR/Cas protein is a vector such as a plasmid containing a DNA sequence encoding the CRISPR/Cas protein and gRNA. In some embodiments, the CRISPR/Cas protein is Cas9.

在某些实施方案中，可将分离的CRISPR/Cas蛋白直接引入细胞(例如，受精卵或ES细胞，通过显微注射或电穿孔)。CRISPR/Cas蛋白可呈CRISPR/Cas核糖核蛋白的形式，其为CRISPR/Cas蛋白/gNA(引导核酸)复合物。或者CRISPR/Cas蛋白可以不含任何gNA，使得将CRISPR/Cas蛋白和一种或多种gNA共引入受精卵或ES细胞，以允许在细胞内原位形成CRISPR/Cas蛋白/gNA复合物。在一些实施方案中，CRISPR/Cas蛋白和gNA由载体编码，所述载体通过转染、电穿孔或转导引入细胞。在一些实施方案中，CRISPR/Cas蛋白是Cas9。In some embodiments, isolated CRISPR/Cas proteins can be directly introduced into cells (e.g., fertilized eggs or ES cells, via microinjection or electroporation). The CRISPR/Cas protein may be in the form of a CRISPR/Cas ribonucleoprotein, which is a CRISPR/Cas protein/gNA (guide nucleic acid) complex. Alternatively, the CRISPR/Cas protein may not contain any gNA, allowing the co-introduction of the CRISPR/Cas protein and one or more gNAs into fertilized eggs or ES cells to allow in situ formation of the CRISPR/Cas protein/gNA complex within the cell. In some embodiments, the CRISPR/Cas protein and gNA are encoded by a vector introduced into cells via transfection, electroporation, or transduction. In some embodiments, the CRISPR/Cas protein is Cas9.

为了在本公开的方法中用作核酸内切酶，CRISPR/Cas蛋白需要与gRNA形成功能复合物。In order to be used as a nuclease in the methods disclosed herein, the CRISPR/Cas protein needs to form a functional complex with gRNA.

根据一些实施方案，使用多个gNA，每个gNA靶向特定的CRISPR/Cas切割位点。例如，可使用四种gNA，两种具有对模板序列的任一侧上的gNA靶序列特异的靶向序列，两种具有对靶序列的任一侧上的gNA靶序列特异的靶向序列。可选地，可使用三种gNA，一种具有对靶位置上的gNA靶序列特异的靶向序列，两种具有对模板序列的任一侧上的gNA靶序列特异的靶向序列。作为又一个实例，可使用两种gNA，一种具有对与模板序列相邻的gNA靶序列特异性的靶向序列，一种具有对与靶序列相邻的gNA靶序列特异的靶向序列。According to some embodiments, multiple gNAs are used, each gNA targeting a specific CRISPR/Cas cleavage site. For example, four gNAs can be used: two with targeting sequences specific for gNA target sequences on either side of the template sequence, and two with targeting sequences specific for gNA target sequences on either side of the target sequence. Alternatively, three gNAs can be used: one with a targeting sequence specific for a gNA target sequence at the target location, and two with targeting sequences specific for gNA target sequences on either side of the template sequence. As yet another example, two gNAs can be used: one with a targeting sequence specific for a gNA target sequence adjacent to the template sequence, and one with a targeting sequence specific for a gNA target sequence adjacent to the target sequence.

优选地，不依赖于用于产生DSB的gNA的数量，在某些实施方案中，基于它们与模板和靶序列的5’和3’末端或靶位置的接近程度，独立地选择每种gNA。Preferably, regardless of the number of gNAs used to generate the DSB, in some embodiments each gNA is selected independently based on its proximity to the 5' and 3' ends of the template and target sequences or the target location.

可使用公知的原则或在线工具，基于用户输入(诸如靶基因组和序列类型)进行gNA的选择和设计。一般来说，对于Cas9，gRNA是短的合成RNA，由Cas9结合所必需的“支架”序列和用户定义的约20个核苷酸的“间隔区”或“靶向”序列组成，所述间隔区或靶向序列定义了要被靶向序列结合或修饰的基因组靶标。为简单起见，“gRNA靶向Cas9切割位点”是指gRNA的间隔区或靶向序列被设计成与基因组靶序列结合并在切割位点切割其的事实。gNAs can be selected and designed using well-known principles or online tools, based on user input such as the target genome and sequence type. In general, for Cas9, a gRNA is a short synthetic RNA consisting of a "scaffold" sequence required for Cas9 binding and a user-defined "spacer" or "targeting" sequence of about 20 nucleotides, which defines the genomic target to be bound or modified. For simplicity, "a gRNA targeting a Cas9 cleavage site" means that the spacer or targeting sequence of the gRNA is designed to bind the genomic target sequence so that the target is cleaved at the cleavage site.

根据本公开的引导核酸(包括gRNA和gDNA)的长度可以是10个核苷酸以上的任何多个核苷酸，包括10-50个核苷酸、10-40个、10-30个、10-20个、15-25个、16-24个、17-23个、18-22个、19-21个和20个核苷酸。The length of the guide nucleic acid (including gRNA and gDNA) according to this disclosure can be any number of nucleotides more than 10 nucleotides, including 10-50 nucleotides, 10-40 nucleotides, 10-30 nucleotides, 10-20 nucleotides, 15-25 nucleotides, 16-24 nucleotides, 17-23 nucleotides, 18-22 nucleotides, 19-21 nucleotides and 20 nucleotides.

优选地，靶向序列足够独特，使得理论上其与独特的(与基因组的其余部分相比)基因组靶序列结合。靶标应该紧邻前间隔序列邻近基序(或“PAM”序列)的上游(或5’)存在。PAM序列对于靶结合是绝对必要的，确切的序列取决于Cas9的种类。在最广泛使用的化脓性链球菌Cas9中，PAM序列是5′-NGG-3′(“N”表示4种标准核苷酸中的任一种)。不同物种中其它Cas9的其它PAM序列是本领域已知的。参见下表4中列出的示例性PAM序列。Preferably, the targeting sequence is sufficiently unique that, in theory, it binds a genomic target sequence that is unique compared to the rest of the genome. The target should lie immediately upstream (5') of a protospacer adjacent motif (the "PAM" sequence). The PAM sequence is absolutely required for target binding, and the exact sequence depends on the species of Cas9. For the most widely used Streptococcus pyogenes Cas9, the PAM sequence is 5′-NGG-3′ ("N" denotes any of the four standard nucleotides). Other PAM sequences for Cas9 proteins from other species are known in the art. See the exemplary PAM sequences listed in Table 4 below.

表4.PAM序列Table 4. PAM sequences

Cas9的种类/变体 Types/Variants of Cas9 | PAM序列 PAM sequence
化脓性链球菌(SP)；SpCas9 Streptococcus pyogenes (SP); SpCas9 | NGG
SpCas9 D1135E变体 SpCas9 D1135E variant | NGG(减少的NAG结合) NGG (reduced NAG binding)
SpCas9 VRER变体 SpCas9 VRER variant | NGCG
SpCas9 EQR变体 SpCas9 EQR variant | NGAG
SpCas9 VQR变体 SpCas9 VQR variant | NGAN或NGNG NGAN or NGNG
金黄色葡萄球菌(SA)；SaCas9 Staphylococcus aureus (SA); SaCas9 | NNGRRT或NNGRR(N) NNGRRT or NNGRR(N)
脑膜炎奈瑟菌(NM) Neisseria meningitidis (NM) | NNNNGATT
嗜热链球菌(ST) Streptococcus thermophilus (ST) | NNAGAAW
齿垢密螺旋体(TD) Treponema denticola (TD) | NAAAAC

Cas9-gRNA复合物将结合具有PAM的任何靶基因组序列，但是如果在gRNA间隔区与靶基因组序列之间存在足够的同源性，则Cas9仅切割靶基因组序列。Cas9介导的DNA切割的最终结果是靶基因组序列内位于PAM序列上游约3-4个核苷酸的切割位点的双链断裂(DSB)。The Cas9-gRNA complex will bind any target genomic sequence that has a PAM, but Cas9 cleaves the target genomic sequence only if there is sufficient homology between the gRNA spacer and the target genomic sequence. The end result of Cas9-mediated DNA cleavage is a double-strand break (DSB) at a cleavage site located about 3-4 nucleotides upstream of the PAM sequence within the target genomic sequence.
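
As an illustration of the PAM and cleavage-site rules just described, the following is a minimal Python sketch that scans one strand of a DNA string for SpCas9 5′-NGG-3′ PAMs and reports the adjacent 20-nucleotide protospacer and a predicted cut position. The function name `find_spcas9_sites`, the fixed 3-nucleotide offset, and the forward-strand-only scan are assumptions made for this example; the sketch does not score uniqueness or off-target binding and is not the design procedure of this disclosure.

```python
import re

def find_spcas9_sites(seq, spacer_len=20):
    """Return (protospacer, pam, cut_index) tuples for NGG PAMs on the given strand.

    cut_index is the 0-based index of the first base 3' of the predicted cut,
    taken here as 3 nt upstream of the PAM (an assumption; the text says ~3-4 nt).
    """
    seq = seq.upper()
    sites = []
    # A lookahead is used so that overlapping PAM candidates are all reported.
    for m in re.finditer(r"(?=([ACGT]GG))", seq):
        pam_start = m.start()
        if pam_start < spacer_len:
            continue  # not enough sequence 5' of the PAM for a full protospacer
        protospacer = seq[pam_start - spacer_len:pam_start]
        sites.append((protospacer, m.group(1), pam_start - 3))
    return sites

if __name__ == "__main__":
    demo = "ACGTACGTACGTACGTACGT" + "AGG" + "TTT"
    for protospacer, pam, cut in find_spcas9_sites(demo):
        print(protospacer, pam, cut)
```

A fuller design workflow would also scan the reverse complement and substitute the PAM pattern of the chosen Cas9 variant from Table 4.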

在一些实施方案中，双链断裂在靶序列上或两侧产生。例如，在其中靶染色体包含靶位置(诸如在模板序列将被插入其中而几乎没有或没有靶染色体的缺失的位置)的那些实施方案中，那么双链断裂在靶位置上产生。示例性靶位置包含本文所述的任何核酸酶的切割位点。作为另外的实例，在其中靶染色体包含靶序列(诸如将因模板序列的插入而被替换或删除的序列)的那些实施方案中，那么双链断裂在靶序列的任一侧(即，靶序列的5’和3’)产生。In some embodiments, double-strand breaks are generated at, or on either side of, the target sequence. For example, in embodiments where the target chromosome comprises a target location (such as a location into which the template sequence will be inserted with little or no deletion of the target chromosome), the double-strand break is generated at the target location. Exemplary target locations include a cleavage site of any of the nucleases described herein. As a further example, in embodiments where the target chromosome comprises a target sequence (such as a sequence that will be replaced or deleted upon insertion of the template sequence), double-strand breaks are generated on both sides of the target sequence (i.e., 5' and 3' of the target sequence).

在某些实施方案中，任何选择的核酸内切酶的切割位点(例如gNA靶向序列)在靶序列或位置的约10bp、约20bp、约30bp、约50bp、约70bp、约100bp、约200bp、约300bp、约400bp或约500bp内。In certain embodiments, the cleavage site of any selected endonuclease (e.g., the gNA targeting sequence) is within about 10 bp, about 20 bp, about 30 bp, about 50 bp, about 70 bp, about 100 bp, about 200 bp, about 300 bp, about 400 bp, or about 500 bp of the target sequence or location.

在某些实施方案中，任何选择的核酸内切酶的切割位点(例如gNA靶向序列)在模板序列的约100bp、约200bp、约300bp、约400bp、约500bp、约600bp、约700bp、约800bp、约900bp、约1,000bp、约1,100bp、约1,200bp、约1,300bp、约1,400bp、约1,500bp、约1,600bp、约1,700bp、约1,800bp、约1,900bp或约2,000bp内。In certain embodiments, the cleavage site of any selected endonuclease (e.g., the gNA targeting sequence) is within about 100 bp, about 200 bp, about 300 bp, about 400 bp, about 500 bp, about 600 bp, about 700 bp, about 800 bp, about 900 bp, about 1,000 bp, about 1,100 bp, about 1,200 bp, about 1,300 bp, about 1,400 bp, about 1,500 bp, about 1,600 bp, about 1,700 bp, about 1,800 bp, about 1,900 bp, or about 2,000 bp of the template sequence.
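
The distance windows above can be checked mechanically once candidate cut positions are known. The sketch below is a hypothetical helper, not part of the disclosed method: `Segment`, `distance_to_nearest_end`, and `cut_is_acceptable` are illustrative names, coordinates are assumed to be 0-based positions on a single chromosome, and the window size (for example 500 bp for the target or 2,000 bp for the template) is left to the caller.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Segment:
    """A target or template sequence, described by its end coordinates."""
    start: int  # coordinate of the 5' end
    end: int    # coordinate of the 3' end

def distance_to_nearest_end(cut_pos: int, seg: Segment) -> int:
    """Distance in bp from a cut position to the closer end of the segment."""
    return min(abs(cut_pos - seg.start), abs(cut_pos - seg.end))

def cut_is_acceptable(cut_pos: int, seg: Segment, max_bp: int) -> bool:
    """True if the cut lies within max_bp of either end of the segment."""
    return distance_to_nearest_end(cut_pos, seg) <= max_bp

if __name__ == "__main__":
    template = Segment(start=10_000, end=60_000)
    # One cut 150 bp upstream of the 5' end, one 2,500 bp past the 3' end.
    for cut in (9_850, 62_500):
        print(cut, cut_is_acceptable(cut, template, max_bp=2_000))
```

Keeping the window as a parameter lets the same check serve the tighter target-side windows and the wider template-side windows listed above.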

在一些实施方案中，双链断裂通过至少一种DNA修复途径来修复，所述DNA修复途径选自由以下组成的组：切除、错配修复(MMR)、核苷酸切除修复(NER)、碱基切除修复(BER)、规范非同源末端连接(规范NHEJ)、替代非同源末端连接(ALT-NHEJ)、规范同源定向修复(规范HDR)、替代同源定向修复(ALT-HDR)、微同源性介导的末端连接(MMEJ)、平末端连接、合成依赖性微同源性介导的末端连接、单链退火(SSA)、霍利迪连接模型(Holliday junction model)或双链断裂修复(DSBR)、合成依赖性链退火(SDSA)、单链断裂修复(SSBR)、跨损伤合成修复(TLS)和链间交联修复(ICL)以及DNA/RNA加工。In some embodiments, the double-strand break is repaired by at least one DNA repair pathway selected from the group consisting of: resection, mismatch repair (MMR), nucleotide excision repair (NER), base excision repair (BER), canonical non-homologous end joining (canonical NHEJ), alternative non-homologous end joining (ALT-NHEJ), canonical homology-directed repair (canonical HDR), alternative homology-directed repair (ALT-HDR), microhomology-mediated end joining (MMEJ), blunt end joining, synthesis-dependent microhomology-mediated end joining, single-strand annealing (SSA), the Holliday junction model or double-strand break repair (DSBR), synthesis-dependent strand annealing (SDSA), single-strand break repair (SSBR), translesion synthesis (TLS) repair, and interstrand crosslink (ICL) repair, as well as DNA/RNA processing.

工程化的染色体的回收Recovery of engineered chromosomes

本公开提供了回收本文所述的工程化的染色体，并将所述工程化的染色体转移至适于下游应用的细胞环境中的方法。在一些实施方案中，回收本文所述的工程化的染色体包括微细胞介导的染色体转移(MMCT)。This disclosure provides methods for recovering the engineered chromosomes described herein and transferring the engineered chromosomes into a cellular environment suitable for downstream applications. In some embodiments, the recovery of the engineered chromosomes described herein includes microcell-mediated chromosome transfer (MMCT).

微细胞介导的染色体转移(MMCT)是将从供体细胞制备的微细胞与受体细胞融合的技术。通过这种技术，供体细胞中的特定(外源)DNA(例如，染色体)可被转移到受体细胞中。通常通过用秋水仙胺处理供体细胞来制备微细胞，尽管也可以使用其它方法，并且所述方法也被认为在本公开的范围内。Microcell-mediated chromosome transfer (MMCT) is a technique in which microcells prepared from donor cells are fused with recipient cells. With this technique, specific (exogenous) DNA (e.g., a chromosome) in the donor cells can be transferred into the recipient cells. Microcells are typically prepared by treating donor cells with colcemid, although other methods can also be used and are likewise considered to be within the scope of this disclosure.

示例性MMCT方案包括在足以诱导微核化的条件下，在包含至少一种微核诱导剂的细胞培养基中培养包含工程化的染色体的细胞，从而产生微核细胞，并收集微核细胞。示例性微核诱导剂包括但不限于微管聚合抑制剂、微管解聚抑制剂和纺锤体检查点抑制剂。本领域已知的示例性微核诱导剂包括但不限于秋水仙胺、秋水仙碱、长春新碱或其组合。例如，可用0.05μg/mL至0.25μg/mL处理细胞以诱导微核化。An exemplary MMCT protocol comprises culturing cells comprising the engineered chromosome in a cell culture medium comprising at least one micronucleation inducer, under conditions sufficient to induce micronucleation, thereby producing micronucleated cells, and collecting the micronucleated cells. Exemplary micronucleation inducers include, but are not limited to, microtubule polymerization inhibitors, microtubule depolymerization inhibitors, and spindle checkpoint inhibitors. Exemplary micronucleation inducers known in the art include, but are not limited to, colcemid, colchicine, vincristine, or combinations thereof. For example, cells can be treated with 0.05 μg/mL to 0.25 μg/mL of such an inducer to induce micronucleation.

微核细胞可使用本领域已知的任何合适的方法包括离心和过滤来回收。Micronucleated cells can be recovered using any suitable method known in the art, including centrifugation and filtration.

因此，本公开提供了包括回收工程化的染色体的方法，所述方法包括在足以诱导微核化的条件下将细胞暴露于秋水仙胺，并使用离心收集微核细胞。Accordingly, this disclosure provides methods comprising recovering an engineered chromosome, the methods comprising exposing cells to colcemid under conditions sufficient to induce micronucleation and collecting the micronucleated cells by centrifugation.

在一些实施方案中，工程化的染色体包含一种或多种标记，例如当用模板序列工程化染色体时引入的选择标记或可检测的标记。这些标记可用于追踪工程化的染色体，并在与上述微核细胞融合后选择包含工程化的染色体的细胞。In some embodiments, the engineered chromosome comprises one or more markers, such as selection markers or detectable markers introduced when the chromosome was engineered with the template sequence. These markers can be used to track the engineered chromosome and, after fusion with the micronucleated cells described above, to select cells that contain the engineered chromosome.

因此，本公开提供了产生胚胎干细胞的方法，其包括：(a)将包含通过本公开的方法产生的工程化的染色体的微核细胞与ES细胞融合，其中(i)ES细胞包含与工程化的染色体同源的染色体，所述同源染色体包含与能够在ES细胞中表达荧光蛋白的启动子可操作地连接的第一荧光蛋白，以及(ii)至少一个亚群的微核细胞包含工程化的染色体，并且其中所述工程化的染色体包含不同于第一荧光蛋白的第二荧光蛋白，第二荧光蛋白与能够在ES细胞中表达荧光蛋白的启动子可操作地连接；(b)选择表达第一和第二荧光蛋白两者的ES细胞；(c)培养步骤(b)中选择的ES细胞，直至至少一个亚群的ES细胞丢失同源染色体；以及(d)选择表达第二荧光蛋白但不表达第一种荧光蛋白的ES细胞。在一些实施方案中，ES细胞是小鼠、大鼠、兔、豚鼠、仓鼠、绵羊、山羊、驴、牛、马、骆驼、鸡或猴ES细胞。在一些实施方案中，ES细胞是小鼠ES细胞。在一些实施方案中，ES细胞是大鼠ES细胞。在一些实施方案中，ES细胞是猴ES细胞。Accordingly, this disclosure provides a method of generating embryonic stem cells, comprising: (a) fusing micronucleated cells comprising an engineered chromosome produced by the methods of this disclosure with ES cells, wherein (i) the ES cells comprise a chromosome homologous to the engineered chromosome, the homologous chromosome comprising a first fluorescent protein operably linked to a promoter capable of expressing the fluorescent protein in the ES cells, and (ii) at least a subpopulation of the micronucleated cells comprises the engineered chromosome, wherein the engineered chromosome comprises a second fluorescent protein different from the first fluorescent protein, the second fluorescent protein being operably linked to a promoter capable of expressing the fluorescent protein in the ES cells; (b) selecting ES cells that express both the first and the second fluorescent proteins; (c) culturing the ES cells selected in step (b) until at least a subpopulation of the ES cells has lost the homologous chromosome; and (d) selecting ES cells that express the second fluorescent protein but not the first fluorescent protein. In some embodiments, the ES cells are mouse, rat, rabbit, guinea pig, hamster, sheep, goat, donkey, cattle, horse, camel, chicken, or monkey ES cells. In some embodiments, the ES cells are mouse ES cells. In some embodiments, the ES cells are rat ES cells. In some embodiments, the ES cells are monkey ES cells.

虽然上文所述的产生胚胎干细胞的方法使用两种不同的荧光蛋白作为标记，但本领域普通技术人员将会理解，只要工程化的染色体和同源染色体上的标记不同，其它标记也可以是合适的。例如，可使用本文所述的两种不同的选择标记，以及两种不同的表面分子，所述表面分子可被标记的抗体识别，或者缀合于选择标记诸如金颗粒，这允许通过离心进行选择。作为另外的实例，除了作为标记的荧光蛋白之外，嘌呤霉素和潮霉素/胸苷激酶(TK)标记也可用于该步骤中的阳性-阴性选择。当胸苷激酶在特定的胸苷类似物存在的情况下表达时，这些类似物被转化为杀死细胞的毒性化合物。例如，将嘌呤霉素抗性标记和潮霉素/TK标记敲入两条染色体的相同位置，并通过在嘌呤霉素和潮霉素中培养来选择双阳性单克隆。培养几天后，使用嘌呤霉素和胸苷激酶来选择已丢失了一个染色体拷贝的克隆，所述染色体携带有潮霉素/TK标记。Although the method of generating embryonic stem cells described above uses two different fluorescent proteins as markers, a person of ordinary skill in the art will understand that other markers may also be suitable, provided that the markers on the engineered chromosome and on the homologous chromosome differ. For example, two different selection markers described herein can be used, as can two different surface molecules recognized by antibodies that are labeled or conjugated to a selection marker such as gold particles, which allows selection by centrifugation. As a further example, in addition to fluorescent proteins as markers, puromycin and hygromycin/thymidine kinase (TK) markers can be used for positive-negative selection in this step. When thymidine kinase is expressed in the presence of particular thymidine analogs, those analogs are converted into toxic compounds that kill the cell. For example, a puromycin resistance marker and a hygromycin/TK marker are knocked into the same position on the two chromosomes, and double-positive single clones are selected by culturing in puromycin and hygromycin. After several days of culture, puromycin together with thymidine kinase-based negative selection is used to select clones that have lost the chromosome copy carrying the hygromycin/TK marker.

在一些实施方案中，产生胚胎干细胞的方法包括(a)将包含通过本公开的方法产生的工程化的染色体的微核细胞与ES细胞融合，其中(i)ES细胞包含与工程化的染色体同源的染色体，所述同源染色体包含第一标记，以及(ii)至少一个亚群的微核细胞包含工程化的染色体，并且其中工程化的染色体包含不同于第一标记的第二标记；(b)选择表达第一和第二标记两者的ES细胞；(c)培养步骤(b)中选择的ES细胞，直至至少一个亚群的ES细胞丢失同源染色体；以及(d)选择表达第二标记但不表达第一标记的ES细胞。In some embodiments, the method of generating embryonic stem cells comprises (a) fusing micronucleated cells comprising an engineered chromosome produced by the methods of this disclosure with ES cells, wherein (i) the ES cells comprise a chromosome homologous to the engineered chromosome, the homologous chromosome comprising a first marker, and (ii) at least a subpopulation of the micronucleated cells comprises the engineered chromosome, wherein the engineered chromosome comprises a second marker different from the first marker; (b) selecting ES cells that express both the first and the second markers; (c) culturing the ES cells selected in step (b) until at least a subpopulation of the ES cells has lost the homologous chromosome; and (d) selecting ES cells that express the second marker but not the first marker.

可使用任何合适的方法将微核细胞与ES细胞融合。融合方法尤其包括电融合、病毒诱导融合和化学诱导融合，例如通过向细胞中加入PEG1000。Micronucleated cells can be fused with ES cells using any suitable method. Fusion methods include, in particular, electrofusion, virus-induced fusion, and chemically induced fusion, such as by adding PEG1000 to the cells.

考虑到通过上述回收工程化的染色体的方法产生的三体性的固有不稳定性，培养通过与微核细胞融合产生的细胞至少5天、至少7天、至少10天或至少14天的时间可足以获得已经丢失了对应于工程化的染色体的同源染色体的细胞。或者，可使用采用负选择标记例如位于同源染色体上的标记的选择方案，当所述标记暴露于选择方案时，其表达杀死细胞。在一些实施方案中，在步骤(b)和(d)中选择细胞包括荧光激活细胞分选(FACS)。例如，细胞可以是FAC分选的细胞，其表达用于标记工程化的染色体的第二荧光蛋白，但不表达用于标记同源染色体的第一荧光蛋白。Given the inherent instability of the trisomy produced by the above method of recovering engineered chromosomes, culturing the cells produced by fusion with micronucleated cells for at least 5 days, at least 7 days, at least 10 days, or at least 14 days may be sufficient to obtain cells that have lost the homologous chromosome corresponding to the engineered chromosome. Alternatively, a selection scheme employing a negative selection marker can be used, for example a marker located on the homologous chromosome whose expression kills the cell under the selection conditions. In some embodiments, selecting cells in steps (b) and (d) comprises fluorescence-activated cell sorting (FACS). For example, the cells may be FACS-sorted cells that express the second fluorescent protein, which labels the engineered chromosome, but do not express the first fluorescent protein, which labels the homologous chromosome.
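
The two selection steps, (b) and (d), amount to two marker gates applied at different times. As a rough illustration only, the Python sketch below models each cell as a record with two boolean marker readouts; the `Cell` type, its field names, and the two gate functions are hypothetical, and real FACS gating works on continuous fluorescence intensities rather than booleans.

```python
from typing import List, NamedTuple

class Cell(NamedTuple):
    cell_id: str
    first_marker: bool   # marker on the resident homologous chromosome
    second_marker: bool  # marker on the engineered chromosome

def select_double_positive(cells: List[Cell]) -> List[Cell]:
    """Step (b): keep cells expressing both markers (engineered chromosome taken up)."""
    return [c for c in cells if c.first_marker and c.second_marker]

def select_engineered_only(cells: List[Cell]) -> List[Cell]:
    """Step (d): keep cells expressing the second marker but not the first."""
    return [c for c in cells if c.second_marker and not c.first_marker]

if __name__ == "__main__":
    after_fusion = [
        Cell("a", True, False),  # unfused recipient cell
        Cell("b", True, True),   # carries both homolog and engineered chromosome
    ]
    print(select_double_positive(after_fusion))
    # After culture (step (c)), some cells have lost the marked homolog.
    after_culture = [Cell("b1", True, True), Cell("b2", False, True)]
    print(select_engineered_only(after_culture))
```

The same two gates apply unchanged when drug-resistance or surface markers replace the fluorescent proteins, since only the presence or absence of each marker is used.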

细胞Cells

本发明提供了用于本公开的方法的细胞。在一些实施方案中，细胞包括胚胎干(ES)细胞、杂交胚胎干(EHS)细胞或受精卵细胞。本公开还提供了包含通过本公开的方法产生的工程化的染色体的细胞。本公开提供了分离、融合和培养本文所述细胞的方法。This invention provides cells for use in the methods of this disclosure. In some embodiments, the cells include embryonic stem (ES) cells, hybrid embryonic stem (EHS) cells, or fertilized egg cells. This disclosure also provides cells comprising engineered chromosomes produced by the methods of this disclosure. This disclosure provides methods for isolating, fusing, and culturing the cells described herein.

因此，本公开提供了融合细胞以产生本文所述的EHS细胞的方法。通过化学、生物学和物理手段，细胞融合已经成为可能。这些技术的实例分别包括聚乙二醇(PEG)融合、融合型病毒融合(fusogenic virus fusion)和电融合。Accordingly, this disclosure provides methods of fusing cells to generate the EHS cells described herein. Cell fusion has been achieved by chemical, biological, and physical means. Examples of these techniques include, respectively, polyethylene glycol (PEG) fusion, fusogenic virus fusion, and electrofusion.

用于本公开的方法中的ES细胞可从多种来源获得，并且可以是原代分离的ES细胞或者人工或天然产生的ES细胞系。还可在细胞融合以产生本公开的EHS细胞之前或之后，或者在本文所述方法之前或之后，首先对ES细胞进行遗传修饰，以引入有用的性状，诸如一种或多种标记的表达。The ES cells used in the methods of this disclosure can be obtained from a variety of sources and can be primary isolated ES cells or artificially or naturally generated ES cell lines. The ES cells may also be genetically modified before or after cell fusion to generate the EHS cells of this disclosure, or before or after the methods described herein, to introduce useful traits, such as the expression of one or more markers.

一种常用的技术是使用例如PEG的化学融合。这项技术在产生杂交瘤方面特别成功。通过将细胞暴露在强电场中非常短的时间，可以提高融合概率。在暴露于电场之前，可以使用化学剂在悬浮液中实现所需类型的细胞对(即两种类型的EH细胞)的连合(linkage)和接近。A common technique is chemical fusion using, for example, PEG. This technique is particularly successful in generating hybridomas. The fusion probability can be increased by exposing cells to a strong electric field for a very short time. Before exposure to the electric field, chemical agents can be used in a suspension to achieve the linkage and proximity of the desired type of cell pair (i.e., two types of EH cells).

细胞的电融合包括将细胞紧密地聚集在一起，并将它们暴露在交变电场中。在适当的条件下，细胞被推到一起，细胞膜融合，然后形成融合细胞或杂交细胞。细胞的电融合和用于进行电融合的装置描述于例如美国专利第4,441,972号、第4,578,168号和第5,283,194号、国际专利申请第PCT/AU92/00473号中。通常，所述方法包括选择细胞并将它们放置在采用来用作细胞融合室的充满流体的室中。单个细胞对可参与融合过程，即单细胞融合，或者大量融合可在两个群体中发生，每个群体包含两个或多个细胞。大量融合(Bulkfusion)可以是其中涉及约2至约1000个细胞的小型大量融合(mini-bulk fusion)，或其中涉及超过约1000个细胞的大型大量融合(macro-bulk fusion)。可通过化学手段(诸如在PEG存在的情况下)、生物手段(诸如在融合病毒存在的情况下)或通过电手段(即电融合)来促进融合。融合也可包括这些技术的组合。还可用细胞因子诸如白细胞介素3(IL-3)处理细胞以促进融合。Electrofusion of cells involves bringing cells together tightly and exposing them to an alternating electric field. Under appropriate conditions, the cells are pushed together, their cell membranes fuse, and they form fused cells or hybrid cells. Electrofusion of cells and apparatus for performing electrofusion are described, for example, in U.S. Patent Nos. 4,441,972, 4,578,168, and 5,283,194, and International Patent Application No. PCT/AU92/00473. Typically, the method involves selecting cells and placing them in a fluid-filled chamber used as a cell fusion chamber. Single cell pairs may participate in the fusion process, i.e., single-cell fusion, or bulk fusion may occur in two populations, each containing two or more cells. Bulk fusion can be a mini-bulk fusion involving about 2 to about 1000 cells, or a macro-bulk fusion involving more than about 1000 cells. Fusion can be promoted by chemical means (such as in the presence of PEG), biological means (such as in the presence of a fusion virus), or electrical means (i.e., electrofusion). Fusion can also include a combination of these techniques. Treatment of cells with cytokines such as interleukin-3 (IL-3) can also promote fusion.

细胞融合后，获得融合的细胞(融合细胞(fusate cell))或另外地称为杂合细胞，其包含至少两个细胞的细胞核，所述细胞核被包裹在来自参与融合的细胞的融合脂质双层中。细胞核融合，产生染色体数目异常的杂交细胞，其可能是四倍体或含有更少或更多的染色体。杂交细胞在适当的培养条件下具有分裂和增殖的能力。Following cell fusion, fused cells (fusate cells) or hybrid cells are obtained, containing the nuclei of at least two cells enclosed in a fusion lipid bilayer derived from the cells involved in the fusion. Nuclear fusion produces hybrid cells with an abnormal number of chromosomes, which may be tetraploid or contain fewer or more chromosomes. Hybrid cells possess the ability to divide and proliferate under appropriate culture conditions.

在一些实施方案中，通过电融合产生EHS细胞。例如，人与小鼠、人与大鼠或人与猴的ES细胞可通过电融合来融合。在一些实施方案中，来自两个不同物种的两种EHS细胞经历电融合以产生EHS细胞，所述物种选自由以下组成的组：人、小鼠、大鼠、兔、豚鼠、仓鼠、绵羊、山羊、驴、牛、马、骆驼、鸡和猴。In some embodiments, EHS cells are generated by electrofusion. For example, human and mouse, human and rat, or human and monkey ES cells can be fused by electrofusion. In some embodiments, two EHS cells from two different species undergo electrofusion to generate EHS cells, said species being selected from the group consisting of: human, mouse, rat, rabbit, guinea pig, hamster, sheep, goat, donkey, cattle, horse, camel, chicken, and monkey.

通常，一旦发生融合，在合适的富培养基中回收所得的杂交细胞，然后将其在培养中扩增用于本公开的方法。恢复培养基应包含允许在融合应激后细胞融合物恢复的因子。这种补充剂可包含高百分比(例如20％)的胎牛血清。Typically, once fusion has occurred, the resulting hybrid cells are recovered in a suitable rich medium and then expanded in culture for use in the methods of this disclosure. The recovery medium should contain factors that allow the fused cells to recover from the stress of fusion. Such supplements may include a high percentage (e.g., 20%) of fetal bovine serum.

Hybrid cells produced by cell fusion may carry unique cell-surface markers, which can be used to select these cells and to monitor fusion events.

In some embodiments, the cells of this disclosure comprise one or more genetic modifications, such as the introduction of the markers described herein. Genetic modification can be carried out by any suitable method known in the art. For example, cells can be modified by transfection, transduction, electroporation, lipofection, and the like.

Transfection, as used herein, refers to the introduction of nucleic acids (including naked or purified nucleic acids, or vectors carrying a particular nucleic acid) into cells, particularly eukaryotic cells, including mammalian cells. Any known transfection method may be used in the context of this disclosure. Some of these methods increase the permeability of biological membranes to bring nucleic acids into the cell. Prominent examples are electroporation, microporation, and lipofection. These methods may be used alone, or may be supported by sonic, electromagnetic, or thermal energy, chemical permeation enhancers, pressure, and the like, to selectively increase the flux rate of nucleic acids into host cells. Other transfection methods are also within the scope of this disclosure, such as carrier-based transfection, including lipofection, virus-based transfection (also called transduction), and chemical-based transfection. In general, any method that brings nucleic acids into the cell may be used. Transiently transfected cells carry or express the transfected RNA/DNA for a short time and do not pass it on. Stably transfected cells continuously express the transfected DNA and pass it on, because the exogenous nucleic acid has integrated into the cell's genome.

Many viruses have been used as gene transfer vectors or as the basis for preparing them, including papovaviruses, adenoviruses, vaccinia virus, adeno-associated virus, lentiviruses, Sindbis and Semliki Forest viruses, and retroviruses of avian and human origin.

Gene transfer techniques also include chemical techniques (such as calcium phosphate coprecipitation), mechanical techniques (e.g., microinjection), membrane fusion-mediated transfer via liposomes, direct DNA uptake, and receptor-mediated DNA transfer. Virus-mediated gene transfer can be combined with direct in vivo gene transfer using liposome delivery, allowing the viral vector to be directed to particular cells. Alternatively, a retroviral vector producer cell line can be injected into a particular tissue. Injection of producer cells then provides a continuous source of vector particles.

This disclosure provides methods of culturing the cells of this disclosure. Many stem cell culture media and growth environments are contemplated in the embodiments described herein, including defined media, conditioned media, feeder-free media, serum-free media, and the like. As used herein, the term "growth environment" and its equivalents refer to an environment in which undifferentiated or differentiated stem cells (e.g., embryonic stem cells) will proliferate in vitro. Features of the environment include the medium in which the cells are cultured and the supporting structure (such as a matrix on a solid surface), if present. Methods for culturing or maintaining cells are also described in PCT/US2007/062755, U.S. Application No. 11/993,399, and U.S. Application No. 11/875,057.

Basal cell culture media are known in the art and are commercially available. Exemplary basal cell culture media include, but are not limited to, media based on DMEM, CMRL, or RPMI.

The cell culture medium used in the culture methods of this disclosure may or may not contain serum. The medium may also contain one or more supplements or other medium components known in the art, such as B27 supplement, insulin, glucose, growth factors such as EGF and FGF, and cytokines.

The term "feeder cells" refers to a culture of cells that grows in vitro and secretes at least one factor into the culture medium, and that can be used to support the growth of another cell of interest in culture. As used herein, "feeder cell layer" is used interchangeably with "feeder cells". Feeder cells may form a monolayer, in which the feeder cells cover the surface of the culture dish in a complete layer before growing on top of one another, or may form clusters of cells. In a preferred embodiment, the feeder cells form an adherent monolayer.

Similarly, embodiments in which ES or EHS cell cultures, or aggregate suspension cultures, are grown under defined conditions or in a defined culture system without feeder cells are "feeder-free". Feeder-free methods are also described in U.S. Patent No. 6,800,480. In some embodiments, ES or EHS cells can be cultured in a two-dimensional or three-dimensional environment. In U.S. Patent No. 6,800,480, an extracellular matrix is prepared by culturing fibroblasts, lysing the fibroblasts in situ, and then washing what remains after lysis. Alternatively, in U.S. Patent No. 6,800,480, the extracellular matrix can be prepared from an isolated matrix component or from a combination of components selected from collagen, placental matrix, fibronectin, laminin, merosin, tenascin, heparin sulfate, chondroitin sulfate, dermatan sulfate, aggrecan, biglycan, thrombospondin, vitronectin, and decorin.

In some embodiments, the culture method or culture system is free of animal-derived products. In other embodiments, the culture method is xeno-free.

This disclosure contemplates differentiating ES cells comprising the engineered chromosomes described herein into different cell types for a variety of downstream applications. Several strategies can be used to induce ES cells to differentiate into various cell types in vitro. They typically involve supplementing the cell culture medium with exogenous biochemical compositions that recapitulate endogenous developmental cell signaling and direct cell-specific differentiation. Strategies for differentiating ES cells are discussed in Vazin and Freed, Restor Neurol Neurosci (2010) 28(4):589-603, the contents of which are incorporated herein by reference.

For example, a population of ES or EHS cells can be further cultured in the presence of certain supplemental growth factors to obtain a population of cells that have developed or will develop into different cell lineages, or that can be selectively reversed so that they are able to develop into different cell lineages. The term "supplemental growth factor" is used in its broadest sense and refers to a substance that is effective to promote the growth of ES cells, maintain cell survival, stimulate cell differentiation, and/or stimulate the reversal of cell differentiation. In addition, a supplemental growth factor may be a substance secreted by feeder cells into their medium. Such substances include, but are not limited to, cytokines, chemokines, small molecules, neutralizing antibodies, and proteins. Growth factors may also include intercellular signaling polypeptides, which control the development and maintenance of cells and the form and function of tissues. In a preferred embodiment, the supplemental growth factor is selected from the group consisting of steel factor (SCF, also known as stem cell factor), oncostatin M (OSM), ciliary neurotrophic factor (CNTF), interleukin-6 (IL-6) in combination with soluble interleukin-6 receptor (IL-6R), fibroblast growth factor (FGF), bone morphogenetic protein (BMP), tumor necrosis factor (TNF), and granulocyte-macrophage colony-stimulating factor (GM-CSF).

The progression of stem cells to various multipotent and/or differentiated cells can be monitored by determining the expression of a gene or gene marker characteristic of a particular cell type relative to the expression of a second gene or control gene (e.g., a housekeeping gene). In some processes, the expression of certain markers is determined by detecting the presence or absence of the marker. Alternatively, the expression of certain markers can be determined by measuring the level at which the marker is present in the cells of a cell culture or cell population. In such processes, the measurement of marker expression can be qualitative or quantitative. One way to quantify the expression of a marker produced by a marker gene is quantitative PCR (Q-PCR). Methods of performing Q-PCR are well known in the art. Other methods known in the art can also be used to quantify marker gene expression. For example, expression of a marker gene product can be detected using an antibody specific for that product.
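One common way to express a marker's expression relative to a housekeeping gene is the comparative Ct (2^-ΔΔCt) calculation. The sketch below is illustrative only: the disclosure does not name a quantification method, the function name and the Ct values are hypothetical, and the formula assumes roughly 100% amplification efficiency for both primer pairs.

```python
def relative_expression(ct_marker_sample, ct_ref_sample,
                        ct_marker_control, ct_ref_control):
    """Fold change of a marker gene by the comparative Ct method.

    Each Ct is first normalized to the reference (housekeeping) gene in the
    same sample, then the sample is compared with the control population.
    Assumes both primer pairs amplify with about 100% efficiency.
    """
    delta_sample = ct_marker_sample - ct_ref_sample
    delta_control = ct_marker_control - ct_ref_control
    delta_delta = delta_sample - delta_control
    return 2.0 ** (-delta_delta)


# Hypothetical Ct values: differentiated cells versus undifferentiated ES cells.
fold = relative_expression(24.0, 18.0, 27.0, 18.0)
print(f"marker expression is {fold:.1f}-fold of the control")
```

A fold change above 1 means the marker is more abundant in the sample than in the control population; a value below 1 means it is less abundant.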

Transgenic Animals

This disclosure provides transgenic animals (e.g., transgenic mice) comprising the engineered chromosomes of this disclosure, and methods of making them.

The choice of a suitable method for making a transgenic animal from ES cells or zygotes comprising the engineered chromosomes described herein will depend on the animal and is known to those skilled in the art.

In an exemplary method, ES cells comprising the engineered chromosome are incorporated into an embryo at the blastocyst stage of development, which is then implanted into a pregnant or pseudopregnant female and carried to term. The result is a chimeric animal. If the ES cells give rise to germ cells, the offspring of the animal will be fully transgenic and will carry the engineered chromosome.

In some embodiments, the transgenic animal is a mouse, rat, rabbit, guinea pig, hamster, sheep, goat, donkey, cow, horse, camel, chicken, or monkey.

In some embodiments, the transgenic animal is a mouse. In some embodiments, producing the transgenic mouse comprises injecting the ES cells into a diploid blastocyst, nuclear transfer from the ES cell into an enucleated mouse embryo, or tetraploid embryo complementation.

In some embodiments, the method further comprises transferring the ES cells or zygotes into a pseudopregnant female. In mice, pseudopregnant females are prepared by mating 6-8 week old females in natural estrus with vasectomized males. Zygotes to be transferred can be removed from culture on the day of transfer, placed in a suitable pre-warmed medium (such as M2 medium), and transferred via the oviduct into pseudopregnant females (e.g., 9-11 weeks old) at 0.5 days post coitum.

Once an engineered chromosome has been introduced into a host mammal using the methods of this disclosure, its presence can be verified in the resulting transgenic animal (e.g., a mouse) or its offspring. Such verification typically includes one or more rounds of genotyping of animals that may carry the engineered chromosome, polymerase chain reaction amplification of junction sequences, direct sequencing of certain DNA fragments (e.g., the template sequence), and genetic mapping. Such techniques are well known in the art.

This disclosure provides transgenic mice comprising the engineered chromosomes of this disclosure. In some embodiments, the transgenic mouse comprises one or more genes that have been humanized, such as any of the genes described in Tables 1 and 2. In some embodiments, the animal model comprises more than one humanized gene (e.g., 1, 2, 5, 10, 20, 50, 100, or more genes). In some embodiments, the transgenic mouse comprises all or part of an immunoglobulin gene that has been humanized. In some embodiments, the transgenic mouse comprises all or part of a TCR subunit gene that has been humanized.

In some embodiments of the transgenic mice of this disclosure, mouse chromosome 12 comprises a human IGH variable region sequence in place of the mouse Igh variable region. In some embodiments, the mouse Igh variable region comprises the V_H, D_H, and J_H1-6 gene segments and intervening non-coding sequences. In some embodiments, the human IGH variable region comprises the V_H, D_H, and J_H1-6 gene segments and intervening non-coding sequences. In some embodiments, the engineered chromosome is mouse chromosome 6, which comprises a human IGK variable region sequence in place of the mouse Igk variable region. In some embodiments, the mouse Igk variable region sequence comprises sequences encoding the mouse V_k and J_k1-5 gene segments and intervening non-coding sequences. In some embodiments, the template sequence comprises a human IGK variable region sequence. In some embodiments, the human IGK variable region sequence comprises sequences encoding the human V_k and J_k1-5 gene segments and intervening non-coding sequences.

Applications

Downstream applications of cells and transgenic animals comprising the engineered chromosomes described herein are considered to be within the scope of this disclosure.

Exemplary downstream applications include basic and applied research on human diseases and conditions using animal models (e.g., mouse, rat, or monkey) that have been humanized for one or more human genes. Exemplary but non-limiting genes that can be humanized by replacing the model animal homolog with the human homolog are described in Tables 1 and 2. Animal models of human diseases associated with chromosomal abnormalities (translocations, inversions, and the like) can also be prepared using the methods described herein. Any animal model that requires large-scale chromosomal rearrangement of a fragment larger than 300 kB, such as a humanized mouse disease model of Duchenne muscular dystrophy (DMD), or that requires large-scale insertion or replacement of an array of up to several hundred genes, is considered to be within the scope of this disclosure.

In some embodiments (e.g., those in which the animal's Igh variable region has been humanized), the transgenic animals of this disclosure can be used to produce humanized antibodies. For example, such animals can produce specific B cells bearing human or humanized antibodies. In some embodiments (e.g., those in which the animal's Igk or Igl variable region has been humanized), the transgenic animals of this disclosure can be used to produce humanized antibodies.

In some embodiments (e.g., those in which a template sequence encoding an antibody or an antigen-binding fragment thereof has been inserted into the target chromosome), the transgenic animals of this disclosure can be used to produce antibodies or antigen-binding fragments. For example, the transgenic animals can be used to produce single-chain variable fragments (scFv), nanobodies, bispecific antibodies, multispecific antibodies, and the like. Such antibodies can be used for research or therapeutic purposes.

Exemplary downstream applications also include those in which the engineered chromosome is not incorporated into a transgenic animal. Instead, as one example, ES cells comprising the engineered chromosome are differentiated into another cell type, which can be used for research or therapeutic purposes.

Kits

This disclosure provides kits comprising the nucleic acid molecules described herein. In some embodiments, the nucleic acid molecule is a vector, such as a plasmid.

In some embodiments of the kits of this disclosure, the kit includes cells for use in the methods described herein, such as EHS cells that have been cryopreserved. In some embodiments, the kit includes instructions for use of the nucleic acid molecules and, optionally, the cells.

Examples

Example 1: Establishment of Embryonic Hybrid Stem (EHS) Cells

The overall goal of this study was to obtain mice humanized for the variable domains of the Igh and Igk genes. Humans and mice show a high degree of similarity in the arrangement and expression of antibody genes, and the genomic organization of the heavy chain is also similar in the two species. A humanized form of the variable domain of the mouse Igh or Igk gene can therefore be obtained by replacing approximately 3 MB of mouse genomic sequence containing all V_H, D_H, and J_H gene segments with approximately 1 MB of contiguous human genomic sequence containing the equivalent human gene segments (Figure 1).

The first step in producing a humanized mouse Igh gene was to generate mouse embryonic hybrid stem (EHS) cells by fusing mouse embryonic stem (ES) cells with human ES cells, yielding cells that carry both the mouse Igh and human IGH genes.

Engineered mouse cells expressing a neomycin resistance gene under the control of the PGK promoter were fused by electrofusion with engineered human ES cells expressing an mCherry marker under the control of the CAG promoter, following the standard method provided by the manufacturer of the electrofusion instrument. The hybrid EHS cells were cultured for 7 days in mouse ES cell medium containing G418, and the surviving cells were sorted by fluorescence-activated cell sorting (FACS) according to their level of mCherry expression (Figure 2). Positive cells were kept in continuous culture in mouse ES cell medium containing G418, and single-cell clones were isolated into separate wells for growth. Genomic DNA was then extracted from each single-cell clone for genotyping. Specifically, three primer pairs for the V, D, and J regions of the human immunoglobulin heavy chain (IGH) (Figure 3A) were used in PCR to confirm the presence of the targeted sequence in the EHS clones (Figure 3B). Only clones carrying all three desired regions were retained for further experiments.

Example 2: Engineering a Humanized Chromosome

2.1. Establishment of the EHC by HDR-mediated chromosomal rearrangement (HMCR)

To obtain mouse embryonic hybrid stem (EHS) cells humanized for the variable domain of their Igh gene, the approximately 3 MB variable domain of the Igh gene on mouse chromosome 12 was replaced with the approximately 1 MB variable domain of the human IGH gene on human chromosome 14 by HDR-mediated chromosomal rearrangement (HMCR; Figure 4A).

Two plasmids were designed to mediate the HMCR process and are shown in Figure 4A. The 5' HMCR plasmid was designed to mediate replacement of the 5' end of the mouse Igh gene with its human counterpart, and the 3' HMCR plasmid to mediate replacement of the 3' end of the mouse Igh gene with its human counterpart. The 5' HMCR plasmid contains a 5' arm homologous to the 5' end of the mouse Igh gene, a 3' arm homologous to the 5' end of the human IGH gene, and a CMV-EGFP-polyA-PGK-puromycin-poly cassette inserted between the two homology arms. Similarly, the 3' HMCR plasmid contains a 5' arm homologous to the 3' end of the human IGH variable locus, a 3' arm homologous to the 3' end of the mouse Igh variable locus, and a PGK-hygromycin-polyA cassette inserted between the two homology arms (see Figure 4A). The homology arms are between 600 bp and 1000 bp long. In addition, four plasmids were designed that contain Cas9 and an sgRNA targeting the 5' or 3' end of the Igh variable domain in mouse or human (see Figure 4A; sgRNA targeting sequences are provided in Table 7). The six plasmids were co-transfected as circular plasmids into the EHS cells obtained in Example 1 using standard methods, and the resulting cells were cultured for 7 days in mouse ES cell medium containing puromycin and hygromycin. Surviving GFP-positive single clones were picked for further culture.
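The donor design described above can be captured as a small data structure with a check on the stated 600-1000 bp arm length. This is a minimal sketch under stated assumptions: the class name, field names, and the example arm lengths are hypothetical and are not taken from the plasmid sequences of this disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DonorPlasmid:
    """Simplified description of an HMCR donor plasmid (hypothetical model)."""
    name: str
    five_prime_arm_bp: int   # length of the 5' homology arm
    three_prime_arm_bp: int  # length of the 3' homology arm
    markers: tuple           # selection or reporter markers in the cassette

    def arms_in_range(self, low=600, high=1000):
        """True if both homology arms fall within the stated length window."""
        arms = (self.five_prime_arm_bp, self.three_prime_arm_bp)
        return all(low <= length <= high for length in arms)


# Arm lengths below are made-up placeholders, not values from the disclosure.
donor_5p = DonorPlasmid("5' HMCR", 800, 950, ("EGFP", "puromycin"))
donor_3p = DonorPlasmid("3' HMCR", 650, 1200, ("hygromycin",))

for donor in (donor_5p, donor_3p):
    print(donor.name, "arms within 600-1000 bp:", donor.arms_in_range())
```

In this sketch the second donor fails the check because one placeholder arm is 1200 bp, which is the kind of design error such a check is meant to catch before cloning.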

Genotyping was performed to identify the desired single clones with successful HMCR. Four PCR primer pairs were designed for this purpose, as shown in Figure 5A. For the first pair, the forward primer lies upstream of the mouse Igh 5' homology arm of the 5' HMCR plasmid, and the reverse primer lies within the CMV promoter region (Figure 5A). For the second pair, the forward primer lies within the puromycin gene of the 5' HMCR plasmid, and the reverse primer lies downstream of the human IGH 5' homology arm, within the human IGH sequence (Figure 5A). For the third pair, the forward primer lies upstream of the homology arm at the 3' end of the human IGH variable region, and the reverse primer lies in the PGK promoter region of the 3' HMCR plasmid (Figure 5A). For the last pair, the forward primer lies in the hygromycin gene of the 3' HMCR plasmid, and the reverse primer lies downstream of the 3' homology arm of the 3' HMCR plasmid, within the mouse Igh variable domain (Figure 5A). Each clone was PCR-amplified with each primer pair, and only clones giving a positive PCR product in all four genotyping tests were retained for further experiments. Of the 196 clones isolated at this step, 6 were identified as positive for all four PCR amplicons (Figure 5B).
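The retention rule above (a clone is kept only if all four junction PCRs give a product) and the reported yield of 6 out of 196 clones can be sketched as follows. The amplicon labels and the two example clones are hypothetical; only the counts 6 and 196 come from the text.

```python
# Hypothetical labels for the four junction PCRs described above.
JUNCTION_PCRS = (
    "mouse_5p_to_CMV",
    "puro_to_human_5p",
    "human_3p_to_PGK",
    "hygro_to_mouse_3p",
)


def passes_genotyping(pcr_results):
    """Return True only if every junction PCR gave a product.

    A missing entry is treated as a negative result.
    """
    return all(pcr_results.get(name, False) for name in JUNCTION_PCRS)


def fraction_positive(n_positive, n_screened):
    """Fraction of screened clones that passed all tests."""
    return n_positive / n_screened


# Made-up example clones, not data from the disclosure.
clones = {
    "clone_A": dict.fromkeys(JUNCTION_PCRS, True),
    "clone_B": {**dict.fromkeys(JUNCTION_PCRS, True), "human_3p_to_PGK": False},
}
kept = [name for name, result in clones.items() if passes_genotyping(result)]
print(kept)

# Yield reported in the text: 6 positive clones out of 196 screened.
print(f"{fraction_positive(6, 196):.1%}")
```

Requiring all four junctions guards against clones in which only one end of the replacement integrated correctly.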

To promote expression of the human IGH gene in EHS cells with successful HMCR, the 3' selection marker was deleted from the genome of the positive clones by homology-directed repair (HDR) (Figure 4A), although non-homologous end joining (NHEJ), microhomology-mediated end joining (MMEJ), and homology-mediated end joining (HMEJ) could also be used. This procedure successfully established an engineered humanized chromosome (EHC) in EHS cells, in which the variable domain of the mouse Igh gene on mouse chromosome 12, comprising the V_H, D_H, and J_H1-6 gene segments, was replaced by HMCR with the equivalent human region.

Tables 5 and 6 below provide the sequences of the plasmids used to mediate the HMCR process.

Table 5. Exemplary 5' plasmid sequence for HMCR-mediated replacement of the mouse Igh variable region with the corresponding human region

Table 6. Exemplary 3' plasmid sequence for HMCR-mediated replacement of the mouse Igh variable region with the corresponding human region

Table 7. sgRNA sequences

In Table 7, the sgRNA sequences are provided with the PAM sequence (NGG) located 3' of the sgRNA targeting sequence on the non-target strand. The corresponding sgRNA targeting sequences without the PAM are provided as SEQ ID NOS: 14-17.
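The relationship between a targeting sequence listed with its PAM and the same sequence without it can be illustrated with a short helper that checks for a 3' NGG and removes it. This is a sketch only: the function name is hypothetical and the example sequence is an arbitrary placeholder, not one of the sequences in Table 7.

```python
import re

# NGG PAM: any base followed by two guanines.
PAM_PATTERN = re.compile(r"[ACGT]GG")


def split_target_and_pam(seq_with_pam, pam_len=3):
    """Split a protospacer-plus-PAM string into (targeting sequence, PAM).

    Raises ValueError if the last `pam_len` bases are not an NGG PAM.
    """
    seq = seq_with_pam.strip().upper()
    target, pam = seq[:-pam_len], seq[-pam_len:]
    if not PAM_PATTERN.fullmatch(pam):
        raise ValueError(f"expected an NGG PAM at the 3' end, found {pam!r}")
    return target, pam


# Arbitrary 20-nt placeholder followed by an AGG PAM (not from Table 7).
target, pam = split_target_and_pam("GACGTTACCGGATCAAGCTTAGG")
print(target, pam)
```

Only the targeting sequence is encoded in the sgRNA itself; the PAM is read from the genomic DNA by Cas9, which is why the two forms are listed separately.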

2.2. Establishment of the EHC by Cre-LoxP-mediated chromosomal rearrangement (CMCR)

To obtain mouse EHS cells humanized for the variable domain of their Igh gene, the approximately 3 MB variable domain of the Igh gene on mouse chromosome 12 was replaced with the approximately 1 MB variable domain of the IGH gene on human chromosome 14 by Cre-LoxP-mediated chromosomal rearrangement (CMCR; Figure 4B). Four plasmids were designed to mediate the CMCR process. The mouse Igh 5' (pCMV-GFP-BGH PolyA-Loxp) and 3' (BGH polyA-Loxp-511-hygromycin-BGH polyA-PGK-BSD-BGH PolyA) plasmids were designed to insert at the 5' and 3' ends of the mouse Igh variable locus, respectively. Likewise, the human IGH 5' (BGH polyA-Loxp-Puro-BGH PolyA-PGK-neomycin-BGH PolyA) and 3' (pCMV-BFP-BGH PolyA-PGK-Loxp-511) plasmids were designed to insert at the 5' and 3' ends of the human IGH variable locus, respectively (Figure 5). The transfected EHS cells were cultured for 7 days in mouse ES cell medium containing BSD and neomycin. Surviving cells that were double-positive for GFP and BFP were picked for further culture. Genotyping was performed to identify the desired single clones that had successfully integrated the above plasmids. Cre was then transfected into the successfully integrated EHS cells to carry out CMCR; successfully rearranged cells survive in medium containing puromycin and hygromycin. The surviving cells were then collected for genotyping. To promote expression of the human IGH gene in EHS cells with successful CMCR, the 3' selection marker was subsequently deleted from the genome (Figure 5). Following this procedure, an engineered humanized chromosome (EHC; mouse chromosome 12 with its Igh gene humanized for the variable domain) was successfully established by CMCR in EHS cells.

Example 3: Chromosome Replacement in Mouse Embryonic Stem Cells by Microcell-Mediated Chromosome Transfer

EHS cells carrying the engineered humanized chromosome (EHC) were obtained as described in Examples 1 and 2. The EHC was then transferred into mouse ES cells by microcell-mediated chromosome transfer (MMCT) to establish mouse ES cells humanized for the variable domain of the Igh gene.

EHS cells carrying the EHC were treated with 0.2 μg/ml colcemid at 37°C for 48 hours. The prolonged mitotic arrest induced the formation of microcells, which were collected by centrifugation (Figure 6). In parallel, mouse ES cells expressing an mCherry fluorescent marker from chromosome 12 were obtained (Figure 6). These cells were produced by inserting a CMV-mCherry-polyA cassette into one copy of mouse chromosome 12.

Next, the microcells were fused with the mouse ES cells by electrofusion, and the resulting cells were sorted by FACS to obtain mouse ES cells positive for both GFP and mCherry. GFP signal indicates that the EHC was successfully transferred into the mouse ES cell, and mCherry signal indicates that the cell also carries the mCherry-tagged chromosome 12. The double-positive cells were cultured continuously in mouse ES cell culture medium for 2 weeks, after which mCherry-negative, GFP-positive mouse ES cells (i.e., cells that had lost the extra mCherry-tagged chromosome 12) were sorted by FACS and cultured for 7 days. Single clones were isolated into separate wells for expansion and karyotype analysis, and clones with the correct karyotype were retained. The result was mouse ES cells humanized for the variable region of their Igh gene.
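The two-stage sorting logic described in this example can be summarized in a short Python sketch. The function name, the boolean representation of fluorescence, and the returned labels are illustrative assumptions and are not part of the disclosure; actual FACS gating relies on intensity thresholds set on the instrument.

```python
def classify_cell(gfp_positive: bool, mcherry_positive: bool) -> str:
    """Classify a cell by the two fluorescent markers used in Example 3.

    GFP marks the transferred engineered chromosome (EHC); mCherry marks
    the tagged endogenous chromosome 12 of the recipient ES cell.
    """
    if gfp_positive and mcherry_positive:
        # EHC received, tagged chromosome 12 still present:
        # collected in the first sort and cultured further.
        return "first_sort"
    if gfp_positive and not mcherry_positive:
        # EHC retained, tagged chromosome 12 lost:
        # collected in the second sort as the desired product.
        return "second_sort"
    # No EHC detected: not carried forward.
    return "discard"


if __name__ == "__main__":
    for gfp, mcherry in [(True, True), (True, False), (False, True), (False, False)]:
        print(gfp, mcherry, classify_cell(gfp, mcherry))
```

In this sketch a cell passes through both categories in sequence: it is first collected as double-positive and, after the culture period, collected again only if it has become mCherry-negative while remaining GFP-positive.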

Example 4: Generation of Igh humanized mice

Following standard procedures, the mouse ES cells obtained in Example 3, humanized for the variable region of their Igh gene, were injected into blastocysts of the B6D2F1 (C57BL/6 × DBA/2) mouse strain. Alternatively, nuclear transfer or tetraploid embryo complementation can be used to generate the humanized mice.

The injected blastocysts were transferred into the uterus of pseudopregnant ICR females at 2.5 days post coitum (dpc). Igh humanized mice were identified by GFP expression under a fluorescence stereomicroscope, and GFP+ mice were analyzed further.

Next, a series of PCR experiments was designed to validate the Igh humanized mice. The first set of PCR experiments was designed to verify the integrity of the human IGH variable region. Five primer pairs targeting different regions of the human IGH variable region were designed (see Figure 7A; arrows indicate PCR primers 1-10). The Igh humanized mice yielded positive PCR products for all five primer pairs (Figure 7B). Primers located upstream and downstream of the human IGH variable region were also designed (Figure 7A); no product was observed in either of these PCR experiments for the Igh humanized mice, whereas HEK293T cells yielded PCR products of the correct size (Figure 7B).
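The pass/fail logic of this PCR validation can be expressed as a minimal Python sketch. The function name and the use of booleans for band presence are assumptions made for illustration; the disclosure itself reports the results only as gel images (Figure 7B).

```python
from typing import Sequence


def pcr_pattern_consistent(internal_bands: Sequence[bool],
                           flanking_bands: Sequence[bool]) -> bool:
    """Return True if the PCR pattern matches the expected result.

    internal_bands: band present (True) or absent (False) for each primer
        pair inside the human variable region; all are expected present.
    flanking_bands: band present or absent for primer pairs upstream and
        downstream of the human variable region; all are expected absent
        in a humanized mouse, because only the variable region was moved.
    """
    return all(internal_bands) and not any(flanking_bands)


if __name__ == "__main__":
    # Pattern described for the Igh humanized mice: 5 internal positives,
    # 2 flanking negatives.
    print(pcr_pattern_consistent([True] * 5, [False, False]))
```

A human control such as HEK293T is expected to give bands for the flanking primers as well, so it would return False under this check; that is the intended behavior, since the check describes the humanized mouse rather than a human genome.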

Fibroblasts were isolated from the tails of Igh humanized mice and used for fluorescence in situ hybridization (FISH). The FISH results showed that chromosome 12 of the Igh humanized mice contained a segment of human chromosome 14 (Figure 8A), indicating that the variable domain of the human IGH gene had been successfully inserted in situ into mouse chromosome 12.

G-banding karyotype analysis was also performed to rule out chromosomal abnormalities (Figure 8B).

Genomic DNA was also extracted from the Igh humanized mice and subjected to whole-genome sequencing (WGS). The WGS reads were mapped onto a reference genome containing all mouse chromosomes and human chromosome 14. All variable domains of the human IGH gene (V_H, D_H and J_H gene segments) were covered by the WGS reads. In addition, no off-target editing was found in other genomic regions (Figures 9A-9B).
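As an illustration of what "covered by the WGS reads" means computationally, the following Python sketch computes the fraction of a region covered by at least one aligned read. The function name, the half-open interval convention, and the toy coordinates are assumptions for illustration only; the disclosure does not specify the analysis software or coordinates used.

```python
from typing import Iterable, Tuple


def covered_fraction(region_start: int, region_end: int,
                     read_intervals: Iterable[Tuple[int, int]]) -> float:
    """Fraction of [region_start, region_end) covered by at least one read.

    Each read is a half-open interval (start, end) on the same reference
    sequence as the region.
    """
    # Keep only reads that overlap the region, clipped to its boundaries.
    clipped = sorted(
        (max(start, region_start), min(end, region_end))
        for start, end in read_intervals
        if end > region_start and start < region_end
    )
    covered = 0
    cursor = region_start  # rightmost position already counted
    for start, end in clipped:
        if end <= cursor:
            continue  # read lies entirely within already-counted bases
        covered += end - max(start, cursor)
        cursor = end
    return covered / (region_end - region_start)


if __name__ == "__main__":
    # Toy example with hypothetical coordinates.
    print(covered_fraction(0, 100, [(0, 50), (40, 100)]))
```

A value of 1.0 for every gene segment in the inserted region would correspond to the complete coverage reported above.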

Example 5: Production of Igk humanized mice

MASIRT was applied to obtain mice humanized for the variable domain of their Igk gene (Figure 10). Igk humanized mice were obtained using a method similar to that described above for the Igh gene. To validate the Igk humanized mice, PCR experiments were first performed to verify the integrity of the human IGK variable region. Five primer pairs were designed at different loci within the human IGK variable region (Figure 11A), and the Igk humanized mice yielded positive PCR products in all five experiments (Figure 11B). Primers located upstream and downstream of the human IGK variable region were also designed (Figure 11A); no product was observed in either of these PCR experiments for the Igk humanized mice, whereas HEK293T cells yielded PCR products of the correct size (Figure 11B). Finally, genomic DNA was extracted from the Igk humanized mice and subjected to whole-genome sequencing (WGS).

Table 8. Exemplary 5' plasmid sequences for HMCR-mediated replacement of the mouse Igk variable region with the corresponding human region

Table 9. Exemplary 3' plasmid sequences for HMCR-mediated replacement of the mouse Igk variable region with the corresponding human region

Table 10. sgRNA sequences used to replace the mouse Igk variable region with the corresponding human region

sgRNA                      Sequence                   SEQ ID NO
Mouse Igk 5' with PAM      agtctctgctgcctacagcaNGG    24
Mouse Igk 3' with PAM      agtccttgacagacagctcaNGG    25
Human IGK 5' with PAM      gcctatgatattacccagccNGG    26
Human IGK 3' with PAM      acccatgacctggccactgaNGG    27

In Table 10, the sgRNA sequences are shown together with the PAM sequence (NGG) located 3' of the sgRNA targeting sequence on the non-target strand. The corresponding sgRNA targeting sequences without the PAM are provided as SEQ ID NOS: 28-31.
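The relationship between the PAM-containing entries of Table 10 and their PAM-free targeting sequences can be shown with a brief Python sketch. The dictionary keys and the function name are illustrative assumptions; the sequences are copied from Table 10, and the outputs are simply those sequences with the trailing NGG removed, which should correspond to the PAM-free targeting sequences referred to above.

```python
PAM = "NGG"

# Sequences with PAM, as listed in Table 10 (SEQ ID NOS: 24-27).
TABLE_10_WITH_PAM = {
    "mouse Igk 5'": "agtctctgctgcctacagcaNGG",
    "mouse Igk 3'": "agtccttgacagacagctcaNGG",
    "human IGK 5'": "gcctatgatattacccagccNGG",
    "human IGK 3'": "acccatgacctggccactgaNGG",
}


def strip_pam(sequence: str, pam: str = PAM) -> str:
    """Remove a 3' PAM from a sequence; raise if the PAM is absent."""
    if not sequence.endswith(pam):
        raise ValueError(f"sequence does not end with PAM {pam!r}")
    return sequence[: -len(pam)]


TARGETING_SEQUENCES = {
    name: strip_pam(seq) for name, seq in TABLE_10_WITH_PAM.items()
}

if __name__ == "__main__":
    for name, seq in TARGETING_SEQUENCES.items():
        print(f"{name}: {seq} ({len(seq)} nt)")
```

Each resulting targeting sequence is 20 nucleotides long, the usual protospacer length for an NGG-PAM Cas9 guide.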

The WGS reads were mapped onto a reference genome containing all mouse chromosomes and human chromosome 2. The mapping showed that all variable domains of the human IGK gene (V_H and J_H gene segments) were covered by the WGS reads. In addition, no off-target editing was found in other genomic regions (Figure 12).

Claims

1. A method for producing engineered chromosomes, comprising:

a. Provide cells containing a target chromosome with the target sequence and a template chromosome with the template sequence;

b. Contact the cells with the following:

i. A first nucleic acid molecule comprising, from 5' to 3', a 5' homologous arm, at least one first marker, and a 3' homologous arm, wherein the 5' homologous arm contains a nucleotide sequence upstream of the 5' end of the target sequence, and the 3' homologous arm contains a nucleotide sequence upstream of the 5' end of the template sequence; and

ii. A second nucleic acid molecule comprising, from 5' to 3', a 5' homologous arm, at least one second marker, and a 3' homologous arm, wherein the 5' homologous arm contains a nucleotide sequence downstream of the 3' end of the template sequence, and the 3' homologous arm contains a nucleotide sequence downstream of the 3' end of the target sequence;

c. Double-strand breaks are generated at or on either side of the target sequence, and at the 5' and 3' ends of the template sequence, thereby inserting the template sequence and the first and second markers into the target chromosome; and

d. Select one or more cells that express the first and second markers.

2. The method of claim 1, wherein after inserting the template sequence, the first marker is located at the 5' end of the template sequence, and the second marker is located at the 3' end of the template sequence.

3. The method of claim 1 or 2, wherein the lengths of the 5' and 3' homologous arms of the first and second nucleic acid molecules are between about 20 bp and 2,000 bp, between about 50 bp and 1,500 bp, between about 100 bp and 1,400 bp, between about 150 bp and 1,300 bp, between about 200 bp and 1,200 bp, between about 300 bp and 1,100 bp, between about 400 bp and 1,000 bp, or between about 500 bp and 900 bp, or between about 600 bp and 800 bp.

4. The method of claim 1 or 2, wherein the lengths of the 5' and 3' homologous arms of the first and second nucleic acid molecules are between about 400 bp and 1,500 bp, between about 500 and 1,300 bp, or between about 600 and 1,000 bp.

5. The method of claim 1 or 2, wherein the lengths of the 5' and 3' homologous arms of the first and second nucleic acid molecules are between about 600 bp and 1,000 bp.

6. The method of any one of claims 1-5, wherein the template sequence has a length of at least 25 kilobase pairs (KB), at least 50 KB, at least about 100 KB, at least about 200 KB, at least about 400 KB, at least about 500 KB, at least 600 KB, at least 700 KB, at least 800 KB, at least 900 KB, at least 1 megabase pair (MB), at least 2 MB, at least 3 MB, at least 4 MB, at least 5 MB, at least 6 MB, at least 7 MB, at least 8 MB, at least 9 MB, at least 10 MB, at least 15 MB, at least 20 MB, at least 25 MB, at least 30 MB, at least 40 MB, at least 50 MB, at least 60 MB, at least 70 MB, at least 80 MB, at least 90 MB, at least 100 MB, at least 120 MB, at least 140 MB, at least 160 MB, at least 180 MB, at least 200 MB, at least 220 MB, or at least 250 MB.

7. The method of any one of claims 1-5, wherein the length of the template sequence is between 50 KB and 250 MB, between 50 KB and 100 MB, between 50 KB and 50 MB, between 50 KB and 20 MB, between 50 KB and 10 MB, between 50 KB and 5 MB, between 50 KB and 3 MB, between 50 KB and 2 MB, between 50 KB and 1 MB, between 100 KB and 200 MB, between 100 KB and 100 MB, between 100 KB and 50 MB, between 100 KB and 20 MB, between 100 KB and 10 MB, between 100 KB and 5 MB, between 100 KB and 3 MB, between 100 KB and 2 MB, between 100 KB and 1 MB, between 100 KB and 50 KB, between 100 KB and 50 MB, between 200 KB and 200 MB, between 200 KB and 10 MB, between 200 KB and 5 MB, between 200 KB and 3 MB, between 200 KB and 2 MB, between 200 KB and 1 MB, between 200 KB and 500 KB, between 500 KB and 100 MB, between 500 KB and 50 MB, between 500 KB and 20 MB, between 500 KB and 10 MB, between 500 KB and 5 MB, between 500 KB and 3 MB, between 500 KB and 2 MB, between 500 KB and 1 MB, between 1 MB and 100 MB, between 1 MB and 50 MB, between 1 MB and 20 MB, between 1 MB and 10 MB, between 1 MB and 5 MB, between 1 MB and 3 MB, between 1 MB and 2 MB, between 3 MB and 100 MB, between 3 MB and 50 MB, between 3 MB and 20 MB, between 3 MB and 10 MB, between 3 MB and 5 MB, between 5 MB and 100 MB, between 5 MB and 50 MB, between 5 MB and 20 MB, between 5 MB and 10 MB, between 10 MB and 100 MB, between 10 MB and 50 MB, or between 10 MB and 20 MB.

8. The method of any one of claims 1-5, wherein the length of the template sequence is between 200 KB and 50 MB, between 1 MB and 20 MB, between 1 MB and 10 MB, between 1 MB and 5 MB, between 1 MB and 3 MB, between 3 MB and 20 MB, between 3 MB and 10 MB, between 3 MB and 7 MB, or between 3 MB and 5 MB.

9. The method of any one of claims 1-8, wherein generating the double-strand break in (c) comprises inducing the double-strand break using a CRISPR/Cas endonuclease and one or more guide nucleic acids (gNA), one or more zinc finger nucleases, one or more transcription activator-like effector nucleases (TALENs), or one or more Cre recombinases.

10. The method of claim 9, wherein the CRISPR/Cas endonuclease comprises Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas6, Cas7, Cas8, Cas9, Cas10, CasX, CasY, Cas12a (Cpf1), Cas12b, Cas13a, Csy1, Csy2, Csy3, Cse1, Cse2, Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX, Csx3, Csx1, Csx15, Csf1, Csf2, Csf3, Csf4, Cms1, C2c1, C2c2 or C2c3, or their homologs, orthologs or modified forms.

11. The method of claim 9, wherein the CRISPR/Cas endonuclease comprises Cas9, Cpf1, CasX, CasY, C2c1, C2c3 or their homologs, orthologs or modified forms.

12. The method of claim 9, wherein the CRISPR/Cas endonuclease comprises Cas9.

13. The method of any one of claims 10-12, wherein the gNA comprises a single guide RNA (sgRNA).

14. The method of any one of claims 1-13, wherein the target chromosome comprises, from 5' to 3', a 5' homologous arm sequence of a first nucleic acid molecule, a target sequence, and a 3' homologous arm sequence of a second nucleic acid molecule.

15. The method of any one of claims 1-14, wherein the template chromosome comprises, from 5' to 3', a 3' homologous arm sequence of a first nucleic acid molecule, a template sequence, and a 5' homologous arm sequence of a second nucleic acid molecule.

16. The method of any one of claims 1-15, wherein the target sequence comprises at least 1 gene, at least 2 genes, at least 3 genes, at least 5 genes, at least 10 genes, at least 20 genes, at least 30 genes, at least 40 genes, at least 50 genes, at least 100 genes, or at least 200 genes.

17. The method of any one of claims 1-16, wherein the target sequence comprises one or more genes homologous to one or more genes of the template sequence.

18. The method of any one of claims 1-17, wherein the template sequence comprises a naturally occurring sequence.

19. The method of claim 18, wherein the template sequence comprises one or more modifications to the naturally occurring sequence.

20. The method of claim 18, wherein the template sequence comprises at least 1 gene, at least 2 genes, at least 3 genes, at least 5 genes, at least 10 genes, at least 20 genes, at least 30 genes, at least 40 genes, at least 50 genes, at least 100 genes, or at least 200 genes.

21. The method of any one of claims 1-17, wherein the template sequence comprises an artificial sequence.

22. The method of claim 21, wherein the artificial sequence comprises a sequence encoding one or more antibodies or an antigen-binding fragment thereof.

23. The method of claim 22, wherein the one or more antibodies or antigen-binding fragments thereof comprise scFv, bispecific antibodies, or multispecific antibodies.

24. The method of any one of claims 1-23, wherein the target sequence is deleted by inserting the template sequence.

25. The method of claim 24, wherein:

a. The target chromosome comprises, from 5' to 3', the 5' homologous arm sequence of the first nucleic acid molecule, the first sgRNA target sequence, the target sequence, the second sgRNA target sequence, and the 3' homologous arm sequence of the second nucleic acid molecule; and

b. The template chromosome contains, from 5' to 3', a third sgRNA target sequence, a 3' homologous arm sequence of the first nucleic acid molecule, the template sequence, a 5' homologous arm sequence of the second nucleic acid molecule, and a fourth sgRNA target sequence.

26. The method of claim 25, wherein generating the double-strand break comprises contacting the cell with a CRISPR/Cas endonuclease and the first, second, third, and fourth sgRNAs.

27. The method of claim 26, wherein the first, second, third, and fourth sgRNAs comprise targeting sequences specific to the target sequences of the first, second, third, and fourth sgRNAs.

28. The method of claim 26, wherein contacting the cells with a CRISPR/Cas endonuclease and sgRNA comprises transfecting the cells with one or more nucleic acid molecules encoding the CRISPR/Cas endonuclease and the sgRNA.

29. The method of any one of claims 1-23, wherein inserting the template sequence involves little or no deletion of the target sequence.

30. The method of claim 29, wherein inserting the template sequence disrupts one or more functions of the target sequence.

31. The method of claim 29 or 30, wherein inserting the template sequence disrupts the gene in the target sequence.

32. The method according to any one of claims 29-31, wherein

a. The target chromosome comprises, from 5' to 3', the 5' homologous arm sequence of the first nucleic acid molecule, the first sgRNA target sequence, and the 3' homologous arm sequence of the second nucleic acid molecule; and

b. The template chromosome contains, from 5' to 3', a second sgRNA target sequence, a 3' homologous arm sequence of the first nucleic acid molecule, a template sequence, a 5' homologous arm sequence of the second nucleic acid molecule, and a third sgRNA target sequence.

33. The method of claim 32, wherein generating the double-strand break comprises contacting the cell with a CRISPR/Cas endonuclease and first, second, and third sgRNAs.

34. The method of claim 33, wherein the first, second, and third sgRNAs comprise targeting sequences specific to the first, second, and third sgRNA target sequences.

35. The method of claim 33 or 34, wherein contacting the cell with the CRISPR/Cas endonuclease and the sgRNA comprises transfecting the cell with one or more nucleic acid molecules encoding the CRISPR/Cas endonuclease and the sgRNA.

36. The method of any one of claims 1-35, wherein the first or second marker comprises a fluorescent protein operatively linked to a promoter capable of expressing the fluorescent protein in the cell.

37. The method of claim 36, wherein the fluorescent protein comprises green fluorescent protein (GFP), yellow fluorescent protein (YFP), red fluorescent protein (RFP), cyan fluorescent protein (CFP), blue fluorescent protein (BFP), dsRed, mCherry or tdTomato.

38. The method of claim 36, wherein the fluorescent protein comprises GFP.

39. The method of any one of claims 1-38, wherein the first marker further comprises a selection marker.

40. The method of any one of claims 1-39, wherein the second marker further comprises a selection marker.

41. The method of claim 39 or 40, wherein the selection marker is selected from the group consisting of: dihydrofolate reductase (DHFR), glutamine synthetase (GS), puromycin acetyltransferase, blasticidin deaminase, histidinol dehydrogenase, hygromycin phosphotransferase (hph), bleomycin resistance gene and aminoglycoside phosphotransferase (neomycin resistance).

42. The method of any one of claims 39-41, wherein the first and second markers do not comprise the same selection marker.

43. The method of any one of claims 1-42, wherein the first marker comprises GFP and puromycin acetyltransferase, the GFP being operatively linked to a promoter capable of expressing GFP in the cell, and the second marker comprises hygromycin phosphotransferase.

44. The method of any one of claims 1-43, further comprising (e) deleting all or part of the first or second marker after step (d).

45. The method of claim 44, wherein deleting the first or second marker comprises inducing deletion with a CRISPR/Cas endonuclease and gNA, the gNA comprising a target sequence specific to the sequence encoding the marker.

46. The method of any one of claims 1-45, wherein the cell comprises a hybrid cell, an embryonic hybrid stem (EHS) cell, or a fertilized egg.

47. The method of claim 46, wherein the EHS cells are generated by fusing ES cells from any two species selected from the group consisting of: mice, rats, rabbits, guinea pigs, hamsters, sheep, goats, donkeys, cattle, horses, camels, chickens, and monkeys.

48. The method of claim 46, wherein the EHS cells are generated by fusing human embryonic stem cells with embryonic stem cells from a non-human species.

49. The method of claim 48, wherein the non-human species is a mouse, rat, rabbit, guinea pig, hamster, sheep, goat, donkey, cow, horse, camel, chicken, or monkey.

50. The method of claim 46, wherein the EHS cells are generated by fusing ES cells from any two different species, the species being selected from the group consisting of: mice, rats, rabbits, guinea pigs, hamsters, sheep, goats, donkeys, cattle, horses, camels, chickens, and monkeys.

51. The method of claim 46, wherein generating the hybrid cells comprises:

a. Producing micronucleated human cells; and

b. Fusing the micronucleated human cells with cells from a non-human species to produce the hybrid cells.

52. The method of claim 51, wherein the micronucleated human cells are generated by exposing human cells to colchicine under conditions sufficient to induce micronucleation and collecting the micronucleated cells by centrifugation.

53. The method of claim 51 or 52, wherein the non-human species is a mouse, rat, rabbit, guinea pig, hamster, sheep, goat, donkey, cow, horse, camel, chicken, or monkey.

54. The method of any one of claims 51-53, wherein the cells from the non-human species are ES cells and the hybrid cells are EHS cells.

55. The method of any one of claims 47-50, wherein the fusion comprises electrofusion, virus-induced fusion, or chemically-induced fusion.

56. The method of any one of claims 1-55, wherein the target sequence comprises a gene encoding an immunoglobulin or a T-cell receptor subunit.

57. The method of any one of claims 1-56, wherein the target chromosome comprises mouse chromosome 12 and the template chromosome comprises human chromosome 14, or wherein the target chromosome comprises mouse chromosome 6 and the template chromosome comprises human chromosome 2.

58. The method of claim 57, wherein the target sequence comprises a mouse Igh variable region sequence, a mouse Igk variable region sequence, and/or a mouse Igl variable region sequence.

59. The method of claim 58, wherein the mouse Igh variable region sequence comprises sequences encoding mouse V_H, D_H and J_H1-6 gene segments and intervening non-coding sequences.

60. The method of any one of claims 57-59, wherein the template sequence comprises a human IGH variable region sequence, a human IGK variable region sequence, and/or a human IGL variable region sequence.

61. The method of claim 60, wherein the human IGH variable region sequence comprises sequences encoding human V_H, D_H and J_H1-6 gene segments and intervening non-coding sequences.

62. The method of any one of claims 1-61, further comprising recovering the engineered chromosome from the cell selected in step (d).

63. The method of claim 62, wherein recovering the engineered chromosome comprises exposing the cells to colchicine under conditions sufficient to induce micronucleation and collecting the micronucleated cells using centrifugation.

64. The method of any one of claims 1-63, wherein the first and second nucleic acid molecules are plasmids.

65. An engineered chromosome produced by any one of claims 1-64.

66. The engineered chromosome of claim 65, wherein the engineered chromosome is mouse chromosome 12 comprising a human IGH variable region sequence replacing the mouse Igh variable region, or wherein the engineered chromosome is mouse chromosome 6 comprising a human IGK variable region sequence replacing the mouse Igk variable region.

67. The engineered chromosome of claim 66, wherein the mouse Igh variable region comprises V_H, D_H and J_H1-6 gene segments and intervening non-coding sequences.

68. The engineered chromosome of claim 66 or 67, wherein the human IGH variable region comprises V_H, D_H and J_H1-6 gene segments and intervening non-coding sequences.

69. A cell comprising the engineered chromosome of any one of claims 64-68.

70. The cell of claim 69, wherein the cell is capable of hybridizing with mouse ES cells.

71. The cell of claim 69, wherein the cell is an embryonic stem (ES) cell, an embryonic hybrid stem (EHS) cell, or a fertilized egg.

72. The method of claim 68, wherein the cell is a micronucleated cell.

73. The cell of claim 72, wherein the EHS cell is a hybrid of human and mouse ES cells.

74. The cell of claim 72, wherein the ES cell is a mouse ES cell.

75. A method for generating mouse embryonic stem cells, the method comprising:

a. Fusing micronucleated cells containing the engineered chromosome produced by any one of claims 1-64 with mouse ES cells, wherein:

i. The mouse ES cells contain a chromosome homologous to the engineered chromosome, the homologous chromosome containing a first fluorescent protein operatively linked to a promoter capable of expressing the fluorescent protein in the ES cells, and

ii. At least one subpopulation of the micronucleated cells comprises engineered chromosomes, and wherein the engineered chromosomes comprise a second fluorescent protein different from the first fluorescent protein, the second fluorescent protein being operatively linked to a promoter capable of expressing the fluorescent protein in the ES cells;

b. Select ES cells expressing the first and second fluorescent proteins;

c. Culture the ES cells selected in step (b) until at least one subpopulation of the ES cells loses the homologous chromosome; and

d. Select ES cells that express the second fluorescent protein but not the first fluorescent protein.

76. The method of claim 75, wherein culturing the cells in step (c) comprises culturing the cells for at least 5 days, at least 7 days, at least 10 days, or at least 14 days.

77. The method of claim 75 or 76, wherein selecting the cells in steps (b) and (d) includes fluorescence activated cell sorting (FACS).

78. A mouse ES cell, which is generated by the method of any one of claims 75-77.

79. A transgenic mouse produced from mouse ES cells generated by the method of any one of claims 75-78.

80. The transgenic mouse of claim 79, wherein generating the transgenic mouse comprises injecting the ES cell into a diploid blastocyst, transferring the nucleus of the ES cell into an enucleated mouse embryo, or tetraploid embryo complementation.

81. The transgenic mouse of claim 79 or 80, wherein mouse chromosome 12 contains a human IGH variable region sequence replacing the mouse Igh variable region, or wherein mouse chromosome 6 contains a human IGK variable region sequence replacing the mouse Igk variable region.

82. The transgenic mouse of claim 81, wherein the mouse Igh variable region comprises V_H, D_H and J_H1-6 gene segments and intervening non-coding sequences.

83. The transgenic mouse of claim 81 or 82, wherein the human IGH variable region comprises V_H, D_H and J_H1-6 gene segments and intervening non-coding sequences.

84. A method for producing antibodies, comprising:

a. Challenging the transgenic mouse of any one of claims 80-83 with an antigen, thereby producing a plurality of antibodies comprising human V, D, and J segments from the human IGH variable region; and

b. Isolate antibodies specific to the antigen.

85. An antibody derived from an antibody produced by the method of claim 84.

86. The antibody of claim 85, wherein the antibody comprises a single-chain variable fragment (scFv), a bispecific antibody, or a multispecific antibody.

87. A method for generating chromosomal rearrangements, the method comprising:

a. Provide cells containing a target chromosome with a target site and a template chromosome with a template sequence;

b. Contact the cell with a nucleic acid molecule, the nucleic acid molecule comprising, from 5' to 3', a 5' homologous arm containing a nucleotide sequence upstream of the 5' end of the target site, a marker, and a 3' homologous arm containing a nucleotide sequence upstream of the 5' end of the template sequence;

c. Double-strand breaks are generated at the target site and at the 5' end of the template sequence, thereby inserting into the target chromosome, 3' of the 5' homologous arm sequence, the marker followed by the template sequence, thereby producing a chromosomal rearrangement; and

d. Select one or more cells that express the marker.

88. The method of claim 87, wherein the lengths of the 5' and 3' homologous arms of the nucleic acid molecule are between about 20 bp and 2,000 bp, between about 50 bp and 1,500 bp, between about 100 bp and 1,400 bp, between about 150 bp and 1,300 bp, between about 200 bp and 1,200 bp, between about 300 bp and 1,100 bp, between about 400 bp and 1,000 bp, between about 500 bp and 900 bp, or between about 600 bp and 800 bp.

89. The method of claim 87, wherein the lengths of the 5' and 3' homologous arms of the nucleic acid molecule are between about 400 bp and 1,500 bp, between about 500 bp and 1,300 bp, or between about 600 bp and 1,000 bp.

90. The method of claim 87, wherein the lengths of the 5' and 3' homologous arms of the nucleic acid molecule are between about 600 bp and 1,000 bp.

91. The method of any one of claims 87-90, wherein generating the double-strand break in (c) comprises inducing the double-strand break using a CRISPR/Cas endonuclease and at least one sgRNA, one or more zinc finger nucleases, one or more transcription activator-like effector nucleases (TALENs), or one or more Cre recombinases.

92. The method of claim 91, wherein the CRISPR/Cas endonuclease comprises Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas6, Cas7, Cas8, Cas9, Cas10, CasX, CasY, Cas12a (Cpf1), Cas12b, Cas13a, Csy1, Csy2, Csy3, Cse1, Cse2, Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX, Csx3, Csx1, Csx15, Csf1, Csf2, Csf3, Csf4, Cms1, C2c1, C2c2 or C2c3, or their homologs, orthologs or modified forms.

93. The method of claim 91, wherein the CRISPR/Cas endonuclease comprises Cas9, Cpf1, CasX, CasY, C2c1, C2c3 or their homologs, orthologs or modified forms.

94. The method of claim 91, wherein the CRISPR/Cas endonuclease comprises Cas9.

95. The method of any one of claims 91-93, wherein generating the double-strand break comprises contacting the cell with the CRISPR/Cas endonuclease and at least a first gNA and a second gNA, the first gNA comprising a targeting sequence that directs the CRISPR/Cas endonuclease to cleave the target site, and the second gNA comprising a targeting sequence specific to the 5' end of the template sequence.

96. The method of claim 95, wherein contacting the cells with a CRISPR/Cas endonuclease and sgRNA comprises transfecting the cells with one or more nucleic acid molecules encoding the CRISPR/Cas endonuclease and the sgRNA.

97. The method of any one of claims 87-96, wherein the marker comprises a fluorescent protein operatively linked to a promoter capable of expressing the fluorescent protein in the cell.

98. The method of claim 97, wherein the fluorescent protein comprises GFP, YFP, RFP, CFP, BFP, dsRed, mCherry, or tdTomato.

99. The method of any one of claims 87-98, wherein the marker further comprises a selection marker.

100. The method of claim 99, wherein the selection marker is selected from the group consisting of: dihydrofolate reductase (DHFR), glutamine synthetase (GS), puromycin acetyltransferase, blasticidin deaminase, histidinol dehydrogenase, hygromycin phosphotransferase (hph), bleomycin resistance gene and aminoglycoside phosphotransferase (neomycin resistance).

101. The method of any one of claims 87-100, wherein the cells comprise embryonic stem (ES) cells.

102. The method of any one of claims 87-101, wherein the nucleic acid molecule is a plasmid.

103. A cell comprising the chromosomal rearrangement of any one of claims 87-101.

104. The cell of claim 103, wherein the cell is a mouse ES cell.

105. A transgenic mouse derived from mouse ES cells produced from the cells of claim 103 or 104.