CN117586987A - A single base editing system and its use - Google Patents


Info

Publication number
CN117586987A
Authority
CN
China
Prior art keywords
fragment
seq
base
fusion protein
base editing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210968850.5A
Other languages
Chinese (zh)
Inventor
孟飞龙
黄敏
蔡燕妮
秦艺宁
尚雅芳
刘浏
刘晓静
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Center for Excellence in Molecular Cell Science of CAS
Original Assignee
Center for Excellence in Molecular Cell Science of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Center for Excellence in Molecular Cell Science of CAS
Priority to CN202210968850.5A
Publication of CN117586987A
Legal status: Pending

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/16 Hydrolases (3) acting on ester bonds (3.1)
    • C12N9/22 Ribonucleases [RNase]; Deoxyribonucleases [DNase]
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K14/00 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
    • C07K14/435 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans
    • C07K14/46 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans from vertebrates
    • C07K14/47 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans from vertebrates from mammals
    • C07K14/4701 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans from vertebrates from mammals not used
    • C07K14/4702 Regulators; Modulating activity
    • C07K14/4703 Inhibitors; Suppressors
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/11 DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/113 Non-coding nucleic acids modulating the expression of genes, e.g. antisense oligonucleotides; Antisense DNA or RNA; Triplex-forming oligonucleotides; Catalytic nucleic acids, e.g. ribozymes; Nucleic acids used in co-suppression or gene silencing
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/63 Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/79 Vectors or expression systems specially adapted for eukaryotic hosts
    • C12N15/85 Vectors or expression systems specially adapted for eukaryotic hosts for animal cells
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/78 Hydrolases (3) acting on carbon to nitrogen bonds other than peptide bonds (3.5)
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Y ENZYMES
    • C12Y305/00 Hydrolases acting on carbon-nitrogen bonds, other than peptide bonds (3.5)
    • C12Y305/04 Hydrolases acting on carbon-nitrogen bonds, other than peptide bonds (3.5) in cyclic amidines (3.5.4)
    • C12Y305/04001 Cytosine deaminase (3.5.4.1)
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K2319/00 Fusion polypeptide
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K2319/00 Fusion polypeptide
    • C07K2319/01 Fusion polypeptide containing a localisation/targetting motif
    • C07K2319/09 Fusion polypeptide containing a localisation/targetting motif containing a nuclear localisation signal
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K2319/00 Fusion polypeptide
    • C07K2319/90 Fusion polypeptide containing a motif for post-translational modification
    • C07K2319/91 Fusion polypeptide containing a motif for post-translational modification containing a motif for glycosylation
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2310/00 Structure or type of the nucleic acid
    • C12N2310/10 Type of nucleic acid
    • C12N2310/20 Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPR]
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2800/00 Nucleic acids vectors
    • C12N2800/10 Plasmid DNA
    • C12N2800/106 Plasmid DNA for vertebrates
    • C12N2800/107 Plasmid DNA for vertebrates for mammalian

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Organic Chemistry (AREA)
  • Engineering & Computer Science (AREA)
  • Zoology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Wood Science & Technology (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Biotechnology (AREA)
  • Biochemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • Microbiology (AREA)
  • Medicinal Chemistry (AREA)
  • Biophysics (AREA)
  • Physics & Mathematics (AREA)
  • Plant Pathology (AREA)
  • Toxicology (AREA)
  • Gastroenterology & Hepatology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Enzymes And Modification Thereof (AREA)

Abstract

The invention relates to the field of biotechnology and discloses a single base editing system and its use. The system comprises: 1) a nucleic acid nickase or a polynucleotide encoding it; 2) a cytosine deaminase or a polynucleotide encoding it; 3) an abasic site protecting polypeptide or a polynucleotide encoding it. By constructing a fusion protein or an overexpression plasmid containing HMCES, the invention provides a novel cytosine-to-guanine base editor, CGBEpH, which protects abasic sites (AP sites), reduces the DNA double-strand break damage generated by CGBEs, and reduces the harmful low-frequency products formed during editing.

Description

A single base editing system and its use

Technical field

The present invention relates to the field of biotechnology, and in particular to a single base editing system and its use.

Background

Single base editors aim to introduce single base mutations at specific sites in genomic DNA without generating DNA double-strand breaks (Gaudelli et al., 2017; Komor et al., 2016; Nishida et al., 2016), and are expected to become a safer and more reliable approach to clinical gene editing therapy. The editing outcome of a single base editor varies with genomic position and sequence (Anzalone et al., 2020; Porto et al., 2020), and some harmful low-frequency by-products are generated in the process. These by-products are currently thought to arise randomly and without a specific cause, but in clinical treatment even small amounts may be enough to cause serious adverse consequences. Because by-products make up only a small fraction of editing outcomes, the mechanism by which they form remains unclear. Further study of that mechanism is needed in order to reduce their formation, which is important for advancing the use of single base editors in clinical gene editing therapy (Doudna, 2020).

A single base editor contains two components, a deaminase and a Cas nuclease with impaired cleavage activity, and mutates specific bases within a defined editing window of the genome through a reaction initiated by the deaminase (Anzalone et al., 2020). Beyond these two basic components, researchers have altered the editing products by fusing additional components. For example, fusing a uracil glycosylase inhibitor (UGI) keeps uracil (U) from being excised, which increases cytosine-to-thymine transition mutations while lowering the frequency of base insertions and deletions (Komor et al., 2016). The earliest cytosine base editors and adenine base editors generate cytosine-to-thymine and adenine-to-guanine transition mutations with relatively high efficiency (Komor et al., 2016; Komor et al., 2017; Wang et al., 2017). However, many clinically relevant single nucleotide variants (SNVs) can only be corrected to the normal sequence by a transversion mutation. Recent studies have therefore reported single base editors that achieve cytosine-to-guanine transversions (CGBEs) (Chen et al., 2021a; Koblan et al., 2021; Kurt et al., 2021; Zhao et al., 2021). These editors remove the UGI component or additionally fuse uracil glycosylase (UNG), which promotes excision of uracil and generates an AP site, favoring the cytosine-to-guanine transversion. There are also reports that CGBEs produce base deletions or insertions at a higher frequency (Arbab et al., 2020; Chen et al., 2021a; Koblan et al., 2021; Kurt et al., 2021). Because base deletions and insertions occur at very low frequency, conventional experiments rarely capture enough information to analyze their mechanism further (Huang et al., 2021). Base deletions and base insertions are currently thought to arise through the same mechanism, yet the deletions caused by single base editors form through a different mechanism from the deletions caused by Cas9-based strand-cleaving editors (Allen et al., 2018; Arbab et al., 2020; Shen et al., 2018). In addition, no study has yet shown whether single base editors produce large deletions or chromosomal translocations. The harm posed by the by-products of single base editing is therefore still not clearly understood.
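The distinction the background draws between transitions (C to T, A to G) and transversions (C to G, C to A) can be made concrete with a small helper. This is only an illustrative sketch: the function name and the constants are hypothetical and are not part of the patent.

```python
# Illustrative only: classify a single-base substitution as a transition or
# a transversion. Names are hypothetical, not taken from the patent.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Return 'transition' or 'transversion' for a ref>alt substitution."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("bases must be one of A, C, G, T")
    if ref == alt:
        raise ValueError("ref and alt are identical; not a substitution")
    # Purine<->purine or pyrimidine<->pyrimidine changes are transitions.
    if {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES:
        return "transition"
    # Any purine<->pyrimidine change is a transversion.
    return "transversion"
```

Under this classification, the C>T product of classical cytosine base editors is a transition, while the C>G product targeted by CGBEs is a transversion.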

The editing products of single base editors can be optimized by interfering with DNA damage repair pathways. For example, fusing the non-homologous end joining inhibitor GAM or the homologous recombination repair factor 53BP1 (Canny et al., 2018; Chu et al., 2015; Maruyama et al., 2015) promotes homologous recombination repair of CRISPR/Cas9-induced breaks, which improves editing efficiency and reduces base deletions. However, binding of GAM to DSB ends may produce persistent DNA breaks, so that the deletion products that would otherwise form from them cannot be detected (Komor et al., 2017). Using Repair-seq, researchers have mapped the damage repair pathways that act after double-stranded DNA is cut by Cas9 (Hussmann et al., 2021), single base editors (Koblan et al., 2021) and prime editors (Chen et al., 2021b). Other studies have shown that translesion DNA synthesis (TLS) factors contribute to the cytosine-to-guanine transversions generated by CGBEs (Koblan et al., 2021). Nevertheless, much remains unknown about how the low-frequency products of single base editors arise, including base deletions, base insertions and aberrant base transversions, and constructing optimized single base editors that reduce these low-frequency by-products is of great significance.

Summary of the invention

In view of the shortcomings of the prior art described above, the purpose of the present invention is to provide a single base editing system and its use, so as to solve the problems existing in the prior art.

To achieve the above purpose, the present invention adopts the following technical solutions.

A first aspect of the present invention provides a single base editing system, comprising:

1) a nucleic acid nickase or a polynucleotide encoding it;

2) a cytosine deaminase or a polynucleotide encoding it;

3) an abasic site protecting polypeptide or a polynucleotide encoding it.

In some embodiments of the present invention, the abasic site protecting polypeptide is an APEX1 competitive inhibitor or an APEX1 mutant.

Preferably, the APEX1 competitive inhibitor is HMCES.

More preferably, the HMCES is derived from a eukaryote, and still more preferably from mouse.

In some specific embodiments of the present invention, the amino acid sequence of the HMCES comprises:

1) the amino acid sequence shown in SEQ ID NO. 1; or,

2) an amino acid sequence that has at least 80% sequence similarity to SEQ ID NO. 1 and has the function of the amino acid sequence defined in 1).
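The 80% threshold in item 2) can be illustrated with a short calculation. The patent does not state how sequence similarity is computed, so the sketch below makes a loud assumption: it uses simple percent identity over two sequences that have already been aligned (gaps written as "-"). The function names and the example sequences are hypothetical.

```python
# Illustrative only. ASSUMPTION: "similarity" is approximated here by percent
# identity over a pre-computed pairwise alignment; the patent does not
# specify the scoring method.
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned columns in which both sequences carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # column with gaps in both sequences carries no information
        compared += 1
        if a == b:
            matches += 1  # a gap never equals a residue, so gaps count as mismatches
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 80.0) -> bool:
    """True when percent identity is at or above the threshold (default 80%)."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

For example, two hypothetical 10-residue peptides that differ at one position score 90% and pass the threshold. A real comparison against SEQ ID NO. 1 would first require an alignment step, which is outside this sketch.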

In some embodiments of the present invention, the nucleic acid nickase is a Cas9 nickase.

In some embodiments of the present invention, the cytosine deaminase is one or more selected from APOBEC1, APOBEC2, APOBEC3A, APOBEC3B, APOBEC3C, APOBEC3D/E, APOBEC3F, APOBEC3G, APOBEC3H, APOBEC4 and mutants thereof, and AID and mutants thereof.

In some embodiments of the present invention, when the nucleic acid nickase or the cytosine deaminase is expressed as a fusion protein, the fusion protein comprises a cytosine deaminase fragment and a nucleic acid nickase fragment; the abasic site protecting polypeptide is expressed either as part of the fusion or independently.

In certain embodiments of the present invention, the fusion protein further comprises a uracil glycosylase fragment and/or a nuclear localization fragment and/or an abasic site protecting polypeptide fragment.

In certain specific embodiments of the present invention, the fusion protein comprises, in order from the N-terminus to the C-terminus, a cytosine deaminase fragment and a nucleic acid nickase fragment.

In certain more specific embodiments of the present invention, a uracil glycosylase fragment and/or an abasic site protecting polypeptide fragment is linked to the N-terminus of the cytosine deaminase fragment in the fusion protein.

Preferably, the fusion protein comprises, in order from the N-terminus to the C-terminus, an abasic site protecting polypeptide fragment, a cytosine deaminase fragment and a nucleic acid nickase fragment.

Preferably, the fusion protein comprises, in order from the N-terminus to the C-terminus, an abasic site protecting polypeptide fragment, a cytosine deaminase fragment, a nucleic acid nickase fragment and a nuclear localization fragment.

Preferably, the fusion protein comprises, in order from the N-terminus to the C-terminus, an abasic site protecting polypeptide fragment, a uracil glycosylase fragment, a cytosine deaminase fragment and a nucleic acid nickase fragment.

Preferably, the fusion protein comprises, in order from the N-terminus to the C-terminus, an abasic site protecting polypeptide fragment, a uracil glycosylase fragment, a cytosine deaminase fragment, a nucleic acid nickase fragment and a nuclear localization fragment.
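The four preferred N-terminus to C-terminus arrangements listed above can be written down as data, which makes the shared pattern easy to see: the abasic site protecting fragment comes first and the deaminase always precedes the nickase. This is a sketch for illustration only; the domain labels (including `UNG` for the uracil-glycosylase-type fragment) and function names are hypothetical shorthand, not identifiers from the patent.

```python
# Illustrative only: the preferred domain orders from the text, N -> C.
# All labels below are hypothetical shorthand.
AP = "abasic_site_protecting_polypeptide"
UNG = "uracil_glycosylase"
CDA = "cytosine_deaminase"
NICKASE = "nucleic_acid_nickase"
NLS = "nuclear_localization"

PREFERRED_ORDERS = [
    (AP, CDA, NICKASE),
    (AP, CDA, NICKASE, NLS),
    (AP, UNG, CDA, NICKASE),
    (AP, UNG, CDA, NICKASE, NLS),
]


def is_preferred_order(domains) -> bool:
    """True if the domain order matches one of the four preferred layouts."""
    return tuple(domains) in PREFERRED_ORDERS


def deaminase_precedes_nickase(domains) -> bool:
    """True if both fragments are present and the deaminase is N-terminal to the nickase."""
    domains = tuple(domains)
    if CDA not in domains or NICKASE not in domains:
        return False
    return domains.index(CDA) < domains.index(NICKASE)
```

The second check captures the more general embodiment, in which only the relative order of the deaminase and nickase fragments is fixed.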

In some embodiments of the present invention, the amino acid sequence of the fusion protein comprises the sequence shown in SEQ ID NO. 19, SEQ ID NO. 20 or SEQ ID NO. 21.

In some embodiments of the present invention, the single base editing system further comprises an sgRNA or a polynucleotide encoding it.

Preferably, the sgRNA does not contain a palindromic motif preferred by the deaminase.

More preferably, the palindromic motif preferred by the deaminase is TCGA.
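Screening candidate sgRNA sequences for the TCGA motif is a simple substring check. The sketch below is illustrative: the function names and example sequences are hypothetical, and it assumes the sequence is supplied as a plain string in either DNA (T) or RNA (U) letters. Because TCGA is its own reverse complement, checking one strand is sufficient for this motif.

```python
# Illustrative only: check whether a guide sequence contains the TCGA motif.
# Function names and example sequences are hypothetical.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written with A, C, G, T."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def contains_motif(guide: str, motif: str = "TCGA") -> bool:
    """True if the guide contains the motif; RNA 'U' is treated as DNA 'T'."""
    seq = guide.upper().replace("U", "T")
    return motif.upper() in seq


def filter_guides(guides, motif: str = "TCGA"):
    """Keep only guides that do not contain the motif."""
    return [g for g in guides if not contains_motif(g, motif)]
```

A guide given in RNA letters such as `gaucgauu` is flagged, since it reads GATCGATT in DNA letters and contains TCGA.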

A second aspect of the present invention provides a fusion protein for single base editing, comprising an abasic site protecting polypeptide fragment, a cytosine deaminase fragment and a nucleic acid nickase fragment.

In some embodiments of the present invention, the abasic site protecting polypeptide is an APEX1 competitive inhibitor or an APEX1 mutant.

Preferably, the APEX1 competitive inhibitor is HMCES.

More preferably, the HMCES is derived from a eukaryote, and still more preferably from mouse.

In some embodiments of the present invention, the amino acid sequence of the HMCES comprises:

1) the amino acid sequence shown in SEQ ID NO. 1; or,

2) an amino acid sequence that has at least 80% sequence similarity to SEQ ID NO. 1 and has the function of the amino acid sequence defined in 1).

In some embodiments of the present invention, the fusion protein comprises, in order from the N-terminus to the C-terminus, an abasic site protecting polypeptide fragment, a cytosine deaminase fragment and a nucleic acid nickase fragment.

In some embodiments of the present invention, the fusion protein further comprises a uracil glycosylase fragment and/or a nuclear localization fragment.

Preferably, a uracil glycosylase fragment and/or a nuclear localization fragment is linked to the N-terminus of the cytosine deaminase fragment.

In some embodiments of the present invention, the amino acid sequence of the fusion protein comprises the sequence shown in SEQ ID NO. 19, SEQ ID NO. 20 or SEQ ID NO. 21.

A third aspect of the present invention provides a polynucleotide encoding the fusion protein described above.

A fourth aspect of the present invention provides a nucleic acid construct containing the polynucleotide described above.

A fifth aspect of the present invention provides an expression system that contains the nucleic acid construct described above or has the polynucleotide described above integrated into its genome.

A sixth aspect of the present invention provides the use, in base editing, of the single base editing system, the fusion protein, the polynucleotide, the nucleic acid construct or the expression system described above.

A seventh aspect of the present invention provides a gene editing method, comprising performing gene editing with the base editing system or the fusion protein described above.

An eighth aspect of the present invention provides the use of HMCES as an abasic site protecting polypeptide in gene editing or in a base editing system.

In some embodiments of the present invention, the HMCES is used to reduce DNA double-strand break damage during single base editing.

In some embodiments of the present invention, the single base editing is cytosine-to-guanine base editing.

Compared with the prior art, the present invention has the following beneficial effects:

By protecting abasic sites (AP sites) through overexpression or fusion expression of HMCES, the present invention constructs a new cytosine-to-guanine base editor, CGBEpH, which reduces the DNA double-strand break damage generated by CGBEs and reduces the harmful low-frequency products formed during editing.

Brief description of the drawings

Figure 1 shows the types of DNA double-strand break damage generated by existing base editors in Example 1 of the present invention.

Figure 2 shows the experimental workflow for screening DNA damage repair factors that affect base editors in Example 1 of the present invention, together with the mutation spectra of three base editors. In the figure:

A is a flow chart of the experimental design for screening DNA damage repair factors that affect base editing products. First, in a HEK293T cell line stably expressing Streptococcus pyogenes Cas9 (hereinafter SpCas9), an sgRNA library targeting DNA damage repair factors for knockout was expressed by lentiviral infection; after 6 days of puromycin selection of the infected cells, a library of cells with DNA damage repair factors knocked out was obtained. A base editor built on Staphylococcus aureus Cas9 (hereinafter SaCas9) was then transiently transfected with Lipofectamine 2000 to target a specific SaBE-Test insertion sequence and perform base editing. The sgRNAs expressed in the cells and the base-edited SaBE-Test insertion sequence were amplified by PCR, and high-throughput sequencing was used to detect the different base editing products on the SaBE-Test insertion sequence and the enrichment of the corresponding expressed sgRNAs, in order to identify the DNA damage repair factors that affect the formation of different types of base editing products.

B is the base substitution mutation spectrum on sequencing reads containing negative-control sgRNAs in the high-throughput sequencing library of the base editor AID-SaBE3.

C is the base substitution mutation spectrum on sequencing reads containing negative-control sgRNAs in the high-throughput sequencing library of the base editor AID-SaBE1n.

D is the base substitution mutation spectrum on sequencing reads containing negative-control sgRNAs in the high-throughput sequencing library of the base editor AID-SaBE1.

Figure 3 shows the proportions of different mutation types on the SaBE-Test insertion sequence in Example 1 of the present invention, together with the profiles of base deletions. In the figure:

A is a pie chart showing the proportions of base substitutions, base deletions and base insertions among SaBE3 editing products.

B is a pie chart showing the proportions of base substitutions, base deletions and base insertions among SaBE1n editing products.

C is a pie chart showing the proportions of base substitutions, base deletions and base insertions among SaBE1 editing products.

D is the profile of 1 bp deletions generated by SaBE3, SaBE1n and SaBE1 editing; the y-axis shows the probability of a 1 bp deletion at each position, and the x-axis shows positions 1-50 bp of the SaBE-Test insertion sequence.

E is the profile of deletions larger than 1 bp generated by SaBE3 editing; the red line shows the probability that each position is the left end of a deleted fragment, and the blue line shows the probability that each position is the right end of a deleted fragment.

F is the profile of deletions larger than 1 bp generated by SaBE1n editing.

G is the profile of deletions larger than 1 bp generated by SaBE1 editing.
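The deletion profiles in panels D to G of Figure 3 can be thought of as per-position frequencies over the 50 bp window. The sketch below shows one way such profiles might be tabulated. It rests on assumptions not stated in the patent: deletions are given as hypothetical `(start, length)` pairs with 1-based positions, the 1 bp profile is normalized by total reads, and the end-point profiles are normalized by the number of deletions longer than 1 bp.

```python
# Illustrative only. ASSUMPTIONS: input format, 1-based coordinates and both
# normalizations are hypothetical choices, not taken from the patent.
from collections import Counter


def one_bp_deletion_profile(deletions, total_reads, window=50):
    """Fraction of reads carrying a 1 bp deletion at each position 1..window."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    counts = Counter(
        start for start, length in deletions
        if length == 1 and 1 <= start <= window
    )
    return [counts[pos] / total_reads for pos in range(1, window + 1)]


def deletion_end_profiles(deletions, window=50):
    """For deletions > 1 bp, fraction whose left/right end falls at each position."""
    # Convert (start, length) into inclusive (left_end, right_end) coordinates.
    ends = [(start, start + length - 1) for start, length in deletions if length > 1]
    n = len(ends)
    if n == 0:
        return [0.0] * window, [0.0] * window
    left = Counter(l for l, _ in ends)
    right = Counter(r for _, r in ends)
    positions = range(1, window + 1)
    return [left[p] / n for p in positions], [right[p] / n for p in positions]
```

In this layout, index 0 of each returned list corresponds to position 1 of the window, so plotting the lists against 1..50 reproduces the x-axis described for panel D.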

Figure 4 shows the enrichment results for DNA damage repair factors in the screens of different types of base editing products in Example 1 of the present invention. In the figure:

A shows the enrichment, across different DNA damage repair pathways, of genes that affect the editing efficiency of base editors.

B compares the enriched genes that affect the editing efficiency of SaBE1 and SaBE1n.

C compares the enriched genes that affect the editing efficiency of SaBE1n and SaBE3.

D is a clustering heat map of genes from different DNA damage repair pathways across different base editing product types in the library screening experiment.

Figure 5 shows the gene enrichment analysis and gene validation results for different types of base editing products in Example 1 of the present invention. In the figure:

A shows the enrichment analysis of base excision repair (BER) pathway factors in the edited-mutation group versus the no-mutation group.

B shows the enrichment analysis of mismatch repair (MMR) pathway factors in the edited-mutation group versus the no-mutation group.

C shows the enrichment analysis of nucleotide excision repair (NER) pathway factors in the edited-mutation group versus the no-mutation group.

Figure 6 shows the enrichment analysis of genes affecting base deletions in the CRISPR/Cas9 library screening experiment in Example 1 of the present invention, together with the analysis of base deletions generated by various base editors. In the figure:

A shows the gene enrichment analysis of the base deletion group compared with the C>T base substitution group, which identified DNA damage repair factors that promote base deletions.

B is a schematic diagram of how base deletions arise under the action of a base editor.

C shows the base editing efficiency, the proportion of C>G mutations and the deletion efficiency of four base editors at 4 genomic sites in the HEK293T cell line.

D shows the profiles of deletions larger than 1 bp produced at three genomic sequences by the commonly used cytosine base editor BE4max, the cytosine-to-guanine base editors CGBE1/AXC, and AID-BE1n, previously developed in this laboratory.

Figure 7 shows the gene enrichment analysis and validation results for the base transversion mutations C>G and C>A in Example 1 of the present invention, in which:

A shows the gene enrichment analysis of the C>G transversion group compared with the C>T base substitution group.

B is a schematic diagram of how base editors generate C>G transversion mutations through two pathways, NHEJ and TLS.

C shows the changes in editing efficiency, proportion of C>G mutations and deletion mutation efficiency of CGBEs after knocking out the NHEJ factor XRCC4 or the TLS factor REV1 in the HEK293T cell line.

D shows the change, relative to wild-type cells, in the proportion of C>G mutations produced by base editors at six genomic sites after knocking out the NHEJ factor XRCC4 or the TLS factor REV1 in the HEK293T cell line.

Figure 8 shows the analysis of chromosomal translocations caused by DSBs generated by base editors in Example 1 of the present invention, in which:

A is a flow chart of Tn5-HTGTS, the optimized method for capturing chromosomal translocations generated by base editors.

B is a Circos plot of the genome-wide chromosomal translocation junctions captured by Tn5-HTGTS in cells treated with the indicated base editors.

C shows the number of chromosomal translocations captured per 1000 cells when nCas9 and three base editors act on four genomic sites.

D is the distribution map of chromosomal translocation junctions at Cas9-mediated off-target hotspots for the EMX1 site.

E shows the chromosomal translocation distribution of different base editors at an AID deaminase off-target hotspot, together with the H3K27Ac ChIP-seq profile and the PRO-seq profile indicating transcription.

F is the enrichment analysis of the chromosomal translocations produced by CGBE1 at four genomic sites with respect to histone marks, open chromatin and transcription signals.

G is the enrichment analysis of different motifs in the sequences at which CGBE1 chromosomal translocation junctions occur.

H is a schematic diagram of how cytosine-to-guanine base editors generate chromosomal translocations.

Figure 9 shows the construction and validation of the single base editing system CGBEpH in Example 2 of the present invention, in which:

D is a schematic diagram of the construction of the single base editing system CGBEpH.

E shows the efficiency of base deletion mutations and of repeat-sequence insertion mutations larger than 1 bp in cells transfected with CGBE or the single base editing system CGBEpH.

F shows the editing efficiency of CGBE or the single base editing system CGBEpH and the proportion of C>G among the editing products.

G is a Circos plot showing the distribution of translocation junctions in cells edited by the CGBE single base editing systems.

Figure 10 shows the construction and validation of the fusion proteins for single base editing in Example 3 of the present invention, in which:

A is a working model in which HMCES protection reduces the formation of deletion mutations.

B is a schematic diagram of the plasmid structures in which HMCES is fused to different proteins.

C shows the editing efficiency, the efficiency of base deletion mutations and the proportion of C>G when different proteins are expressed as fusions with HMCES or with GFP (control).

Figure 11 shows the validation, in Example 4 of the present invention, that palindromic motifs are more prone to base deletion mutations, in which:

A is the map of base deletion mutations larger than 1 bp produced when CGBE1 acts on an editing sequence containing a palindromic motif (TCGA) or a non-palindromic motif (TCAA).

B is a statistical plot of the difference in deletion mutation efficiency when CGBE1 acts on five pairs of palindromic/non-palindromic motif sequences.

C is a pie chart of the proportion of the 3040 potential CGBE correction sites in the ClinVar database that contain a palindromic motif within the editing window.

Detailed Description of the Embodiments

After extensive research, the inventors found that single base editors generate a large number of DNA double-strand breaks (DSBs) during base editing. The ends of these breaks are further processed, through end resection, templated fragment insertion, fill-in synthesis and similar processes, into small base deletion mutations, cytosine-to-guanine transversion mutations, and chromosomal translocations between the base editing sites of CGBEs and off-target sequences arising from Cas9 or from the cytosine deaminase. The inventors also found that APEX1, an apurinic/apyrimidinic endonuclease, cleaves the apurinic/apyrimidinic sites (AP sites) generated during base editing and thereby produces single-strand nicks, and that an APEX1 competitive inhibitor, or expression of an APEX1 mutant defective in enzymatic activity, can inhibit APEX1 activity and thus reduce its cleavage of AP sites. Accordingly, for the AP sites generated while a base editor acts, adding an APEX1 competitive inhibitor suppresses APEX1 cleavage activity, reduces breakage at AP sites, and in turn reduces DNA damage and the various by-products it mediates. The inventors further found that HMCES, acting as an APEX1 competitive inhibitor, protects abasic sites (AP sites), reduces the DNA damage generated during base editing, and thereby reduces the various by-products that arise from that damage. This provides a new approach for optimizing single base editors and advances the development of ideal single base editors free of editing by-products. In addition, the inventors found that the editing products of different cytosine base editors are determined jointly by the genomic sequence in cis and the DNA damage repair pathways in trans. sgRNA target sequences containing a palindromic motif preferred by the cytosine deaminase all show a markedly increased frequency of base deletion mutations, so avoiding sgRNAs with a deaminase-preferred palindromic motif reduces the generation of DNA double-strand breaks.

In the present invention, the term "competitive inhibitor" refers to a substance that is structurally similar to the substrate of the inhibited enzyme and competes with the substrate for the binding site on the enzyme molecule, thereby reversibly or irreversibly inhibiting the activity of the enzyme.

A first aspect of the present invention provides a single base editing system, which comprises:

1) a nucleic acid nickase or a polynucleotide encoding it;

2) a cytosine deaminase or a polynucleotide encoding it;

3) an abasic site protection polypeptide or a polynucleotide encoding it.

In the single base editing system provided by the present invention, the abasic site protection polypeptide is an APEX1 competitive inhibitor or an APEX1 mutant.

Preferably, the APEX1 competitive inhibitor is selected from HMCES. The HMCES is derived from a eukaryote, more preferably from mouse.

Preferably, the APEX1 mutant is an APEX1 with defective enzymatic activity. APEX1 mutants such as N212A, D210N, Y171F or R177A have reduced AP site cleavage activity. Whether the APEX1 mutant, the nucleic acid nickase and the cytosine deaminase are expressed as a single fusion protein or as separate proteins, the abasic site (AP site) is protected, which reduces DNA damage and the various by-products it mediates.

In the single base editing system provided by the present invention, the amino acid sequence of the HMCES includes:

1) the amino acid sequence shown in SEQ ID NO.1; or,

2) an amino acid sequence that has 80% or more sequence similarity to SEQ ID NO.1 and has the function of the amino acid sequence defined in 1). Specifically, the amino acid sequence in 2) refers to a polypeptide fragment that is obtained from the amino acid sequence shown in SEQ ID NO.1 by substitution, deletion or addition of one or more (specifically 1-50, 1-30, 1-20, 1-10, 1-5, 1-3, 1, 2 or 3) amino acids, or by addition of one or more (specifically 1-50, 1-30, 1-20, 1-10, 1-5, 1-3, 1, 2 or 3) amino acids at the N-terminus and/or C-terminus, and that has the function of the polypeptide fragment whose amino acid sequence is shown in SEQ ID NO.1. More specifically, the amino acid sequence in 2) may have 80%, 85%, 90%, 93%, 95%, 97% or 99% or more similarity to SEQ ID NO.1. The nucleotide sequence encoding HMCES includes the sequence shown in SEQ ID NO.2.

MCGRTSCHLPREVLTRACAYQDRQGRRRLPQWRDPDKYCPSYNKSPQSSSPVLLSRLHFEKDADSSDRIIIPMRWGLVPSWFKESDPSKLQFNTTNCRSDTIMEKQSFKVPLGKGRRCVVLADGFYEWQRCQGTNQRQPYFIYFPQIKTEKSGGNDASDSSDNKEKVWDNWRLLTMAGIFDCWEAPGGECLYSYSIITVDSCRGLSDIHSRMPAILDGEEAVSKWLDFGEVATQEALKLIHPIDNITFHPVSPVVNNSRNNTPECLAPADLLVKKEPKANGSSQRMMQWLATKSPKKEVPDSPKKDASGLPQWSSQFLQKSPLPAKRGATSSFLDRWLKQEKEDEPMAKKPNS(SEQ ID NO.1)

ATGTGCGGGCGAACGTCCTGTCACTTGCCCAGAGAGGTTCTCACCAGGGCCTGCGCCTATCAGGATCGGCAGGGCCGGCGGCGGCTCCCGCAGTGGAGGGACCCCGACAAGTACTGCCCCTCCTACAACAAGAGCCCGCAGTCCAGCAGCCCCGTGCTGCTCTCCAGACTGCACTTTGAGAAGGATGCAGACTCATCAGATCGGATAATTATTCCCATGCGATGGGGCTTAGTCCCATCTTGGTTCAAAGAAAGTGATCCTTCTAAGCTGCAGTTCAACACTACCAACTGTCGTAGTGATACCATAATGGAGAAGCAGTCATTCAAGGTTCCTCTGGGGAAAGGACGGCGGTGTGTTGTTTTAGCAGATGGATTCTACGAGTGGCAGCGGTGTCAGGGAACAAACCAGAGGCAACCATACTTCATCTATTTTCCTCAAATCAAGACAGAGAAGTCAGGTGGGAACGATGCTTCAGACAGCTCTGACAACAAGGAAAAGGTCTGGGACAACTGGAGGCTGCTGACAATGGCAGGGATCTTTGACTGCTGGGAAGCGCCAGGGGGAGAGTGCCTGTATTCCTACAGCATCATCACTGTGGATTCCTGCAGAGGTTTGAGTGACATCCACAGCAGGATGCCTGCCATACTAGATGGAGAAGAAGCAGTCTCCAAATGGCTCGACTTTGGTGAGGTCGCCACTCAGGAAGCTCTGAAGCTAATCCACCCCATAGACAATATCACCTTCCATCCAGTTTCTCCAGTGGTGAACAATTCCCGAAACAACACTCCGGAGTGTCTGGCGCCTGCTGACTTGCTGGTTAAGAAGGAGCCCAAGGCAAATGGCAGCAGTCAAAGGATGATGCAGTGGCTGGCTACAAAGTCACCCAAAAAGGAAGTCCCTGACTCACCCAAAAAGGATGCATCAGGTCTACCCCAGTGGTCCAGCCAGTTTCTCCAGAAGAGCCCATTGCCTGCTAAAAGAGGTGCTACCAGCAGCTTCCTGGATCGATGGCTGAAGCAGGAGAAGGAGGATGAGCCCATGGCCAAGAAGCCTAACAGC(SEQ ID NO.2)
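The relationship between the coding sequence (SEQ ID NO.2) and the HMCES protein (SEQ ID NO.1) can be spot-checked by translating the nucleotide sequence with the standard genetic code. The following is a minimal sketch rather than part of the disclosed method: the function name `translate` is hypothetical, and only the first 30 and last 18 nucleotides of SEQ ID NO.2 are used, on the assumption that the sequence is read in frame from its first base.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W"
         "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR"
         "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA sequence; trailing partial codons are ignored."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Excerpts copied from SEQ ID NO.2 (5' end and 3' end).
five_prime = "ATGTGCGGGCGAACGTCCTGTCACTTGCCC"
three_prime = "GCCAAGAAGCCTAACAGC"

n_terminal = translate(five_prime)   # compare with the start of SEQ ID NO.1
c_terminal = translate(three_prime)  # compare with the end of SEQ ID NO.1
```

Applied to the full SEQ ID NO.2, the same function should reproduce SEQ ID NO.1 residue by residue if the two listings are consistent.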

In the single base editing system provided by the present invention, the nucleic acid nickase is a Cas9 nickase. The nucleic acid nickase is a Cas9 whose cleavage activity is completely or partially abolished, such as dCas9 or nCas9. The Cas9 nickase cleaves the target strand of the DNA without cutting the non-target strand, which avoids directly generating a DNA double-strand break. For example, a first Cas9 nickase fragment combined with a second Cas9 nickase fragment may still retain the targeting activity of the Cas9 nickase, more specifically the ability to target DNA under the guidance of a suitable sgRNA. The Cas9 nickase is selected from one or more of nSpCas9, nSaCas9, nScCas9 and nXCas9. In a specific embodiment of the present invention, the Cas9 nickase is nSpCas9. The sequence of the nSpCas9 is shown in SEQ ID NO.3.

In the single base editing system provided by the present invention, the cytosine deaminase is selected from one or more of APOBEC1, APOBEC2, APOBEC3A, APOBEC3B, APOBEC3C, APOBEC3D/E, APOBEC3F, APOBEC3G, APOBEC3H, APOBEC4 and activation-induced cytidine deaminase (AID). In certain embodiments, the cytosine deaminase is selected from APOBEC1 and its mutants. Examples of APOBEC1 mutants are YE1, YE2, EE, YEE, R33A and R33A/R34A. In a specific embodiment, the cytosine deaminase is APOBEC1, the APOBEC1 mutant (R33A) or AID. The amino acid sequence of the APOBEC1 mutant (R33A) is shown in SEQ ID NO.4, the amino acid sequence of APOBEC1 is shown in SEQ ID NO.5, and the amino acid sequence of AID is shown in SEQ ID NO.6.

In the single base editing system provided by the present invention, when the nucleic acid nickase or the cytosine deaminase is expressed in the form of a fusion protein, the fusion protein comprises a cytosine deaminase fragment and a nucleic acid nickase fragment; the abasic site protection polypeptide is expressed either as part of the fusion or independently.

In the single base editing system provided by the present invention, the fusion protein further comprises a uracil-DNA glycosylase fragment and/or a nuclear localization (NLS) fragment and/or an abasic site protection polypeptide fragment. The amino acid sequence of the nuclear localization (NLS) fragment is shown in SEQ ID NO.7, and the amino acid sequence of the uracil-DNA glycosylase fragment is shown in SEQ ID NO.8.

In the single base editing system provided by the present invention, the fusion protein comprises, in order from the N-terminus to the C-terminus, a cytosine deaminase fragment and a nucleic acid nickase fragment.

In the single base editing system provided by the present invention, a uracil-DNA glycosylase fragment and/or an abasic site protection polypeptide fragment is attached to the N-terminus of the cytosine deaminase fragment in the fusion protein.

Preferably, the fusion protein comprises, in order from the N-terminus to the C-terminus, an abasic site protection polypeptide fragment, a cytosine deaminase fragment and a nucleic acid nickase fragment.

Preferably, the fusion protein comprises, in order from the N-terminus to the C-terminus, an abasic site protection polypeptide fragment, a cytosine deaminase fragment, a nucleic acid nickase fragment and a nuclear localization fragment.

Preferably, the fusion protein comprises, in order from the N-terminus to the C-terminus, an abasic site protection polypeptide fragment, a uracil-DNA glycosylase fragment, a cytosine deaminase fragment and a nucleic acid nickase fragment.

Preferably, the fusion protein comprises, in order from the N-terminus to the C-terminus, an abasic site protection polypeptide fragment, a uracil-DNA glycosylase fragment, a cytosine deaminase fragment, a nucleic acid nickase fragment and a nuclear localization fragment.

In the single base editing system provided by the present invention, the amino acid sequence of the fusion protein includes the sequence shown in SEQ ID NO.19, SEQ ID NO.20 or SEQ ID NO.21.

In the single base editing system provided by the present invention, the single base editing system further comprises an sgRNA or a polynucleotide encoding it. Preferably, the sgRNA does not contain a palindromic motif preferred by the deaminase. More preferably, the deaminase-preferred palindromic motif is TCGA. The present invention found that target site sequences containing the palindromic motif TCGA within the targeted editing region give rise to a higher frequency of base deletion mutations during editing by the base editor, which lowers the frequency of the intended base substitution product.
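The sgRNA selection rule described above, which avoids targets carrying the deaminase-preferred palindromic motif TCGA, can be sketched as a simple sequence filter. This is an illustrative sketch only: the function names, the optional `window` argument (a 0-based, end-exclusive slice standing in for the editing window) and the two 20-nt example protospacers are assumptions introduced here and do not come from the patent.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def is_palindromic(motif: str) -> bool:
    """A DNA motif is palindromic if it equals its own reverse complement."""
    return motif.upper() == reverse_complement(motif)


def motif_positions(protospacer: str, motif: str = "TCGA", window=None):
    """Return 0-based start positions of `motif` in the protospacer.

    `window` is a hypothetical (start, end) slice restricting the search
    to an assumed editing window; by default the whole protospacer is scanned.
    """
    seq = protospacer.upper().replace("U", "T")
    start, end = window if window is not None else (0, len(seq))
    last = min(end, len(seq)) - len(motif)
    return [i for i in range(start, last + 1) if seq[i:i + len(motif)] == motif]


def filter_guides(protospacers, motif: str = "TCGA", window=None):
    """Keep only protospacers that lack the motif in the scanned region."""
    return [p for p in protospacers if not motif_positions(p, motif, window)]


# Hypothetical protospacers differing by one base (TCGA versus TCAA).
with_motif = "GACCTCGATGCAGTTACGGA"
without_motif = "GACCTCAATGCAGTTACGGA"
kept = filter_guides([with_motif, without_motif])
```

In practice the scanned region would be set to the editing window of the particular base editor, which this sketch leaves to the caller.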

A second aspect of the present invention provides a fusion protein for single base editing, comprising an abasic site protection polypeptide fragment, a cytosine deaminase fragment and a nucleic acid nickase fragment.

In the fusion protein for single base editing provided by the present invention, the abasic site protection polypeptide fragment is an APEX1 competitive inhibitor and/or an APEX1 mutant; preferably, the APEX1 competitive inhibitor is selected from HMCES. APEX1 mutants such as N212A, D210N, Y171F or R177A have reduced AP site cleavage activity. Whether the APEX1 mutant, the nucleic acid nickase and the cytosine deaminase are expressed as a single fusion protein or as separate proteins, the abasic site (AP site) is protected, which reduces DNA damage and the various by-products it mediates.

Preferably, the HMCES is derived from a eukaryote, preferably from mouse.

In the fusion protein for single base editing provided by the present invention, the amino acid sequence of the HMCES includes:

1) the amino acid sequence shown in SEQ ID NO.1; or,

2) an amino acid sequence that has 80% or more sequence similarity to SEQ ID NO.1 and has the function of the amino acid sequence defined in 1). Specifically, the amino acid sequence in 2) refers to a polypeptide fragment that is obtained from the amino acid sequence shown in SEQ ID NO.1 by substitution, deletion or addition of one or more (specifically 1-50, 1-30, 1-20, 1-10, 1-5, 1-3, 1, 2 or 3) amino acids, or by addition of one or more (specifically 1-50, 1-30, 1-20, 1-10, 1-5, 1-3, 1, 2 or 3) amino acids at the N-terminus and/or C-terminus, and that has the function of the polypeptide fragment whose amino acid sequence is shown in SEQ ID NO.1. More specifically, the amino acid sequence in 2) may have 80%, 85%, 90%, 93%, 95%, 97% or 99% or more similarity to SEQ ID NO.1. The nucleotide sequence encoding HMCES includes the sequence shown in SEQ ID NO.2.
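The 80% threshold in item 2) presupposes some way of scoring a candidate variant against SEQ ID NO.1. The patent does not state which alignment method or similarity measure is intended, so the following is only a sketch under stated assumptions: it uses a plain Needleman-Wunsch global alignment with hypothetical scores (match +1, mismatch -1, gap -1) and reports percent identity over the alignment length, which is one common but not the only possible reading of "similarity". The 10-residue strings are the N-terminal residues of SEQ ID NO.1 and hypothetical variants of them.

```python
def global_align(a: str, b: str, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment; returns the two gapped strings."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a: str, b: str) -> float:
    """Identical aligned positions as a percentage of alignment length."""
    x, y = global_align(a, b)
    if not x:
        return 0.0
    same = sum(1 for p, q in zip(x, y) if p == q and p != "-")
    return 100.0 * same / len(x)


reference = "MCGRTSCHLP"            # first 10 residues of SEQ ID NO.1
substituted = "MCGRASCHLP"          # hypothetical single substitution
deleted = "MCGRSCHLP"               # hypothetical single deletion
meets_threshold = percent_identity(reference, substituted) >= 80.0
```

A substitution-matrix-based similarity score (for example BLOSUM62) would give different numbers; the choice of measure should follow whatever convention the claims are read under.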

In the fusion protein for single base editing provided by the present invention, the fusion protein comprises, in order from the N-terminus to the C-terminus, an abasic site protection polypeptide fragment, a cytosine deaminase fragment and a nucleic acid nickase fragment.

In the fusion protein for single base editing provided by the present invention, the nucleic acid nickase is a Cas9 nickase. The Cas9 nickase is selected from one or more of nSpCas9, nSaCas9, nScCas9 and nXCas9. In a specific embodiment of the present invention, the Cas9 nickase is nSpCas9. The amino acid sequence of the nSpCas9 is shown in SEQ ID NO.3.

DKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD(SEQ ID NO.3)

In the fusion protein for single base editing provided by the present invention, the cytosine deaminase is selected from one or more of APOBEC1, APOBEC2, APOBEC3A, APOBEC3B, APOBEC3D/E, APOBEC3G, APOBEC3H, APOBEC4 and their mutants, or AID and its mutants. In certain embodiments, the cytosine deaminase is selected from APOBEC1 and its mutants. Examples of APOBEC1 mutants are YE1, YE2, EE, YEE, R33A and R33A/R34A. In a specific embodiment, the cytosine deaminase is APOBEC1, the APOBEC1 mutant (R33A) or AID. The amino acid sequence of the APOBEC1 mutant (R33A) is shown in SEQ ID NO.4, the amino acid sequence of APOBEC1 is shown in SEQ ID NO.5, and the amino acid sequence of AID is shown in SEQ ID NO.6.

SSETGPVAVDPTLRRRIEPHEFEVFFDPRELAKETCLLYEINWGGRHSIWRHTSQNTNKHVEVNFIEKFTTERYFCPNTRCSITWFLSWSPCGECSRAITEFLSRYPHVTLFIYIARLYHHADPRNRQGLRDLISSGVTIQIMTEQESGYCWRNFVNYSPSNEAHWPRYPHLWVRLYVLELYCIILGLPPCLNILRRKQPQLTFFTIALQSCHYQRLPPHILWATGLK(SEQ ID NO.4)

SSETGPVAVDPTLRRRIEPHEFEVFFDPRELRKETCLLYEINWGGRHSIWRHTSQNTNKHVEVNFIEKFTTERYFCPNTRCSITWFLSWSPCGECSRAITEFLSRYPHVTLFIYIARLYHHADPRNRQGLRDLISSGVTIQIMTEQESGYCWRNFVNYSPSNEAHWPRYPHLWVRLYVLELYCIILGLPPCLNILRRKQPQLTFFTIALQSCHYQRLPPHILWATGLK(SEQ ID NO.5)

MDPATFTYQFKNVRWAKGRRETYLCYVVKRRDSATSFSLDFGYLRNKNGCHVELLFLRYISDWDLDPGRCYRVTWFTSWSPCYDCARHVADFLRGNPNLSLRIFTARLYFCEDRKAEPEGLRRLAEAGVQIAIMTFKDYFYCWNTFVENHERTFKAWEGLHENSVRLSRQLRRILQ(SEQ ID NO.6)

In the fusion protein for single base editing provided by the present invention, the fusion protein further comprises a uracil-DNA glycosylase fragment and/or a nuclear localization (NLS) fragment.

Preferably, a uracil-DNA glycosylase fragment and/or a nuclear localization (NLS) fragment is attached to the N-terminus of the cytosine deaminase fragment.

Preferably, the sequence of the nuclear localization (NLS) fragment is shown in SEQ ID NO.7. In the present invention, attaching the nuclear localization fragment at the C-terminus favors more efficient editing, whereas attaching the nuclear localization fragment (NLS) at the N-terminus fails to reduce the base deletion mutations produced by CGBE, because the short N-terminal peptide impairs the activity of HMCES.

PKKKRKV (SEQ ID NO.7).

Preferably, the uracil-DNA glycosylase fragment is selected from eUNG. The amino acid sequence of the eUNG is shown in SEQ ID NO.8. The uracil-DNA glycosylase fragment is located between the nucleic acid nickase fragment and the abasic site protection polypeptide fragment.

ANELTWHDVLAEEKQQPYFLNTLQTVASERQSGVTIYPPQKDVFNAFRFTELGDVKVVILGQDPYHGPGQAHGLAFSVRPGIAIPPSLLNMYKELENTIPGFTRPNHGYLESWARQGVLLLNTVLTVRAGQAHSHASLGWETFTDKVISLINQHREGVVFLLWGSHAQKKGAIIDKQRHHVLKAPHPSPLSAHRGFFGCNHFVLANQWLEQRGETPIDWMPVLPAES(SEQ ID NO.8)

本发明所提供的单碱基编辑用融合蛋白中,所述无碱基位点保护多肽片段、胞嘧啶脱氨酶片段或核酸切口酶片段中的一些或全部之间具有连接肽。In the fusion protein for single base editing provided by the present invention, some or all of the abasic site protection polypeptide fragments, cytosine deaminase fragments or nucleic acid nickase fragments have connecting peptides between them.

SGGSSGGSSGSETPGTSESATPESSGGSSGGS (SEQ ID NO.9) (between Hmces and Ung)

ESGGSGGSGGS (SEQ ID NO.10) (between Ung and APOBEC1(R33A))

SGGSSGGSSGSETPGTSESATPESSGGSSGGS (SEQ ID NO.11) (between APOBEC1(R33A) and nCas9)

SGGSKRTADGSEFE (SEQ ID NO.12) (between nCas9 and NLS)

SGGSSGGSSGSETPGTSESATPESSGGSSGGS (SEQ ID NO.13) (between Hmces and APOBEC1)

SGGSSGGSSGSETPGTSESATPESSGGSSGGS (SEQ ID NO.14) (between APOBEC1 and nCas9)

SGGSKRTADGSEFE (SEQ ID NO.15) (between nCas9 and NLS)

SGGSSGGSSGSETPGTSESATPESSGGSSGGS (SEQ ID NO.16) (between Hmces and AID)

SGSETPGTSESATPES (SEQ ID NO.17) (between AID and nCas9)

SGGS (SEQ ID NO.18) (between nCas9 and NLS)
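As a quick consistency check on the linker and NLS listings, the following minimal sketch confirms that a linker and the NLS occur in the expected order at the C-terminus of a construct. The helper name `find_in_order` is hypothetical, and only the last 30 residues of SEQ ID NO.19 as printed in this document are used as input.

```python
def find_in_order(protein, segments):
    """Return (name, start index) for each segment, searching left to right.

    Raises ValueError if a segment is not found after the previous one.
    """
    positions = []
    cursor = 0
    for name, seq in segments:
        idx = protein.find(seq, cursor)
        if idx < 0:
            raise ValueError(f"{name} not found after position {cursor}")
        positions.append((name, idx))
        cursor = idx + len(seq)
    return positions


NLS = "PKKKRKV"                # SEQ ID NO.7
LINKER_12 = "SGGSKRTADGSEFE"   # SEQ ID NO.12 (between nCas9 and NLS)

# Last 30 residues of SEQ ID NO.19 as printed in this document.
tail = "IDLSQLGGDSGGSKRTADGSEFEPKKKRKV"

hits = find_in_order(tail, [("linker", LINKER_12), ("NLS", NLS)])
for name, start in hits:
    print(name, "starts at tail index", start)
```

The same check could be run against a full-length sequence with the complete list of domains and linkers, which is a convenient way to catch transcription errors in long sequence listings.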

In the fusion protein for single-base editing provided by the present invention, the fusion protein comprises an amino acid sequence as shown in SEQ ID NO.19, SEQ ID NO.20 or SEQ ID NO.21.

Hmces-linker peptide-eUng-linker peptide-APOBEC1(R33A)-linker peptide-nCas9-linker peptide-NLS

MCGRTSCHLPREVLTRACAYQDRQGRRRLPQWRDPDKYCPSYNKSPQSSSPVLLSRLHFEKDADSSDRIIIPMRWGLVPSWFKESDPSKLQFNTTNCRSDTIMEKQSFKVPLGKGRRCVVLADGFYEWQRCQGTNQRQPYFIYFPQIKTEKSGGNDASDSSDNKEKVWDNWRLLTMAGIFDCWEAPGGECLYSYSIITVDSCRGLSDIHSRMPAILDGEEAVSKWLDFGEVATQEALKLIHPIDNITFHPVSPVVNNSRNNTPECLAPADLLVKKEPKANGSSQRMMQWLATKSPKKEVPDSPKKDASGLPQWSSQFLQKSPLPAKRGATSSFLDRWLKQEKEDEPMAKKPNSSGGSSGGSSGSETPGTSESATPESSGGSSGGSANELTWHDVLAEEKQQPYFLNTLQTVASERQSGVTIYPPQKDVFNAFRFTELGDVKVVILGQDPYHGPGQAHGLAFSVRPGIAIPPSLLNMYKELENTIPGFTRPNHGYLESWARQGVLLLNTVLTVRAGQAHSHASLGWETFTDKVISLINQHREGVVFLLWGSHAQKKGAIIDKQRHHVLKAPHPSPLSAHRGFFGCNHFVLANQWLEQRGETPIDWMPVLPAESESGGSGGSGGSSSETGPVAVDPTLRRRIEPHEFEVFFDPRELAKETCLLYEINWGGRHSIWRHTSQNTNKHVEVNFIEKFTTERYFCPNTRCSITWFLSWSPCGECSRAITEFLSRYPHVTLFIYIARLYHHADPRNRQGLRDLISSGVTIQIMTEQESGYCWRNFVNYSPSNEAHWPRYPHLWVRLYVLELYCIILGLPPCLNILRRKQPQLTFFTIALQSCHYQRLPPHILWATGLKSGGSSGGSSGSETPGTSESATPESSGGSSGGSDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDK
LIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGDSGGSKRTADGSEFEPKKKRKV(SEQ ID NO.19)

Hmces-linker peptide-APOBEC1-linker peptide-nCas9-linker peptide-NLS

MCGRTSCHLPREVLTRACAYQDRQGRRRLPQWRDPDKYCPSYNKSPQSSSPVLLSRLHFEKDADSSDRIIIPMRWGLVPSWFKESDPSKLQFNTTNCRSDTIMEKQSFKVPLGKGRRCVVLADGFYEWQRCQGTNQRQPYFIYFPQIKTEKSGGNDASDSSDNKEKVWDNWRLLTMAGIFDCWEAPGGECLYSYSIITVDSCRGLSDIHSRMPAILDGEEAVSKWLDFGEVATQEALKLIHPIDNITFHPVSPVVNNSRNNTPECLAPADLLVKKEPKANGSSQRMMQWLATKSPKKEVPDSPKKDASGLPQWSSQFLQKSPLPAKRGATSSFLDRWLKQEKEDEPMAKKPNSSGGSSGGSSGSETPGTSESATPESSGGSSGGSSSETGPVAVDPTLRRRIEPHEFEVFFDPRELRKETCLLYEINWGGRHSIWRHTSQNTNKHVEVNFIEKFTTERYFCPNTRCSITWFLSWSPCGECSRAITEFLSRYPHVTLFIYIARLYHHADPRNRQGLRDLISSGVTIQIMTEQESGYCWRNFVNYSPSNEAHWPRYPHLWVRLYVLELYCIILGLPPCLNILRRKQPQLTFFTIALQSCHYQRLPPHILWATGLKSGGSSGGSSGSETPGTSESATPESSGGSSGGSDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLY
ETRIDLSQLGGDSGGSKRTADGSEFEPKKKRKV(SEQ ID NO.20)

Hmces-linker peptide-AID-linker peptide-nCas9-linker peptide-NLS

MCGRTSCHLPREVLTRACAYQDRQGRRRLPQWRDPDKYCPSYNKSPQSSSPVLLSRLHFEKDADSSDRIIIPMRWGLVPSWFKESDPSKLQFNTTNCRSDTIMEKQSFKVPLGKGRRCVVLADGFYEWQRCQGTNQRQPYFIYFPQIKTEKSGGNDASDSSDNKEKVWDNWRLLTMAGIFDCWEAPGGECLYSYSIITVDSCRGLSDIHSRMPAILDGEEAVSKWLDFGEVATQEALKLIHPIDNITFHPVSPVVNNSRNNTPECLAPADLLVKKEPKANGSSQRMMQWLATKSPKKEVPDSPKKDASGLPQWSSQFLQKSPLPAKRGATSSFLDRWLKQEKEDEPMAKKPNSSGGSSGGSSGSETPGTSESATPESSGGSSGGSMDPATFTYQFKNVRWAKGRRETYLCYVVKRRDSATSFSLDFGYLRNKNGCHVELLFLRYISDWDLDPGRCYRVTWFTSWSPCYDCARHVADFLRGNPNLSLRIFTARLYFCEDRKAEPEGLRRLAEAGVQIAIMTFKDYFYCWNTFVENHERTFKAWEGLHENSVRLSRQLRRILQSGSETPGTSESATPESDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGDSGGSPKKKRKV(SEQ ID 
NO.21)

A third aspect of the present invention provides a polynucleotide encoding the fusion protein described above.

A fourth aspect of the present invention provides a nucleic acid construct comprising the polynucleotide described above.

A fifth aspect of the present invention provides an expression system that contains the nucleic acid construct described above or has the polynucleotide described above integrated into its genome. The expression system may be, for example, Escherichia coli. In a preferred embodiment of the present invention, the expression system is DH5α.

A sixth aspect of the present invention provides use of the single-base editing system, the fusion protein, the polynucleotide, the nucleic acid construct or the expression system described above in base editing. The use may be, for example, base conversion or gene inactivation. The single-base editing system, fusion protein, polynucleotide, nucleic acid construct or expression system provided by the present invention prevents the formation of abasic sites (AP sites) during base editing. This reduces the DNA damage generated during editing and, in turn, the various by-products that arise from that damage, thereby achieving efficient base editing.

A seventh aspect of the present invention provides a gene editing method, comprising: performing gene editing with the base editing system described above or the fusion protein described above.

An eighth aspect of the present invention provides use of HMCES as an abasic site protection polypeptide in gene editing or in a base editing system.

In the use provided by the present invention, HMCES serves to reduce DNA double-strand break damage during single-base editing.

In the use provided by the present invention, the single-base editing is cytosine (C) to guanine (G) base editing.

Embodiments of the present invention are described below by way of specific examples. Those skilled in the art can readily appreciate other advantages and effects of the present invention from the content disclosed in this specification.

Before the specific embodiments of the present invention are described further, it should be understood that the scope of protection of the present invention is not limited to the specific embodiments below. It should also be understood that the terms used in the examples serve to describe those specific embodiments and are not intended to limit the scope of protection of the present invention. Test methods for which no specific conditions are given in the following examples generally follow conventional conditions or the conditions recommended by the respective manufacturer.

Where the examples give numerical ranges, it should be understood that, unless otherwise stated, both endpoints of each range and any value between them may be selected. Unless otherwise defined, all technical and scientific terms used herein have the meanings commonly understood by those skilled in the art. In addition to the specific methods, equipment and materials used in the examples, any prior-art methods, equipment and materials similar or equivalent to those described in the examples may be used to practice the present invention, in accordance with the skilled person's knowledge of the prior art and the description herein.

In the following examples of this application, the pcDNA3 plasmid is described in Reference 1 (Citation: An W, Zhang Z, Zeng L, Yang Y, Zhu X, Wu J (2015) Cyclin Y Is Involved in the Regulation of Adipogenesis and Lipid Production. PLoS ONE 10(7): e0132721. https://doi.org/10.1371/journal.pone.0132721).

The PX33R plasmid was derived from PX330 (Addgene plasmid #42230) by digestion with the restriction endonucleases Sbf I and Kpn I followed by ligation with T4 ligase, yielding the PX33R vector, which lacks the Cas9 component.

Example 1: Study of the editing products generated by base editors and the associated damage repair mechanisms

In this example, to analyze the editing products that arise from the different types of DNA damage caused by CBEs and the DNA damage repair mechanisms involved, an orthogonal base editing screening system combining SpCas9 with SaCas9-based SaBEs was constructed. The DNA damage repair mechanisms responsible for the different editing products formed during base editing were then examined, as follows:

1.1 Construction of the base editor library

The cytosine base editors (CBEs) in wide use today produce, in theory, three main types of DNA lesion, as shown in Figure 1. Conventional CBEs (Huang et al., 2021), such as AID-SaBE3, generate uracil (U) on the sgRNA non-target strand, while nCas9, which retains partial cleavage activity, nicks the target strand; the co-fused UGI component inhibits uracil excision. CGBEs (Kurt et al., 2021; Zhao et al., 2021), such as AID-SaBE1n, produce lesions similar to those of CBEs, but they do not restrict, and may even promote, excision of U. dCBEs (Hess et al., 2017; Komor et al., 2016), such as AID-SaBE1, generate U only on the non-target strand.

Using the orthogonal screening system that combines SpCas9 with SaCas9-based SaBEs, and with an AID mutant as the cytidine deaminase component, three base editors were constructed: CBE (AID-SaBE3), CGBE (AID-SaBE1n) and dCBE (AID-SaBE1).

1) First, SpCas9 was used to target and cleave 513 DNA damage repair factors, yielding a library of cells each deficient in a different factor. These cells also stably expressed a specific SaBE-Test sequence located downstream of the sgRNA stem-loop structure (see Figure 2A).

2) SaBEs were then transfected to edit the SaBE-Test sequence. A single amplicon amplified both the SpCas9 sgRNA and the edited SaBE-Test sequence, so that the sgRNA targeting each DNA damage repair factor could be matched one-to-one with the resulting base editing products (see Figure 2A).

The cell library was transfected with each of the three base editors separately, with two replicates per base editor.

1.2 Analysis of results

1.2.1 The editing products of cytosine base editors arise from different types of DNA damage

A total of 100 million cells edited by the three base editors were deep-sequenced, yielding 96.5 million reads on average. This gives approximately 200-fold coverage of the entire sgRNA library (5,985 sgRNAs) even for low-frequency products (1%).
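The coverage figure can be sanity-checked with a back-of-the-envelope calculation. The sketch below assumes coverage is estimated as reads × product frequency ÷ library size; the source does not state the formula it used, and the function and variable names are illustrative. Under this assumption the estimate comes out at roughly 160-fold, the same order as the approximately 200-fold figure quoted.

```python
def per_sgrna_coverage(total_reads, product_frequency, n_sgrnas):
    """Expected number of reads per sgRNA for a product at the given frequency."""
    return total_reads * product_frequency / n_sgrnas


reads = 96.5e6          # average reads per base editor, from the text
frequency = 0.01        # low-frequency product (1%), from the text
library_size = 5985     # sgRNAs in the library, from the text

coverage = per_sgrna_coverage(reads, frequency, library_size)
print(f"estimated coverage: {coverage:.0f}-fold")
```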

Reads containing negative-control sgRNAs (about 6 million per tool) were extracted from the nearly 100 million sequencing reads to serve as the wild-type control. A base substitution profile was then drawn for each SaBE, covering substitution frequency, position and mutation type (C to T/G/A).
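A substitution profile of this kind can be sketched as a per-position tally of C-to-T/G/A changes. The code below is a simplified illustration, not the authors' pipeline: it assumes reads are already aligned and of the same length as the reference, it skips reads with indels, and the function name and toy sequences are hypothetical.

```python
from collections import Counter


def substitution_profile(reference, reads):
    """Return {(position, 'C>X'): frequency} over reads matching the reference length."""
    usable = [r for r in reads if len(r) == len(reference)]
    counts = Counter()
    for read in usable:
        for pos, (ref_base, read_base) in enumerate(zip(reference, read)):
            # Only C positions in the reference are considered, as in a C-editing profile.
            if ref_base == "C" and read_base in "TGA":
                counts[(pos, f"C>{read_base}")] += 1
    total = len(usable)
    return {key: n / total for key, n in counts.items()} if total else {}


reference = "GACCTA"  # toy reference, not the SaBE-Test sequence
reads = ["GACCTA", "GATCTA", "GATCTA", "GACGTA"]
profile = substitution_profile(reference, reads)
for (pos, change), freq in sorted(profile.items()):
    print(pos, change, freq)
```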

Consistent with their different DNA damage patterns, the three Sa-CBEs produced different base substitution profiles; the results are shown in Figure 2.

Figure 2B shows that CBE (AID-SaBE3) mainly produces C>T transition mutations within the editing window.

Figure 2C shows that CGBE (AID-SaBE1n) increases the frequency of C>A/G transversion mutations within the editing window.

Figure 2D shows that dCBE (AID-SaBE1) produces base substitutions at relatively high frequency within the editing window and even outside the sgRNA spacer, with a higher proportion of C>A/G transversion mutations.

Taken together, the base substitution profiles of the AID-based cytosine SaBEs are consistent with expectations.

1.2.2 In-depth analysis of low-frequency base editing products: transversion and deletion mutations

The results in 1.2.1 show that CGBE (AID-SaBE1n) and dCBE (AID-SaBE1) produce more transversion mutations, so the base substitution profile within the SaBE-Test sequence was analyzed further.

Within the SaBE-Test sequence, the two transversions C>G and C>A show different profiles. Across the five main editing sites, C>G tends to occur at higher frequency at particular positions, such as positions in the editing window close to the PAM (AID-SaBE1n and AID-SaBE3) or sites on the target strand outside the sgRNA spacer (AID-SaBE1). C>A shows no positional preference. This suggests that C>G and C>A may arise through different DNA damage repair pathways.

Deep sequencing revealed a large number of base deletion and insertion mutations among the CBE editing products; the results are shown in Figure 3.

Figure 3A shows that the editing products of CBE (AID-SaBE3) contain about 2.7% small deletion mutations.

Figure 3B shows that the proportion of deletion mutations among the editing products of CGBE (AID-SaBE1n) rises to 50.6%.

Figure 3C shows that deletion mutations account for 20.1% of the editing products of dCBE (AID-SaBE1).

Deletion mutations of 1 bp and of more than 1 bp were then analyzed separately.

Figure 3D shows that 1-bp deletions frequently occur at the targeted, edited bases, and that dCBE (AID-SaBE1) produces deletions at relatively high frequency in regions outside the sgRNA spacer.

Figures 3E to 3G show the analysis of deletions longer than 1 bp. When the start and end positions of the deletions are tallied, the deletion map shows that start positions often coincide with editing sites, whereas end positions correspond to some of the C residues on the target strand.
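Tallying deletion start and end positions requires extracting deletion spans from each aligned read. The following is a minimal sketch under the assumption that each read has already been pairwise-aligned to the reference, with `-` marking gaps; the function name and toy alignment are hypothetical and do not reproduce the authors' analysis.

```python
def deletion_spans(aligned_ref, aligned_read):
    """Return (start, end) reference coordinates (0-based, inclusive) of each deletion."""
    spans = []
    ref_pos = 0
    start = None
    for ref_base, read_base in zip(aligned_ref, aligned_read):
        if ref_base == "-":
            # Insertion in the read: no reference position is consumed.
            continue
        if read_base == "-":
            if start is None:
                start = ref_pos          # deletion opens here
        elif start is not None:
            spans.append((start, ref_pos - 1))  # deletion closed at previous base
            start = None
        ref_pos += 1
    if start is not None:
        spans.append((start, ref_pos - 1))      # deletion runs to the end
    return spans


spans = deletion_spans("ACGTCCA", "AC--CCA")
single = deletion_spans("ACGTCCA", "ACG-CCA")
print(spans, single)
```

Counting the first and second elements of these tuples across all reads gives the start-site and end-site distributions, and the span length separates 1-bp deletions from longer ones.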

Figures 3F and 3G show that, for deletions longer than 1 bp, the nCas9 nicking activity in CGBE (AID-SaBE1n) or CBE (AID-SaBE3) shifts the deletion end positions toward the PAM sequence.

In summary, deep sequencing was used to dissect the frequency and profile of deletion mutations for three different cytosine base editors. The results indicate that the specific mechanisms by which different types of base editor produce deletions may differ, and these mechanisms were therefore investigated further.

1.2.3 Overview of the DNA damage repair pathways involved in cytosine base editing

The deep sequencing data first undergo a series of quality control (QC) steps, such as checking sgRNA coverage and the overall distribution of the different sgRNAs. Then, with the original SaBE-Test sequence as the reference, the reads are divided into a no-mutation group, a C>T group, a C>G group, a C>A group, a base deletion group and a base insertion group.
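The grouping step can be illustrated with a simplified classifier. This sketch makes several assumptions not stated in the source: indels are detected by read length alone, substitutions are compared position by position without alignment, and a read carrying several substitution types is assigned by the fixed priority C>T, C>G, C>A. The function name is hypothetical.

```python
def classify_read(reference, read):
    """Assign a read to one of the groups named in the text (simplified)."""
    if len(read) < len(reference):
        return "deletion"
    if len(read) > len(reference):
        return "insertion"
    # Collect every substitution type present, e.g. {"C>T", "A>G"}.
    changes = {f"{r}>{q}" for r, q in zip(reference, read) if r != q}
    for label in ("C>T", "C>G", "C>A"):  # assumed priority order
        if label in changes:
            return label
    return "no mutation" if not changes else "other"


reference = "ACCGT"  # toy reference, not the SaBE-Test sequence
examples = ["ACCGT", "ATCGT", "AGCGT", "AACGT", "ACGT", "ACCCGT"]
groups = [classify_read(reference, r) for r in examples]
print(groups)
```

A production pipeline would align reads before calling variants, since length comparison alone cannot distinguish a read with both an insertion and a deletion from an unedited read.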

To dissect the DNA damage repair pathways associated with the base editing products, enriched genes were first identified by comparing the base editing mutation groups with the no-mutation group, and their functions were analyzed using information from the GO database:

1) The association of different DNA damage repair pathways, such as the base excision repair (BER), mismatch repair (MMR) and nucleotide excision repair (NER) pathways, with cytosine base editing was assessed. Specifically, the enriched genes that affect base editing efficiency were collected and subjected to GO enrichment analysis to obtain a significance p-value for each pathway; the results are shown in Figure 4.

Figure 4A shows that the different DNA damage repair pathways, including BER, MMR and NER, each play a different role in cytosine base editing. The BER pathway mainly acts to suppress cytosine base editing.
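The source does not say which statistical test underlies the pathway p-values. A common choice for gene-set enrichment is the one-sided hypergeometric test, sketched below as an assumption; the function name and the numbers in the example are illustrative, apart from the library size of 513 factors taken from the text.

```python
from math import comb


def hypergeom_enrichment_p(population, pathway_size, n_hits, pathway_hits):
    """P(X >= pathway_hits) when n_hits genes are drawn from the population.

    population:   total genes screened
    pathway_size: genes in the population annotated to the pathway
    n_hits:       enriched genes found in the screen
    pathway_hits: enriched genes that belong to the pathway
    """
    upper = min(pathway_size, n_hits)
    favorable = sum(
        comb(pathway_size, i) * comb(population - pathway_size, n_hits - i)
        for i in range(pathway_hits, upper + 1)
    )
    return favorable / comb(population, n_hits)


# Illustrative numbers only: 513 screened factors, a 20-gene pathway,
# 50 enriched genes of which 8 fall in the pathway.
p_value = hypergeom_enrichment_p(513, 20, 50, 8)
print(f"p = {p_value:.3g}")
```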

2) A correlation analysis was performed on the genes enriched during base editing by the three cytosine SaBEs; the results are shown in Figure 4.

Figure 4B shows that, in this correlation analysis, the genes enriched for CGBE and dCBE are fairly strongly correlated (r=0.66).

Figure 4C shows that the genes enriched for CGBE and CBE are only weakly correlated (r=0.39).

Taking Figures 4B and 4C together, whether uracil is removed (CGBE/dCBE) or retained (CBE) is the key factor behind the different enrichment of DNA damage repair factors.
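The r values above compare per-gene enrichment scores between two editors. The source does not specify the correlation measure; the sketch below assumes Pearson correlation and uses made-up scores purely to show the computation.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical per-gene enrichment scores for two editors (not data from the source).
scores_editor_a = [1.2, -0.4, 0.8, 2.1, -1.0]
scores_editor_b = [0.9, -0.1, 0.5, 1.7, -0.6]
r = pearson_r(scores_editor_a, scores_editor_b)
print(f"r = {r:.2f}")
```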

Figure 4B shows that NER factors, including XPA, USP7 and ERCC4, are enriched in the base editing mutation group.

3) The enrichment of genes encoding glycosylases and their downstream factors was assessed; the results are shown in Figure 4.

Figures 4B and 4C show that the genes encoding glycosylases and their downstream factors are enriched in the no-mutation group. Notably, however, SMUG1 is a distinctive factor that promotes the generation of C>T transition mutations by CBE (AID-SaBE3), which may be related to the recently reported role of SMUG1 in inducing mutations (Taglialatela et al., 2021).

4)进一步在其中选取了58个在无碱基突变组中显著富集的基因进行聚类分析,结果见图4。4) We further selected 58 genes that were significantly enriched in the abasic mutation group for cluster analysis. The results are shown in Figure 4.

从图4中D可知,对其编辑产物进行聚类分析,可以看出根据碱基插入/删除突变的结果可以明显看出DNA双链断裂修复(DSBR)和非同源性末端连接(NHEJ)因子的聚集,这些因子缺失的细胞中碱基插入/删除突变显著减少。As can be seen from D in Figure 4, cluster analysis of the editing products shows that DNA double-strand break repair (DSBR) and non-homologous end joining (NHEJ) can be clearly seen based on the results of base insertion/deletion mutations. Aggregation of factors, and base insertion/deletion mutations are significantly reduced in cells lacking these factors.

5)不同DNA损伤修复通路基因在CRISPR-Cas9文库筛选实验中,经对比C>T组与无突变组后的基因富集情况,结果见图5。5) In the CRISPR-Cas9 library screening experiment, the gene enrichment of different DNA damage repair pathway genes was compared between the C>T group and the non-mutation group. The results are shown in Figure 5.

图5中A为碱基切除修复通路(BER)因子在编辑突变组和无突变组对比后的富集分析结果图。A in Figure 5 shows the enrichment analysis results of base excision repair pathway (BER) factors after comparing the editing mutation group and the non-mutation group.

从图5中B可知,错配修复通路(MMR)在碱基编辑中发挥的作用较弱,这一点与MMR在多样化的AID缺陷型小鼠B淋巴细胞(Rada et al.,2004)或先导型碱基编辑器(Primeediting)(Chen et al.,2021b)中发挥的重要作用不同,这可能源于HEK293T细胞系本身MMR存在缺陷(Trojan et al.,2002),导致MMR因子无法在本发明的体系中被筛选出来。As shown in Figure 5B, the mismatch repair pathway (MMR) plays a weak role in base editing, which is consistent with the role of MMR in diverse AID-deficient mouse B lymphocytes (Rada et al., 2004) or The important role played by the lead base editor (Primediting) (Chen et al., 2021b) is different, which may be due to the defective MMR of the HEK293T cell line itself (Trojan et al., 2002), resulting in the inability of MMR factors to were screened out from the invented system.

从图5中C可知,核苷酸切除修复通路(NER)促进了胞嘧啶碱基编辑过程。It can be seen from Figure 5, C, that the nucleotide excision repair pathway (NER) promotes the cytosine base editing process.

碱基颠换突变与碱基转换突变分别聚集,表明碱基转换和颠换存在不同的DNA损伤修复途径。从基因的角度来看,我们也发现功能相似的基因往往会聚集在一起,例如:53BP1依赖的DSBR因子显示出明显的聚集。Base transversion mutations and base conversion mutations clustered separately, indicating that there are different DNA damage repair pathways for base conversion and transversion. From a genetic perspective, we also found that genes with similar functions tend to cluster together. For example, 53BP1-dependent DSBR factors show obvious clustering.

总的来说,胞嘧啶碱基编辑器编辑结果的文库筛选概况为分析其相关的DNA损伤修复机制提供了丰富的资源。Overall, the library screening overview of cytosine base editor editing results provides a rich resource for analyzing their related DNA damage repair mechanisms.

1.2.4 Base editors induce DSBs, which give rise to small deletion mutations

Ultra-deep sequencing of the DNA damage repair gene library screen for cytosine base editing outcomes enables in-depth analysis of the molecular mechanism behind the low-frequency small deletions that arise during cytosine base editing.

1) Gene enrichment was compared between the deletion group and the C>T substitution group; the results are shown in Figure 6.

As shown in Figure 6A, BER, DSBR and NHEJ factors promote the generation of deletion mutations.

As shown in Figure 6B, the BER factors UNG and APEX1 mediate conversion of the base editor's deamination product into a DSB, which in turn activates the DSBR and end-joining pathways. These small deletions may be products of Artemis-dependent end-resection repair (Lobrich and Jeggo, 2017).

2) To analyze the deletion profiles of different cytosine base editors at different endogenous genomic sites, SpCas9-based cytosine base editors were selected: BE4max (a CBE) (Koblan et al., 2018) and three CGBEs, namely CGBE1 (Kurt et al., 2021), AXC (Koblan et al., 2021) and AID-BE1n. Four endogenous genomic target sites were chosen for testing; the results are shown in Figure 6.

As shown in Figure 6C, the cytosine base editor BE4max has a higher editing efficiency, whereas the cytosine-to-guanine base editors CGBE1, AXC and AID-BE1n produce more C>G transversions and a higher deletion frequency than BE4max.

3) The deletion profiles of the different base editors at endogenous genomic sites in the HEK293T cell line were examined to analyze how often deletions occur at different positions.

As shown in Figure 6D, among these cytosine base editors, the deaminase component of BE4max (with UGI), CGBE1 (with eUNG) and AXC (with UdgX) is an APOBEC1-derived mutant, whereas that of AID-BE1n is an AID-derived mutant.

Further, four commonly used base editing sites were examined; for each site targeted by each editing tool, nearly 1 million sequencing reads from 1 million base-edited cells were analyzed. The results are shown in Figure 6.

As shown in Figure 6D, CGBE1 and AID-BE1n both produce deletions at a relatively high frequency at all three of these endogenous genomic editing sites. The start or end points of deletions longer than 1 bp correspond to the base editing site or the Cas9 cleavage site.

Taken together, the DNA damage produced by CGBEs is more readily processed into DSBs and deletion mutations.

Together, these results show that CGBEs generate a relatively high level of DNA double-strand breaks (DSBs) at the targeted editing site, which deserves attention, and that these DSBs can be processed into small deletions via the DNA end-joining pathway.

1.2.5 DSB end joining and translesion synthesis contribute to C>G transversion mutations

C>G transversion is the intended product of CGBE. To study the molecular mechanism involved in C>G transversion, the C>G and C>T groups were compared and the genes enriched in each were analyzed; the results are shown in Figure 7.

1) As shown in Figure 7A, the screen identified three classes of genes that contribute to C>G transversion. The first class is represented by UNG, consistent with the rationale of recently developed CGBEs, which increase C>G transversions by introducing UNG activity (Koblan et al., 2021; Kurt et al., 2021; Zhao et al., 2021).

2) As shown in Figure 7A, the second class promoting C>G transversion consists of translesion synthesis (TLS) pathway factors, including three genes encoding PCNA ubiquitin ligases (RFWD3, RAD18 and DTL) and REV1, which encodes a TLS polymerase.

3) The third class involved in C>G transversion consists of genes encoding two endonucleases and genes of the NHEJ pathway.

As shown in Figure 7A, the two endonuclease genes are APEX1 and XPG. APEX1 cleaves AP sites in the BER pathway to generate single-strand nicks (Robson and Hickson, 1991; Seki et al., 1991), whereas XPG, an NER factor, is a structure-specific endonuclease that cuts the 3' side of bubble-structured double-stranded DNA (O'Donovan et al., 1994).

Figure 7B illustrates that APEX1 and/or XPG can convert the non-blunt-ended DNA damage generated by the base editor into a DSB, and that fill-in of the damaged ends then produces a C>G transversion.

As shown in Figure 7A, the enrichment of the three core NHEJ factors XRCC4, XLF and LIG4 suggests that they may promote C>G transversion.

To test this model, C>G base editing was performed in cells deficient in NHEJ factors.

As shown in Figure 7C, in HEK293T cell lines deficient in the NHEJ factor XRCC4 or the TLS factor REV1, the proportion of C>G mutations decreases, accompanied by a lower frequency of deletions. Thus, in addition to the previously observed TLS-mediated C>G transversion (Koblan et al., 2021), a new NHEJ-dependent route to C>G mutations was identified, which involves DSB formation followed by end joining.

To examine how different target DNA sequences affect C>G transversion, the four tools were tested in wild-type, XRCC4 (NHEJ factor) knockout and REV1 (TLS factor) knockout HEK293T cell lines, and mutations at 10 base positions targeted by 6 sgRNAs were analyzed (Figure 7D).

As shown in Figure 7D, TLS affects the C>G transversion frequency for all 6 sgRNAs tested, whereas NHEJ markedly promotes C>G transversion only at a specific site (FANCF).

Overall, the target DNA sequence acting in cis and the DNA damage repair factors acting in trans jointly shape the base editing outcomes of CGBE.

1.2.6 CGBEs generate chromosomal translocations

To test whether DSBs generated by cytosine base editors mediate the formation of chromosomal translocations, an existing high-throughput genome-wide translocation sequencing (HTGTS) method was adapted to capture rare translocation events more efficiently.

The newly designed Tn5-HTGTS method combines Tn5 transposase tagging with translocation cloning (see Figure 8A) and can detect genome-wide translocations originating from the site targeted by a single base editor.

To assess the chromosomal translocations that cytosine base editors generate in the endogenous genome, Tn5-HTGTS was performed on cell samples transfected with SpCas9-based cytosine base editors and sgRNAs targeting 4 endogenous genes.

As shown in Figure 8B, compared with BE4max (purchased from Addgene, #112093; left circos plot in panel B), CGBEs including CGBE1 and AID-BE1n produce more chromosomal translocations.

As shown in Figure 8C, on average 3 chromosomal translocations per 1000 input cells are detected in CGBE1 samples, and 8 per 1000 input cells in AID-BE1n samples.

As shown in Figure 8D, at Cas-OT sites AID-BE1n produces more translocations, with a broader range of junctions, than CGBE1, indicating that translocations produced by base editors with different deaminase components have different characteristics (Liu et al., 2018). Translocations produced by AID-BE1n cluster at AID-OT sites, which are defined by chromatin features including transcription and super-enhancers (Meng et al., 2014; Qian et al., 2014).

As shown in Figure 8E, CGBE1, which contains the APOBEC1 component, shows no such clustering.

So far, we have not detected any APOBEC1 deaminase-preferred off-target sites for CGBE1.

As shown in Figure 8F, alignment against PRO-seq data shows that 7.4% and 8.8% of the translocation junctions produced by CGBE1 and AID-BE1n, respectively, fall within transcribed regions. Analysis of other sequencing data also shows that these junctions coincide with peaks of active transcription and open, active chromatin (Figure 8F), suggesting that these translocations are very likely to have adverse effects.

To further determine whether a small number of translocations originate from deaminase off-target sites, the enrichment of short sequence motifs near translocation junctions was analyzed.

As shown in Figure 8G, the TC motif (the motif preferred by APOBEC1) is enriched in CGBE1 samples.

As shown in Figure 8G, sequences that combine deaminase-preferred motifs with palindromic motifs are more prone to translocation. For example, TCG, which CGBE1 prefers, is relatively more abundant than TCH (H: not G) in sequences near translocation junctions. In a palindromic motif, deamination can occur on both DNA strands at once, and DSBs form more readily when both strands are then processed by downstream DNA damage repair factors.
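The motif comparison above (TCG versus TCH, and palindromic motifs) can be illustrated with a small counting sketch. This is not the analysis pipeline used in the study; the function names are hypothetical, and the example sequence is made up for illustration.

```python
# Complement table for upper-case DNA.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (upper-cased)."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def count_tc_contexts(seq):
    """Count TC dinucleotides followed by G (TCG) or by A/C/T (TCH)."""
    seq = seq.upper()
    counts = {"TCG": 0, "TCH": 0}
    for i in range(len(seq) - 2):
        if seq[i:i + 2] == "TC":
            if seq[i + 2] == "G":
                counts["TCG"] += 1
            elif seq[i + 2] in "ACT":
                counts["TCH"] += 1
    return counts

def is_palindromic(seq):
    """True if the sequence equals its own reverse complement."""
    return seq.upper() == reverse_complement(seq)

# Made-up junction-flanking sequence, for illustration only.
print(count_tc_contexts("ATCGATCA"))
print(is_palindromic("TCGA"))
```

In a palindromic site such as TCGA, the TC motif is present on both strands, which matches the text's point that both strands can be deaminated at the same position.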

Overall, as summarized schematically in Figure 8H, CGBEs generate chromosomal translocations between the on-target editing site and preferred off-target sites, the latter including Cas-OT and deaminase-OT sites; deaminase-preferred sequences with palindromic motifs have a higher probability of double-strand breakage.

Example 2

In this example, a recombinant vector overexpressing Hmces is constructed and then combined with a vector containing a nickase and a cytosine deaminase to form a single base editing system, which is used for base editing. Specifically, the Hmces gene sequence is cloned into the pcDNA3 vector to obtain the recombinant vector pcDNA3-Hmces; the base editor and pcDNA3-Hmces are then co-transfected in a 24-well plate to give the single base editing system CGBE-plus-HMCES (abbreviated CGBEpH). The steps are as follows:

2.1. Construction of the recombinant vector pcDNA3-Hmces

2.1.1 Primer design and synthesis

Primers for adding homology arms to the Hmces sequence were synthesized by Beijing Qingke Biotechnology Co., Ltd. The amino acid sequence of Hmces includes the sequence shown in SEQ ID NO.1, and its nucleotide sequence is shown in SEQ ID NO.2. The forward and reverse primers are as follows:

Forward and reverse primers for amplifying the first fragment of mouse Hmces (Q-CMV-F and Hmces-EcoRI-R):

Q-CMV-F: AGGCGTGTACGGTGGGAGGT (SEQ ID NO.22)

Hmces-EcoRI-R: CAGGACGTTCGCCCGCACATggtggcgaattccagcacactggcgg (SEQ ID NO.23)

Forward and reverse primers for amplifying the second fragment of mouse Hmces (Hmces-F and Hmces-XhoI-R):

Hmces-F: ATGTGCGGGCGAACGTCCTG (SEQ ID NO.24)

Hmces-XhoI-R: tccttgtagtcctcgagGCTGTTAGGCTTCTTGGCCA (SEQ ID NO.25)
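As a quick consistency check on the primers listed above, the sketch below confirms that the upper-case 5' part of Hmces-EcoRI-R is the reverse complement of Hmces-F, so the two amplicons share a 20-nt overlap, and that the lower-case tails carry the EcoRI (GAATTC) and XhoI (CTCGAG) recognition sequences. The helper name is hypothetical; reading the shared 20 nt as the overlap used to join the two fragments is an interpretation, not something the text states.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (upper-cased)."""
    return seq.upper().translate(COMPLEMENT)[::-1]

# Primer sequences as listed in step 2.1.1 (SEQ ID NO.23-25).
HMCES_ECORI_R = "CAGGACGTTCGCCCGCACATggtggcgaattccagcacactggcgg"
HMCES_F = "ATGTGCGGGCGAACGTCCTG"
HMCES_XHOI_R = "tccttgtagtcctcgagGCTGTTAGGCTTCTTGGCCA"

# The 5' end of Hmces-EcoRI-R should mirror Hmces-F on the opposite strand.
overlap = reverse_complement(HMCES_F)
print("shared overlap:", overlap, HMCES_ECORI_R.startswith(overlap))

# Restriction sites carried in the lower-case tails.
print("EcoRI site present:", "GAATTC" in HMCES_ECORI_R.upper())
print("XhoI site present:", "CTCGAG" in HMCES_XHOI_R.upper())
```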

2.1.2 Obtaining the Hmces sequence with homology arms

The forward and reverse primers from step 2.1.1 of this example were dissolved in ddH2O to 100 μM and then diluted to 10 μM.

The forward and reverse primers were added to the reaction system in Table 1 and PCR was performed to obtain the PCR product.

Table 1

Component | Amount
Diluted forward and reverse primers from this example | 1 μL each
Template (cDNA) | 1 μL (about 500 ng)
HF buffer (5×) | 5 μL
Phusion (F530, Thermo Fisher) | 0.25 μL
dNTPs | 0.5 μL
ddH2O | to 25 μL
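The arithmetic behind Table 1 can be sketched as follows: the water volume is whatever brings the listed components up to 25 μL, and the ten-fold primer dilution follows C1·V1 = C2·V2. The function names, the 10% pipetting excess and the 100 μL dilution volume are assumptions for illustration and are not part of the protocol.

```python
# Per-reaction volumes in microlitres, taken from Table 1.
TABLE1_UL = {
    "forward primer (diluted)": 1.0,
    "reverse primer (diluted)": 1.0,
    "template cDNA": 1.0,
    "HF buffer (5x)": 5.0,
    "Phusion": 0.25,
    "dNTPs": 0.5,
}
FINAL_VOLUME_UL = 25.0

def water_volume(components, final_volume):
    """Volume of ddH2O needed to reach the final reaction volume."""
    return final_volume - sum(components.values())

def master_mix(components, final_volume, n_reactions, excess=0.10):
    """Scale one reaction to n reactions; `excess` is an assumed pipetting margin."""
    scale = n_reactions * (1.0 + excess)
    mix = {name: vol * scale for name, vol in components.items()}
    mix["ddH2O"] = water_volume(components, final_volume) * scale
    return mix

def dilution_volumes(stock_conc, target_conc, final_volume):
    """Return (stock volume, diluent volume) from C1*V1 = C2*V2."""
    stock = final_volume * target_conc / stock_conc
    return stock, final_volume - stock

print(water_volume(TABLE1_UL, FINAL_VOLUME_UL))   # ddH2O per 25 uL reaction
print(dilution_volumes(100.0, 10.0, 100.0))       # ten-fold dilution into 100 uL
```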

PCR reaction program:

The PCR product fragment was gel-purified according to the instructions of the universal DNA purification and recovery kit (TIANGEN) to obtain Hmces with homology arms.

2.1.3 Digestion of plasmid pcDNA3

Plasmid pcDNA3 was digested with the restriction endonucleases EcoRI and XhoI using the reaction system in Table 2 to obtain the digestion product.

Table 2

Component | Amount
NEB rCutSmart buffer (10×) | 5 μL
EcoRI | 1.5 μL
XhoI | 1.5 μL
Plasmid pcDNA3 | 5 μL (about 5 μg)
ddH2O | to 50 μL

Reaction conditions: 37°C water bath for 3 h.

The digested fragment was gel-purified according to the instructions of the universal DNA purification and recovery kit (TIANGEN) to obtain the digested plasmid pcDNA3.

2.1.4 Ligation of the digested plasmid and the target fragment

The digested plasmid pcDNA3 from step 2.1.3 of this example was joined to the DNA fragment with homology arms from step 2.1.2 of this example to obtain the ligation product; the ligation system is shown in Table 3.

Table 3

Component | Amount
2×Hieff Clone Enzyme Premix (YEASEN) | 5 μL
Digested pcDNA3 plasmid from step 2.1.3 of this example | 2 μL
Hmces with homology arms from step 2.1.2 of this example | 3 μL
ddH2O | to 10 μL

Reaction conditions: 50°C, 30 min.

2.1.5 Transformation of the ligation product

The ligation product from step 2.1.4 of this example was transformed into DH5α competent cells, which were cultured overnight at 37°C on LB plates containing ampicillin (Amp, 100 mg/L).

2.1.6 Picking single clones for sequencing

A single colony was picked from the ampicillin LB plate of step 2.1.5 of this example and grown overnight in LB liquid medium (Amp, 100 mg/L). Plasmid was extracted according to the instructions of the plasmid miniprep kit (TIANGEN) and sent to the Shanghai sequencing department of Beijing Qingke Biotechnology Co., Ltd. for sequencing.

2.1.7 Midiprep of the sequence-verified plasmid

The sequence-verified plasmid was re-transformed into DH5α competent cells, which were cultured overnight at 37°C on LB plates containing Amp (100 mg/L). A single colony was picked into 300 ml of LB liquid medium (Amp, 100 mg/L) and cultured overnight at 37°C. The bacteria were collected and the plasmid was extracted according to the instructions of the plasmid midiprep kit (MACHEREY-NAGEL) to obtain the recombinant vector pcDNA3-Hmces.

2.2. Preparation of base editor plasmids

Plasmid CGBE1, which contains eUNG, APOBEC1 and nCas9 expression elements, was ordered from Addgene.

In addition, plasmid PCMV-BE3 (purchased from Addgene, #73020) was modified by removing its UGI component and replacing its APOBEC1 component with AID, giving plasmid AID-BE1n, which contains AID and nCas9 expression elements.

For the sgRNA expression vectors, forward and reverse sgRNA oligonucleotides were synthesized by Beijing Qingke Biotechnology Co., Ltd., annealed by heating to boiling followed by natural cooling to give a double-stranded fragment, and ligated (T4 ligase) to BbsI-digested PX33R plasmid. The ligation product was transformed into DH5α competent cells, which were cultured overnight at 37°C on LB plates containing ampicillin (Amp, 100 mg/L). Single colonies were picked and expanded in LB medium, and plasmid was extracted according to the instructions of the plasmid miniprep kit (purchased from TIANGEN) and sequenced by the Shanghai sequencing department of Beijing Qingke Biotechnology Co., Ltd. Clones with the correct sequence were kept as the sgRNA expression vectors.

Plasmid CGBE1, plasmid AID-BE1n and the sgRNA expression vectors were each transformed into DH5α competent cells, which were cultured overnight at 37°C on LB plates containing ampicillin (Amp, 100 mg/L).

A single colony was picked into 300 ml of LB liquid medium (Amp, 100 mg/L) and cultured overnight at 37°C.

The bacteria were collected and the plasmid was extracted according to the instructions of the plasmid midiprep kit (purchased from MACHEREY-NAGEL).

2.3. Cell transfection with Lipofectamine 2000

1) 16-24 hours before transfection, HEK293T cells were seeded into a 48-well plate at 5.5×10^4 cells/well and cultured at 37°C in a 5% CO2 incubator so that the cells reached about 60-80% confluence the next day.

2) One hour before transfection, the medium was replaced with antibiotic-free complete DMEM (10% fetal bovine serum, no penicillin-streptomycin).

3) Plasmid CGBE1 or plasmid AID-BE1n from step 2.2 of this example (750 ng each) was transfected together with the recombinant vector pcDNA3-Hmces from step 2.1 of this example (500 ng) and an sgRNA vector (250 ng; the sgRNAs target FANCF, RNF2 and EMX1) using Lipofectamine 2000 (Invitrogen), with 3 replicates per sample.

Plasmid CGBE1 and the recombinant vector pcDNA3-Hmces form a single base editing system abbreviated CGBE1pH; plasmid AID-BE1n and the recombinant vector pcDNA3-Hmces form a single base editing system abbreviated AID-CGBEpH. CGBE1pH and AID-CGBEpH are collectively referred to as the single base editing system CGBEpH. Figure 10D is a schematic of the construction of the single base editing system CGBEpH in this example.

sgRNA target sequences for the different sites:

FANCF: GGAATCCCTTCTGCAGCACC (SEQ ID NO.26)

RNF2: GTCATCTTAGTCATTACCTG (SEQ ID NO.27)

EMX1: GAGTCCGAGCAGAAGAAGAA (SEQ ID NO.28)
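Because cytosine base editors act on C residues within the protospacer, it can help to list where the cytosines sit in each target sequence. The sketch below simply reports every C position, counted from the 5' end of the sequences as listed. It does not apply an editing window, since the window for each editor is not given here, and the function name is hypothetical.

```python
# sgRNA target sequences as listed (SEQ ID NO.26-28).
TARGETS = {
    "FANCF": "GGAATCCCTTCTGCAGCACC",
    "RNF2": "GTCATCTTAGTCATTACCTG",
    "EMX1": "GAGTCCGAGCAGAAGAAGAA",
}

def cytosine_positions(protospacer):
    """1-based positions of every C, counted from the 5' end."""
    return [i for i, base in enumerate(protospacer.upper(), start=1) if base == "C"]

for name, seq in TARGETS.items():
    print(name, cytosine_positions(seq))
```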

4) Three days after transfection, the cells were collected and genomic DNA was extracted.

2.4. Preparation of high-throughput sequencing libraries

Cell samples from the 48-well plate transfections (>0.3×10^6 cells) were lysed overnight at 56°C in cell lysis buffer (200 mM NaCl, 50 mM Tris-Cl pH 8.0, 5 mM EDTA, 0.2% SDS, 100 μg/ml proteinase K). Genomic DNA was then precipitated with ethanol and dissolved in TE buffer (10 mM Tris-Cl pH 8.0, 0.1 mM EDTA). Genomic DNA corresponding to 300,000 cells was used as the template in a 50 μl PCR reaction for the first round of amplification. In the first round, the primers were FANCF-p5 and FANCF-p7 for FANCF, RNF2-p5 and RNF2-p7 for RNF2, and EMX1-p5 and EMX1-p7 for EMX1.

The PCR enzymes used for library construction include the high-fidelity DNA polymerases Phusion (F530, Thermo Fisher) and Pfu (FastPfu Fly DNA Polymerase, TransStart, AP231-01). Because the first round of PCR adds sequence tags to the different samples, the first-round products can be pooled and used as the template for the second round, which uses the PE-P5-Short and P7-index primers.

The final PCR product was separated by agarose gel electrophoresis, excised at the expected size and gel-purified with a DNA purification kit (TIANGEN). The purified DNA was quantified with a Qubit quantification kit (Life Technologies), and the amplified library was sequenced on an Illumina HiSeq (PE150).

High-throughput sequencing primers used to amplify the sites above:

(Forward oligodeoxynucleotide) FANCF-p5:

TTCCCTACACGACGCTCTTCCGATCTggctacCATTGCAGAGAGGCGTATCA

(SEQ ID NO.29)

(Reverse oligodeoxynucleotide) FANCF-p7:

AGTTCAGACGTGTGCTCTTCCGATCTggctacGGGGTCCCAGGTGCTGAC

(SEQ ID NO.30)

(Forward oligodeoxynucleotide) RNF2-p5:

TTCCCTACACGACGCTCTTCCGATCTggctacAACGTAGGAATTTTGGTGGGACA

(SEQ ID NO.31)

(Reverse oligodeoxynucleotide) RNF2-p7:

AGTTCAGACGTGTGCTCTTCCGATCTggctacACGTCTCATATGCCCCTTGG

(SEQ ID NO.32)

(Forward oligodeoxynucleotide) EMX1-p5:

TTCCCTACACGACGCTCTTCCGATCTggctacGGGCCTCCTGAGTTTCTCAT

(SEQ ID NO.33)

(Reverse oligodeoxynucleotide) EMX1-p7:

AGTTCAGACGTGTGCTCTTCCGATCTggctacCTCGTGGGTTTGTGGTTGC

(SEQ ID NO.34)
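Each first-round primer above has three visible parts: an upper-case 5' tail, a 6-nt lower-case stretch, and an upper-case 3' end. A minimal sketch for splitting a primer into those parts is given below. Reading the lower-case stretch as the sample sequence tag and the 3' part as the locus-specific sequence is an assumption based on the statement that the first round of PCR adds sequence tags; the function name is hypothetical.

```python
import re

# Upper-case tail, lower-case tag, upper-case 3' part.
PRIMER_PATTERN = re.compile(r"^([ACGT]+)([acgt]+)([ACGT]+)$")

def split_primer(primer):
    """Split a primer into (5' tail, lower-case tag, 3' part)."""
    match = PRIMER_PATTERN.match(primer)
    if match is None:
        raise ValueError("primer does not follow the TAIL-tag-TARGET layout")
    return match.groups()

FANCF_P5 = "TTCCCTACACGACGCTCTTCCGATCTggctacCATTGCAGAGAGGCGTATCA"
FANCF_P7 = "AGTTCAGACGTGTGCTCTTCCGATCTggctacGGGGTCCCAGGTGCTGAC"

for name, primer in (("FANCF-p5", FANCF_P5), ("FANCF-p7", FANCF_P7)):
    tail, tag, target = split_primer(primer)
    print(name, tail, tag, target)
```

Running the same split over all six primers is a quick way to spot a sequence whose tail differs from the others, for example after a copy error.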

The first round of PCR was run with the amplification system in Table 4 to obtain the first-round PCR product.

Component | Amount
Forward/reverse oligodeoxynucleotides for FANCF, RNF2 or EMX1 | 2 μL each
Template (genomic DNA of the cells transfected in step 2.3 of this example) | 5 μL (about 5 μg)
HF buffer (5×) | 10 μL
Phusion (F530, Thermo Fisher) | 0.5 μL
dNTPs | 1 μL
ddH2O | to 50 μL

First-round PCR program:

Second-round PCR amplification system:

Forward oligodeoxynucleotide (PE-P5-Short):

AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGAC (SEQ ID NO.35)

Reverse oligodeoxynucleotide (P7-index):

CAAGCAGAAGACGGCATACGAGAT******GTGACTGGAGTTCAGACGTGT

Component | Amount
Forward oligodeoxynucleotide PE-P5-Short / reverse oligodeoxynucleotide P7-index | 1 μL each
First-round PCR product | 2 μL
HF buffer (5×) | 5 μL
Phusion (F530, Thermo Fisher) | 0.25 μL
dNTPs | 0.5 μL
ddH2O | to 25 μL

Second-round PCR program:

2.5. Results

The frequencies of deletion mutations (see Figure 9E) and of duplicated-sequence insertions longer than 1 bp (see Figure 9E) produced after transfection with the different single base editing systems were analyzed, together with the base editing efficiency and the proportion of C>G among the substitution products (see Figure 9F) and the distribution of chromosomal translocations (see Figure 9G).
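The quantities listed above can be computed from per-outcome read counts. The sketch below assumes one set of definitions: editing efficiency is the share of all reads carrying a substitution, the C>G proportion is taken among substitution reads, and deletion and insertion frequencies are taken over all reads. These definitions, the category names and the read counts are assumptions for illustration and are not taken from the patent.

```python
SUBSTITUTION_CLASSES = ("C>G", "C>T", "C>A")

def summarize_outcomes(counts):
    """Summarize read counts per outcome class into frequencies."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads")
    substitutions = sum(counts.get(k, 0) for k in SUBSTITUTION_CLASSES)
    return {
        "editing_efficiency": substitutions / total,
        "c_to_g_share": counts.get("C>G", 0) / substitutions if substitutions else 0.0,
        "deletion_frequency": counts.get("deletion", 0) / total,
        "insertion_frequency": counts.get("insertion", 0) / total,
    }

# Hypothetical read counts for one sample (illustration only).
example_counts = {
    "unedited": 600, "C>G": 200, "C>T": 100, "C>A": 20,
    "deletion": 60, "insertion": 20,
}
print(summarize_outcomes(example_counts))
```

Comparing these summaries between a CGBE sample and the matching CGBEpH sample is one way to express the comparison described for Figures 9E and 9F.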

从图9中E可知,CGBEpH可以显著减少碱基删除突变,也显著减少了DSB介导的>1bp重复插入突变。As can be seen from E in Figure 9, CGBEpH can significantly reduce base deletion mutations and also significantly reduce DSB-mediated >1bp repeated insertion mutations.

从图9中F可知,CGBEpH不影响碱基编辑突变效率与C>G突变占比。It can be seen from F in Figure 9 that CGBEpH does not affect the base editing mutation efficiency and the proportion of C>G mutations.

从图9中G可知,与CGBE1相比,CGBE1pH能有效减少CGBE1的染色体易位发生概率。It can be seen from Figure 9G that compared with CGBE1, CGBE1pH can effectively reduce the probability of CGBE1 chromosomal translocation.

Example 3 Fusion proteins containing HMCES

In this example, HMCES, a cytosine deaminase and a nucleic acid nickase were expressed as a fusion to form a single base editing system, which was then used for base editing. Specifically, the coding sequence of the mouse Hmces gene was amplified and cloned into the BE4B, CGBE1 and AID-BE1n plasmids to construct the fusion proteins HMCES-BE4B, HMCES-CGBE1 and HMCES-AID-BE1n, respectively, and cells were then transfected for single base editing. The steps were as follows:

3.1. Construction of expression vectors encoding the fusion proteins

3.1.1 Primer design and synthesis

Beijing Tsingke Biotechnology Co., Ltd. was commissioned to synthesize the primers used to add homology arms to the Hmces sequence. Forward and reverse primers:

For constructing the fusion protein HMCES-BE4B:

Forward and reverse primers for amplifying the first fragment of the mouse Hmces sequence to be joined to BE4B:

Template: BE4B

Q-CMV-F: AGGCGTGTACGGTGGGAGGT (SEQ ID NO.36)

Hmces-T7-R: ACGTTCGCCCGCACATGGTGGCGGCTCTCCCTATAGT (SEQ ID NO.37)

Forward and reverse primers for amplifying the second fragment of the mouse Hmces sequence to be joined to BE4B:

Template: pcDNA3-Hmces

Hmces-F: ATGTGCGGGCGAACGTCCTG (SEQ ID NO.38)

XTEN3-Hmces-R: CCGCCAGAACTACCACCGCTGCTGTTAGGCTTCTTGGCCA (SEQ ID NO.39)

Forward and reverse primers for amplifying the third fragment of the mouse Hmces sequence to be joined to BE4B:

Template: BE4B-N-GFP

XTEN3-F: AGCGGTGGTAGTTCTGGCGG (SEQ ID NO.40)

XTEN-BamHI-R: TCCTGGTGTCTCGCTGCCAGA (SEQ ID NO.41)

For constructing the fusion protein HMCES-CGBE1:

Forward and reverse primers for amplifying the first fragment of the mouse Hmces sequence to be joined to CGBE1:

Template: CGBE1

Q-CMV-F: AGGCGTGTACGGTGGGAGGT (SEQ ID NO.42)

Hmces-T7-R: ACGTTCGCCCGCACATGGTGGCGGCTCTCCCTATAGT (SEQ ID NO.43)

Forward and reverse primers for amplifying the second fragment of the mouse Hmces sequence to be joined to CGBE1:

Template: pcDNA3-Hmces

Hmces-F: ATGTGCGGGCGAACGTCCTG (SEQ ID NO.44)

XTEN3-Hmces-R: CCGCCAGAACTACCACCGCTGCTGTTAGGCTTCTTGGCCA (SEQ ID NO.45)

Forward and reverse primers for amplifying the third fragment of the mouse Hmces sequence to be joined to CGBE1:

Template: CGBE1-N-GFP

XTEN3-F: AGCGGTGGTAGTTCTGGCGG (SEQ ID NO.46)

XTEN-BamHI-R: TCCTGGTGTCTCGCTGCCAGA (SEQ ID NO.47)

For constructing the fusion protein HMCES-AID-BE1n:

Forward and reverse primers for amplifying the first fragment of the mouse Hmces sequence to be joined to AID-BE1n:

Template: AID-BE1n

Q-CMV-F: AGGCGTGTACGGTGGGAGGT (SEQ ID NO.48)

Hmces-T7-R: ACGTTCGCCCGCACATGGTGGCGGCTCTCCCTATAGT (SEQ ID NO.49)

Forward and reverse primers for amplifying the second fragment of the mouse Hmces sequence to be joined to AID-BE1n:

Template: pcDNA3-Hmces

Hmces-F: ATGTGTGGGCGAACATCCTG (SEQ ID NO.50)

XTEN3-Hmces-R: ACCGCCAGAACTACCACCGCTCTGGCTGTAAGGACGCTTGG (SEQ ID NO.51)

Forward and reverse primers for amplifying the third fragment of the mouse Hmces sequence to be joined to AID-BE1n:

Template: BE4B-N-GFP

XTEN3-F: AGCGGTGGTAGTTCTGGCGG (SEQ ID NO.52)

XTEN3-NLS-R3: ACTTCCACCTGAAGATCCACCAGATGATTCGGGAGTCGCGCTTTCAG (SEQ ID NO.53)

Forward and reverse primers for amplifying the fourth fragment of the mouse Hmces sequence to be joined to AID-BE1n:

Template: AID-BE1n

XTEN3-AIDmono-F: gtggatcttcaggtggaagtATGGACCCCGCTACCTTCAC (SEQ ID NO.54)

dCas9-AID-R: GCGGACTCTGAGGTCCCGGGAGTCTCGCTGCCGCTCTGAAGGATGCGCCGAA (SEQ ID NO.55)
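Because these primers are described as adding homology arms, the 5' end of one fragment's reverse primer would be expected to mirror the start of the next fragment's forward primer. The sketch below checks that for two junctions of the HMCES-BE4B primer set; it is an illustration only, the function names are hypothetical, and it assumes (this section does not say so explicitly) that adjacent fragments are joined through such terminal overlaps.

```python
# Complement table for uppercase DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (uppercased)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]

def junction_overlap(upstream_reverse_primer, downstream_forward_primer):
    """Length of the longest stretch shared by the 3' end of the upstream
    fragment (top strand) and the 5' end of the downstream forward primer."""
    top_strand_end = reverse_complement(upstream_reverse_primer)
    forward = downstream_forward_primer.upper()
    longest = 0
    for k in range(1, min(len(top_strand_end), len(forward)) + 1):
        if top_strand_end.endswith(forward[:k]):
            longest = k
    return longest

# Primer sequences copied from the HMCES-BE4B set (SEQ ID NO.37 to 40).
hmces_t7_r = "ACGTTCGCCCGCACATGGTGGCGGCTCTCCCTATAGT"
hmces_f = "ATGTGCGGGCGAACGTCCTG"
xten3_hmces_r = "CCGCCAGAACTACCACCGCTGCTGTTAGGCTTCTTGGCCA"
xten3_f = "AGCGGTGGTAGTTCTGGCGG"

print(junction_overlap(hmces_t7_r, hmces_f))     # 16
print(junction_overlap(xten3_hmces_r, xten3_f))  # 20
```

For these two junctions the shared stretches are 16 nt and 20 nt, so the first and second fragments, and the second and third fragments, carry matching ends.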

3.1.2 Obtaining the target fragments

The method was the same as the construction method in step 2.1.2 of Example 2.

3.1.3 Digestion of plasmids BE4B, CGBE1 and AID-BE1n

Plasmid BE4B (constructed) was double-digested with NotI and BamHI to obtain the APOBEC1 and nCas fragments; plasmid CGBE1 (purchased) was double-digested with NotI and BamHI to obtain the eUNG, APOBEC1 and nCas fragments; plasmid AID-BE1n (constructed) was double-digested with NotI and SmaI to obtain the AID and nCas fragments.

The digestion system and method were the same as in step 2.1.3 of Example 2.

3.1.4 Ligation of the digested plasmids and the target fragments

The target fragments obtained in step 3.1.2 were ligated to the corresponding digested plasmids obtained in step 3.1.3, using the same ligation method as in step 2.1.4 of Example 2, to obtain HM-BE4B, HM-CGBE1 and HM-AID-BE1n, respectively.

3.1.5 Transformation of the ligation products

The method was the same as in step 2.1.5 of Example 2.

3.1.6 Picking single clones for sequencing

The method was the same as in step 2.1.6 of Example 2.

Architecture of the fusion proteins:

HM-CGBE1: pCMV-Hmces-Ung-APOBEC1(R33A)-nCas9-NLS-EGFP

HM-BE4B: pCMV-Hmces-APOBEC1-nCas9-NLS

HM-AID-BE1n: pCMV-Hmces-AIDmono-nCas9-NLS

As confirmed by sequencing, the amino acid sequences of the fusion proteins are as follows:

The amino acid sequence of HM-CGBE1 is shown in SEQ ID NO.19, that of HM-BE4B in SEQ ID NO.20, and that of HM-AID-BE1n in SEQ ID NO.21.

3.1.7 Midi-preparation of the sequence-verified plasmids

The method was the same as in step 2.1.7 of Example 2.

3.2. Cell transfection with Lipofectamine 2000

Same as step 2.3 of Example 2.

In parallel, GFP was used in place of HMCES to obtain the corresponding base editing systems GFP-BE4B, GFP-CGBE1 and GFP-AID-BE1n as controls.

Figure 10B shows the single base editing systems HMCES-BE4B, HMCES-CGBE1 and HMCES-AID-BE1n constructed in this example by fusing HMCES to the editors encoded by the BE4B, CGBE1 and AID-BE1n plasmids, respectively.

3.3. Preparation of high-throughput sequencing libraries

Same as step 2.4 of Example 2.

3.4. Results

The editing efficiency of the different base editors, the frequency of base deletion mutations and the proportion of C>G were analyzed.

As shown in Figure 10C, after fusion with HMCES, small deletion mutations were significantly reduced and the frequency of base editing did not change significantly. The proportion of C>G mutations decreased slightly in cells expressing HMCES-CGBE1, whereas it did not decrease in cells expressing HMCES-AID-BE1n.

Example 4 Reducing CGBE-mediated DSB formation through cis design of the targeting sequence

Example 1 showed that palindromic motifs preferred by the deaminase are more prone to DNA double-strand breaks during base editing. In this example, targeting sequences were designed in cis and in trans to verify their effect on base editing. Specifically, 5 sgRNAs containing the deaminase-preferred palindromic motif (TCGA) and 5 sgRNAs without the palindromic motif (TCAA) were synthesized, and HEK293T cells stably carrying the different sequences were obtained by lentiviral infection. CGBE was transfected into these cell lines, the proportion of base deletion mutations produced by CGBE in the cell genome was measured by high-throughput sequencing, and the preference of the different sequences for producing base deletion mutations was analyzed. The steps were as follows:

4.1. Construction of plasmids containing the targeting sites and the sgRNAs for those sites

4.1.1. Synthesis of the targeting sequences

Sangon Biotech (Shanghai) Co., Ltd. was commissioned to synthesize 5 substrates containing the palindromic motif TCGA and 5 substrates without the palindromic motif (TCAA), together with the corresponding sgRNAs.

The 5 substrates containing the palindromic motif TCGA are HEK3-TCGA-1, HEK3-TCGA-2, GFP153-TCGA-1, GFP153-TCGA-2 and SITE13-TCGA-2; the corresponding sgRNAs are HEK3-TCGA-1-sgRNA-F/R, HEK3-TCGA-2-sgRNA-F/R, GFP153-TCGA-1-sgRNA-F/R, GFP153-TCGA-2-sgRNA-F/R and SITE13-TCGA-2-sgRNA-F/R.

The 5 substrates without the palindromic motif (TCAA) are HEK3-TCGA-C1, HEK3-TCGA-C2, GFP153-TCGA-C1, GFP153-TCGA-C2 and SITE13-TCGA-C2; the corresponding sgRNAs are HEK3-TCGA-C1-sgRNA-F/R, HEK3-TCGA-C2-sgRNA-F/R, GFP153-TCGA-C1-sgRNA-F/R, GFP153-TCGA-C2-sgRNA-F/R and SITE13-TCGA-C2-sgRNA-F/R.

Sequences:

HEK3-TCGA-1:

GTGGAAAGGACGAAATCGACAGACTGAGCACGTGATGGCTAGAAAGCTTGGCGTA (SEQ ID NO.56)

HEK3-TCGA-C1:

GTGGAAAGGACGAAATCAACAGACTGAGCACGTGATGGCTAGAAAGCTTGGCGTA (SEQ ID NO.57)

HEK3-TCGA-2:

GTGGAAAGGACGAAAGGCCTCGACTGAGCACGTGATGGCTAGAAAGCTTGGCGTA (SEQ ID NO.58)

HEK3-TCGA-C2:

GTGGAAAGGACGAAAGGCCTCAACTGAGCACGTGATGGCTAGAAAGCTTGGCGTA (SEQ ID NO.59)

GFP153-TCGA-1:

GTGGAAAGGACGAAATCGAACGGGCAGCTTGCCGGTGGCTAGAAAGCTTGGCGTA (SEQ ID NO.60)

GFP153-TCGA-C1:

GTGGAAAGGACGAAATCAAACGGGCAGCTTGCCGGTGGCTAGAAAGCTTGGCGTA (SEQ ID NO.61)

GFP153-TCGA-2:

GTGGAAAGGACGAAAGGGCTCGAGCAGCTTGCCGGTGGCTAGAAAGCTTGGCGTA (SEQ ID NO.62)

GFP153-TCGA-C2:

GTGGAAAGGACGAAAGGGCTCAAGCAGCTTGCCGGTGGCTAGAAAGCTTGGCGTA (SEQ ID NO.63)

SITE13-TCGA-2:

GTGGAAAGGACGAAATATATCGAATAGAGAATAGACTGCTGGCTAGAAAGCTTGGCGTA (SEQ ID NO.64)

SITE13-TCGA-C2:

GTGGAAAGGACGAAATATATCAAATAGAGAATAGACTGCTGGCTAGAAAGCTTGGCGTA (SEQ ID NO.65)
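Each TCGA substrate and its control appear to differ by a single base (the G of TCGA replaced by A, giving TCAA). A minimal sketch for confirming this on the HEK3 pairs is shown below; the helper name is illustrative, and the sequences are copied from SEQ ID NO.56 to 59.

```python
def differences(seq_a, seq_b):
    """List (0-based position, base in seq_a, base in seq_b) for every mismatch
    between two sequences of equal length."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return [(i, a, b) for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]

# Substrate / control pairs copied from the sequence list above.
hek3_tcga_1 = "GTGGAAAGGACGAAATCGACAGACTGAGCACGTGATGGCTAGAAAGCTTGGCGTA"
hek3_tcga_c1 = "GTGGAAAGGACGAAATCAACAGACTGAGCACGTGATGGCTAGAAAGCTTGGCGTA"
hek3_tcga_2 = "GTGGAAAGGACGAAAGGCCTCGACTGAGCACGTGATGGCTAGAAAGCTTGGCGTA"
hek3_tcga_c2 = "GTGGAAAGGACGAAAGGCCTCAACTGAGCACGTGATGGCTAGAAAGCTTGGCGTA"

pair_1 = differences(hek3_tcga_1, hek3_tcga_c1)
pair_2 = differences(hek3_tcga_2, hek3_tcga_c2)
print(pair_1)  # [(17, 'G', 'A')]
print(pair_2)  # [(21, 'G', 'A')]
```

The single mismatch sits inside the motif in both pairs, so any difference in deletion frequency between a substrate and its control can be attributed to the TCGA to TCAA change rather than to the surrounding sequence.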

HEK3-TCGA-1-sgRNA-F: caccgTCGACAGACTGAGCACGTGA (SEQ ID NO.66)

HEK3-TCGA-1-sgRNA-R: aaacTCACGTGCTCAGTCTGTCGA (SEQ ID NO.67)

HEK3-TCGA-2-sgRNA-F: caccgGGCCTCGACTGAGCACGTGA (SEQ ID NO.68)

HEK3-TCGA-2-sgRNA-R: aaacTCACGTGCTCAGTCGAGGCC (SEQ ID NO.69)

GFP153-TCGA-1-sgRNA-F: caccgTCGAACGGGCAGCTTGCCGG (SEQ ID NO.70)

GFP153-TCGA-1-sgRNA-R: aaacCCGGCAAGCTGCCCGTTCGA (SEQ ID NO.71)

GFP153-TCGA-2-sgRNA-F: caccgGGGCTCGAGCAGCTTGCCGG (SEQ ID NO.72)

GFP153-TCGA-2-sgRNA-R: aaacCCGGCAAGCTGCTCGAGCCC (SEQ ID NO.73)

SITE13-TCGA-2-sgRNA-F: caccgTCGAATAGAGAATAGACTGC (SEQ ID NO.74)

SITE13-TCGA-2-sgRNA-R: aaacGCAGTCTATTCTCTATTCGA (SEQ ID NO.75)

HEK3-TCGA-C1-sgRNA-F: caccgTCAACAGACTGAGCACGTGA (SEQ ID NO.76)

HEK3-TCGA-C1-sgRNA-R: aaacTCACGTGCTCAGTCTGTTGA (SEQ ID NO.77)

HEK3-TCGA-C2-sgRNA-F: caccgGGCCTCAACTGAGCACGTGA (SEQ ID NO.78)

HEK3-TCGA-C2-sgRNA-R: aaacTCACGTGCTCAGTTGAGGCC (SEQ ID NO.79)

GFP153-TCGA-C1-sgRNA-F: caccgTCAAACGGGCAGCTTGCCGG (SEQ ID NO.80)

GFP153-TCGA-C1-sgRNA-R: aaacCCGGCAAGCTGCCCGTTTGA (SEQ ID NO.81)

GFP153-TCGA-C2-sgRNA-F: caccgGGGCTCAAGCAGCTTGCCGG (SEQ ID NO.82)

GFP153-TCGA-C2-sgRNA-R: aaacCCGGCAAGCTGCTTGAGCCC (SEQ ID NO.83)

SITE13-TCGA-C2-sgRNA-F: caccgAATAATAGAGAATAGACTGC (SEQ ID NO.84)

SITE13-TCGA-C2-sgRNA-R: aaacGCAGTCTATTCTCTATTATT (SEQ ID NO.85)
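The oligo pairs listed above follow a visible pattern: the forward oligo is "caccg" followed by the 20-nt spacer, and the reverse oligo is "aaac" followed by the reverse complement of the spacer. The sketch below reproduces that pattern; the function names are illustrative, and the rule is inferred from the listed sequences rather than stated in the text.

```python
# Complement table for uppercase DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (uppercased)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]

def design_sgrna_oligos(spacer):
    """Build forward and reverse cloning oligos for a spacer, following the
    'caccg' + spacer / 'aaac' + reverse-complement pattern of the list above."""
    spacer = spacer.upper()
    forward = "caccg" + spacer
    reverse = "aaac" + reverse_complement(spacer)
    return forward, reverse

# Spacer of HEK3-TCGA-1 (from SEQ ID NO.66).
forward, reverse = design_sgrna_oligos("TCGACAGACTGAGCACGTGA")
print(forward)  # caccgTCGACAGACTGAGCACGTGA
print(reverse)  # aaacTCACGTGCTCAGTCTGTCGA
```

The lowercase prefixes mark the bases that are not part of the spacer; they would typically provide the overhangs matching the BbsI-digested vector used in step 4.1.3.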

4.1.2. Obtaining the target fragments

The target fragments containing the sgRNAs were obtained as follows:

1) The oligodeoxynucleotides were dissolved in ddH2O to 100 μM and then diluted to 10 μM.

2) The forward and reverse oligodeoxynucleotides were added to the reaction system shown in Table 1.

Table 1

Component                                                                          Amount
Forward/reverse oligodeoxynucleotides of the sgRNAs from 4.1.1 of this example     10 μL each
Annealing buffer (5×)                                                              5 μL
ddH2O                                                                              to 25 μL

Reaction program: 95°C for 5 min, followed by gradual cooling to 10°C.
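The dilution in step 1) and the oligo concentration in the annealing reaction follow from C1·V1 = C2·V2. The sketch below is illustrative only: the function names and the 100 μL dilution volume are hypothetical, and it assumes that the 10 μM working stock from step 1) is what goes into the Table 1 reaction.

```python
def dilution_volumes(stock_um, target_um, final_ul):
    """Return (stock volume, diluent volume) in μL for diluting a stock to a
    target concentration in a given final volume (C1*V1 = C2*V2)."""
    stock_ul = target_um * final_ul / stock_um
    return stock_ul, final_ul - stock_ul

def final_concentration(stock_um, added_ul, final_ul):
    """Concentration (μM) of a component after adding added_ul of stock
    to a reaction with total volume final_ul."""
    return stock_um * added_ul / final_ul

# Hypothetical example: prepare 100 μL of 10 μM oligo from the 100 μM stock.
stock_ul, water_ul = dilution_volumes(100, 10, 100)
print(stock_ul, water_ul)  # 10.0 90.0

# Each oligo in the 25 μL annealing reaction (10 μL of the assumed 10 μM stock).
oligo_um = final_concentration(10, 10, 25)
print(oligo_um)  # 4.0

# Water needed to reach 25 μL after two oligos (10 μL each) and 5 μL buffer.
annealing_water_ul = 25 - (10 + 10 + 5)
print(annealing_water_ul)  # 0
```

Under these assumptions the listed components already add up to 25 μL, so the "ddH2O to 25 μL" entry amounts to no added water.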

4.1.3. Digestion of the vector

Plasmid PX33R was digested with BbsI. The digestion system and method were the same as in step 2.1.3 of Example 2.

4.1.4. Ligation of the digested vector and the target fragments

T4 ligase was used to ligate the annealed sgRNA fragments obtained in step 4.1.2 of this example to the digested plasmid PX33R obtained in step 4.1.3 of this example, giving the ligation product px330R-sgRNA. The method was the same as in step 2.1.4 of Example 2.

4.1.5. Transformation of the ligation products

The method was the same as in step 2.1.5 of Example 2.

4.1.6. Picking single clones for sequencing

The method was the same as in step 2.1.6 of Example 2.

4.1.7. Midi-preparation of the sequence-verified plasmids

The method was the same as in step 2.1.7 of Example 2.

4.2. Construction of cell lines containing the targeting sites

4.2.1 Construction of plasmids carrying the targeting sites

The substrate sequences with or without the palindromic motif were amplified by PCR and then ligated into the plenti-guide plasmid vector double-digested with NdeI and SmaI, using the same ligation method as in step 2.1.4 of Example 2. After transformation of the ligation products, picking of single clones and confirmation by Sanger sequencing (methods as above), lentiviral vector plasmids carrying the targeting site sequences were obtained.

4.2.2 Virus packaging with FuGENE

1) The packaging plasmid pMD2.G (250 ng), the packaging plasmid psPAX2 (750 ng) and lentiGuide-sgRNA-Puro (1 μg) were used for virus packaging with 6 μL FuGENE (Roche #11814443001).

2) After 12-16 hours the medium was changed and 5.5 mL of fresh medium was added.

3) The virus was collected 24 hours and 48 hours after the medium change (filtered through a 0.45 μm membrane) and stored temporarily at 4°C.
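The packaging step implies a fixed reagent-to-DNA ratio, which is useful when scaling to more dishes. The following is a minimal sketch of that arithmetic; the function name and the scaling scenario are illustrative and not part of the protocol, while the amounts are copied from step 1).

```python
def packaging_mix(n_dishes=1):
    """Scale the per-dish packaging mix of step 1) to n_dishes.
    Plasmid amounts are in μg, FuGENE in μL."""
    per_dish = {
        "pMD2.G_ug": 0.25,               # 250 ng
        "psPAX2_ug": 0.75,               # 750 ng
        "lentiGuide_sgRNA_Puro_ug": 1.0, # 1 μg
        "FuGENE_ul": 6.0,
    }
    return {name: amount * n_dishes for name, amount in per_dish.items()}

mix = packaging_mix(1)
total_dna_ug = mix["pMD2.G_ug"] + mix["psPAX2_ug"] + mix["lentiGuide_sgRNA_Puro_ug"]
reagent_to_dna = mix["FuGENE_ul"] / total_dna_ug
print(total_dna_ug, reagent_to_dna)  # 2.0 3.0
```

With 2 μg of total plasmid DNA and 6 μL of reagent, the mix corresponds to 3 μL of FuGENE per μg of DNA.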

4.2.3. Obtaining stably transduced cell lines

1) 16-24 hours before infection, HEK293T cells were seeded into 6 cm culture dishes at a density of 1×10^6 cells/well, so that the confluence on the following day was about 80%.

2) The cells were infected with 2 mL of virus, 1 mL of fresh medium and 3 μL of Polybrene.

3) The medium was changed 24 hours after infection and 4 mL of fresh medium was added.

Puromycin selection was started 24 hours after the medium change to obtain cell lines containing the specific targeting sites.

4.2.4. Cell transfection with Lipofectamine 2000

Plasmid CGBE1/AID-BE1n and the ligation product px330R-sgRNA obtained in step 4.1.4 of this example were transfected using Lipofectamine 2000.

The specific steps were the same as in step 2.3 of Example 2.

4.2.5. Preparation of high-throughput sequencing libraries

Same as step 2.4 of Example 2.

4.3. Results

High-throughput sequencing was used to measure the proportion of base deletion mutations produced by CGBE1 in the cell genome and to analyze the preference of the different sequences for producing base deletion mutations. Pathogenic mutations in the ClinVar database that could be corrected by a CGBE-mediated C>G change were then counted (Arbab et al., 2020).

Figure 11A shows the profile of deletions longer than 1 bp produced in this example when CGBE1 acts on editing sequences containing the palindromic motif (TCGA) or the non-palindromic motif (TCAA).

As shown in Figure 11A, targeting site sequences containing the palindromic motif TCGA within the editing window produced base deletion mutations at a higher frequency during base editing, which lowered the frequency of the intended base substitution products. Sequences without the palindromic motif within the editing window showed a lower frequency of the deletion by-products and a higher base editing efficiency.

Figure 11B is a statistical plot of the difference in deletion efficiency produced in this example by plasmid CGBE1 acting on the 5 pairs of palindromic/non-palindromic motif sequences.

As shown in Figure 11B, among the editing products of CGBE1, targeting sequences containing the TCGA palindromic motif had a higher frequency of base deletion mutations.

Figure 11C is a pie chart showing, for the 3040 potential CGBE correction sites in the ClinVar database, the proportion whose editing window contains a palindromic motif.

As shown in Figure 11C, nearly 10% of the pathogenic mutation sites contain the palindromic motif CG.

Based on the analysis of the synthetic palindromic targeting sequences and the earlier analysis of chromosomal translocations at deaminase-OT sites, it is recommended that sgRNAs containing deaminase-preferred palindromic motifs be avoided as far as possible in clinical applications of CGBE.
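The recommendation above can be turned into a simple pre-screen of candidate spacers. The sketch below is a simplification and not the authors' method: the function names are hypothetical, and it flags any spacer containing TCGA anywhere, whereas the text refers to motifs within the editing window, which is not modeled here. Because TCGA is its own reverse complement, scanning one strand is sufficient.

```python
def contains_motif(spacer, motif="TCGA"):
    """Return True if the spacer contains the given motif (case-insensitive)."""
    return motif.upper() in spacer.upper()

def screen_spacers(spacers, motif="TCGA"):
    """Map each spacer name to True (motif present, better avoided) or False."""
    return {name: contains_motif(seq, motif) for name, seq in spacers.items()}

# Spacers taken from the forward oligos of Example 4 (without the 'caccg' prefix).
candidates = {
    "HEK3-TCGA-1": "TCGACAGACTGAGCACGTGA",
    "HEK3-TCGA-C1": "TCAACAGACTGAGCACGTGA",
    "GFP153-TCGA-2": "GGGCTCGAGCAGCTTGCCGG",
    "GFP153-TCGA-C2": "GGGCTCAAGCAGCTTGCCGG",
}

flags = screen_spacers(candidates)
for name, flagged in flags.items():
    print(name, "contains TCGA" if flagged else "no TCGA")
```

A stricter version could restrict the search to the positions of the editing window, once that window is defined for the editor in use.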

The above embodiments merely illustrate the principles and effects of the present invention and are not intended to limit it. Anyone skilled in the art may modify or change the above embodiments without departing from the spirit and scope of the invention. Accordingly, all equivalent modifications or changes made by persons of ordinary skill in the art without departing from the spirit and technical ideas disclosed herein shall still be covered by the claims of the present invention.

Claims (25)

1. A single base editing system, characterized in that the single base editing system comprises:
1) a nucleic acid nickase or a polynucleotide encoding it;
2) a cytosine deaminase or a polynucleotide encoding it;
3) an abasic site protection polypeptide or a polynucleotide encoding it.

2. The single base editing system of claim 1, characterized in that the abasic site protection polypeptide is an APEX1 competitive inhibitor and/or an APEX1 mutant; preferably, the APEX1 competitive inhibitor is selected from HMCES.

3. The single base editing system of claim 2, characterized in that the amino acid sequence of the HMCES comprises:
1) the amino acid sequence shown in SEQ ID NO.1; or
2) an amino acid sequence having at least 80% sequence similarity to SEQ ID NO.1 and having the function of the amino acid sequence defined in 1).

4. The single base editing system of claim 1, characterized in that the nucleic acid nickase is a Cas9 nickase; and/or the cytosine deaminase is selected from one or more of APOBEC1, APOBEC2, APOBEC3A, APOBEC3B, APOBEC3C, APOBEC3D/E, APOBEC3F, APOBEC3G, APOBEC3H, APOBEC4 and mutants thereof, or AID and mutants thereof.

5. The single base editing system of claim 1, characterized in that, when the nucleic acid nickase or the cytosine deaminase is expressed in the form of a fusion protein, the fusion protein comprises a cytosine deaminase fragment and a nucleic acid nickase fragment; the abasic site protection polypeptide is expressed as part of the fusion or independently.

6. The single base editing system of claim 5, characterized in that the fusion protein further comprises a uracil glycosylase fragment and/or a nuclear localization fragment and/or an abasic site protection polypeptide fragment.

7. The single base editing system of claim 6, characterized in that the fusion protein comprises, in order from the N-terminus to the C-terminus, a cytosine deaminase fragment and a nucleic acid nickase fragment.

8. The single base editing system of claim 7, characterized in that, in the fusion protein, a uracil glycosylase fragment and/or an abasic site protection polypeptide fragment is linked to the N-terminus of the cytosine deaminase fragment.

9. The single base editing system of claim 7, characterized in that the amino acid sequence of the fusion protein comprises the sequence shown in SEQ ID NO.19, SEQ ID NO.20 or SEQ ID NO.21.

10. The single base editing system of any one of claims 1-9, characterized in that the single base editing system further comprises an sgRNA or a polynucleotide encoding it.

11. The single base editing system of claim 10, characterized in that the sgRNA does not contain a deaminase-preferred palindromic motif; preferably, the deaminase-preferred palindromic motif is TCGA.

12. A fusion protein for single base editing, characterized by comprising an abasic site protection polypeptide fragment, a cytosine deaminase fragment and a nucleic acid nickase fragment.

13. The fusion protein of claim 12, characterized in that the abasic site protection polypeptide fragment is an APEX1 competitive inhibitor or an APEX1 mutant; preferably, the APEX1 competitive inhibitor is selected from HMCES.

14. The fusion protein of claim 13, characterized in that the amino acid sequence of the HMCES comprises:
1) the amino acid sequence shown in SEQ ID NO.1; or
2) an amino acid sequence having at least 80% sequence similarity to SEQ ID NO.1 and having the function of the amino acid sequence defined in 1).

15. The fusion protein of claim 12, characterized in that the fusion protein comprises, in order from the N-terminus to the C-terminus, an abasic site protection polypeptide fragment, a cytosine deaminase fragment and a nucleic acid nickase fragment.

16. The fusion protein of claim 12, characterized in that the fusion protein further comprises a uracil glycosylase fragment and/or a nuclear localization fragment; preferably, a uracil glycosylase fragment and/or a nuclear localization fragment is linked to the N-terminus of the cytosine deaminase fragment.

17. The fusion protein of claim 12, characterized in that the amino acid sequence of the fusion protein comprises the sequence shown in SEQ ID NO.19, SEQ ID NO.20 or SEQ ID NO.21.

18. A polynucleotide, characterized in that it encodes the fusion protein of any one of claims 12-17.

19. A nucleic acid construct, characterized in that the nucleic acid construct contains the polynucleotide of claim 18.

20. An expression system, characterized in that the expression system contains the nucleic acid construct of claim 19 or has the polynucleotide of claim 18 integrated into its genome.

21. Use of the single base editing system of any one of claims 1-11, the fusion protein of any one of claims 12-17, the polynucleotide of claim 18, the nucleic acid construct of claim 19 or the expression system of claim 20 in base editing.

22. A gene editing method, comprising: performing gene editing with the base editing system of any one of claims 1-11 or the fusion protein of any one of claims 12-17.

23. Use of HMCES as an abasic site protection polypeptide for gene editing or in a base editing system.

24. The use of claim 23, characterized in that the HMCES is used to reduce DNA double-strand break damage in single base editing.

25. The use of claim 23, characterized in that the single base editing is cytosine-to-guanine base editing.
CN202210968850.5A 2022-08-12 2022-08-12 A single base editing system and its use Pending CN117586987A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210968850.5A CN117586987A (en) 2022-08-12 2022-08-12 A single base editing system and its use

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210968850.5A CN117586987A (en) 2022-08-12 2022-08-12 A single base editing system and its use

Publications (1)

Publication Number Publication Date
CN117586987A true CN117586987A (en) 2024-02-23

Family

ID=89912118

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210968850.5A Pending CN117586987A (en) 2022-08-12 2022-08-12 A single base editing system and its use

Country Status (1)

Country Link
CN (1) CN117586987A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115109798A (en) * 2021-03-09 2022-09-27 上海蓝十字医学科学研究所 Improved CG base editing system
CN115109798B (en) * 2021-03-09 2025-07-29 苏州齐禾生科生物科技有限公司 Improved CG base editing system

Similar Documents

Publication Publication Date Title
US11713471B2 (en) Class II, type V CRISPR systems
CN110540991B (en) Enhancement of specificity of RNA-guided genome editing using truncated guide RNA (tru-gRNA)
US11946039B2 (en) Class II, type II CRISPR systems
Byrne et al. Genome editing in human stem cells
US10669539B2 (en) Methods and compositions for generating CRISPR guide RNA libraries
Xie et al. High-fidelity SaCas9 identified by directional screening in human cells
US10557151B2 (en) Somatic human cell line mutations
US10612043B2 (en) Methods of in vivo engineering of large sequences using multiple CRISPR/cas selections of recombineering events
EP3536796A1 (en) Gene knockout method
US12286654B2 (en) Base editing enzymes
CN110520528A (en) Hi-fi CAS9 variant and its application
CN105483118A (en) Gene editing technique taking Argonaute nuclease as core
US20230416710A1 (en) Engineered and chimeric nucleases
CN112746071B (en) A method and product for repairing HBB gene of hematopoietic stem cells
EP3940078A1 (en) Off-target single nucleotide variants caused by single-base editing and high-specificity off-target-free single-base gene editing tool
US20210355475A1 (en) Optimized base editors enable efficient editing in cells, organoids and mice
Huang et al. Engineered Cas12a-Plus nuclease enables gene editing with enhanced activity and specificity
US20210130838A1 (en) SYSTEMS AND METHODS FOR PLANT GENOME EDITING USING CAS 12a ORTHOLOGS
CN116286905B (en) Bovine-derived CRISPR/boCas9 gene editing system, method and application
US20230348877A1 (en) Base editing enzymes
JP7698578B2 (en) A DNA cutting tool based on the Cas9 protein from the biotechnologically important bacterium Clostridium cellulolyticum
US20240002834A1 (en) Adenine base editor lacking cytosine editing activity and use thereof
US20240110163A1 (en) Crispr-associated based-editing of the complementary strand
CN117377762A (en) Novel CRISPR gRNA
CN117586987A (en) A single base editing system and its use

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination