CN118531030A - Expression cassette, recombinant vector, recombinant protein and application thereof

Info

Publication number: CN118531030A
Application number: CN202410653658.6A
Authority: CN
Inventors: 元英进; 王永安
Original assignee: Tianjin University Advanced Research Institute Of Synthetic Biology
Current assignee: Tianjin University Advanced Research Institute Of Synthetic Biology
Priority date: 2024-05-24
Filing date: 2024-05-24
Publication date: 2024-08-23

Abstract

The present invention relates to the field of bioengineering, and in particular to an expression cassette, a recombinant vector, a recombinant protein and applications thereof. The invention provides an expression cassette including, but not limited to, a leader peptide, enzyme cleavage sites, a linker and a GLP-1 receptor agonist backbone, the backbone having the amino acid sequence shown in SEQ ID NO: 1. Building on previous studies, the invention designs multiple tandem repeat sequences and demonstrates their feasibility experimentally. The tandem repeats increase the amount of semaglutide backbone that is biosynthesized, and a satisfactory amount of protein is obtained after the subsequent multi-step purification, providing more raw material for later experiments.

Description

Expression cassette, recombinant vector, recombinant protein and application thereof

Technical Field

The present invention relates to the field of bioengineering, and in particular to an expression cassette, a recombinant vector, a recombinant protein and applications thereof.

Background Art

Diabetes mellitus (DM) is a common metabolic disease whose main feature is organ dysfunction caused by the direct or indirect effects of chronic hyperglycemia. It is divided into two main types: type 1 diabetes and type 2 diabetes. According to the International Diabetes Federation, the global prevalence of diabetes in 2019 was estimated at 9.3%, or approximately 463 million people. The prevalence is projected to rise to 10.2% (approximately 578 million people) by 2030 and to 10.9% (approximately 700 million people) by 2045. Diabetes is one of the leading causes of death and disability worldwide. It threatens patients' quality of life and life expectancy, and it places a heavy economic burden on individuals and on healthcare systems around the world.

Obesity is a complex chronic metabolic disease whose main feature is the accumulation of body fat beyond normal levels, with adverse effects on health. In recent years, with changes in lifestyle and the influence of environmental factors, the incidence of obesity has continued to rise and has become a global health challenge. The global prevalence of obesity is estimated at about 14%, and exceeds 40% in some regions, which corresponds to about 1 billion people. This number is expected to increase to 1.5 billion by 2030.

In recent years, significant progress has been made in peptide drug research and development, and the design and development of peptide drugs for specific disease targets have become more precise. This progress is attributed to improved targeting, the introduction of new formulation technologies, continuous improvement of synthesis and modification technologies, and the emergence of novel treatment strategies. These improvements can increase the stability, solubility and biodistribution of peptide drugs, optimize their pharmacokinetic properties and prolong their plasma half-life, thereby improving efficacy and reducing adverse reactions. In this context, semaglutide, an emerging peptide drug, has shown good efficacy in the treatment of diabetes and adult obesity.

Semaglutide is a synthetic GLP-1 analog developed by the Danish pharmaceutical company Novo Nordisk. It is a GLP-1 receptor agonist used to treat type 2 diabetes. GLP-1 is an insulin-releasing hormone produced by intestinal cells that promotes insulin secretion, inhibits glucagon secretion and reduces appetite. In 2017, the FDA approved injectable semaglutide for the treatment of type 2 diabetes (brand name Ozempic), and in 2021 it approved injectable semaglutide under the brand name Wegovy for chronic weight management in adults with obesity. In January 2024, the China National Medical Products Administration approved the first oral semaglutide for the treatment of type 2 diabetes, which is also the first oral GLP-1 receptor agonist approved for marketing in China.

As shown in Figure 1, semaglutide consists of 31 amino acids and carries one long side chain modification. The half-life of the GLP-1 backbone is extremely short, only about 1-2 minutes, so the polypeptide must be modified to prevent its degradation, extend the half-life of the drug and reduce its toxic side effects. To this end, the amino acid at position 8 (alanine) is replaced with a non-natural amino acid (α-aminoisobutyric acid), which prevents the rapid degradation of GLP-1 by the enzyme DPP-4 and thereby extends the half-life. In addition, the lysine at position 26 carries a long side chain modification (a PEG spacer and a C18 fatty diacid chain linked through glutamic acid), which increases the hydrophilicity and stability of the polypeptide, protects it from degradation and effectively extends the half-life of semaglutide.

Biosynthesis is carried out mainly in Escherichia coli (E. coli) and Pichia pastoris: the polypeptide sequence is rationally designed and then cleaved with specific tool enzymes to obtain the semaglutide backbone.

Summary of the Invention

In view of this, the present invention provides an expression cassette, a recombinant vector, a recombinant protein and applications thereof. Building on previous research, the invention designs multiple tandem repeat sequences and demonstrates their feasibility experimentally. The tandem repeats increase the amount of semaglutide backbone that is biosynthesized, and a satisfactory amount of protein is obtained after subsequent purification. The invention thus provides a feasible scheme for the multi-tandem expression of a protein of a single structure and solves the problem that multiple tandem repeats of such a protein cannot be expressed and assembled normally. By introducing specific protease cleavage sites, the complete target protein sequence can be obtained by enzymatic cleavage at a later stage.

In order to achieve the above object, the present invention provides the following technical solutions:

The present invention provides an expression cassette, including but not limited to: a leader peptide, an enzyme cleavage site, a linker and a GLP-1 receptor agonist backbone;

The GLP-1 receptor agonist backbone has:

(1) the amino acid sequence shown in SEQ ID NO: 1; or

(2) a sequence obtained by substitution, deletion and/or addition of one or more amino acids in the amino acid sequence of (1); or

(3) a sequence having 90% or more homology with the amino acid sequence of (1).
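To give a feel for what the 90% threshold in item (3) allows, the following is a minimal sketch in Python. It assumes a simple position-by-position identity over two sequences of equal length; the specification does not state how homology is to be computed, so the function name `percent_identity` and the ungapped comparison are illustrative assumptions, not the method of the invention. The reference string is SEQ ID NO: 1 as given in this specification.

```python
# Reference backbone, SEQ ID NO: 1 of this specification (29 residues).
SEQ_ID_1 = "EGTFTSDVSSYLEGQAAKEFIAWLVRGRG"


def percent_identity(a: str, b: str) -> float:
    """Ungapped, position-wise identity in percent (assumed metric)."""
    if len(a) != len(b):
        raise ValueError("this sketch only compares sequences of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


if __name__ == "__main__":
    # Hypothetical variants with 1, 2 and 3 substitutions at the N-terminus.
    for n_subs in (1, 2, 3):
        variant = "D" * n_subs + SEQ_ID_1[n_subs:]
        print(n_subs, round(percent_identity(SEQ_ID_1, variant), 2))
```

Under this assumed metric, a 29-residue variant stays at or above 90% with up to two substitutions (27/29) and falls below it with three (26/29).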

In some embodiments of the present invention, in the above expression cassette, the sequence of SEQ ID NO: 1 is: EGTFTSDVSSYLEGQAAKEFIAWLVRGRG.

In some embodiments of the present invention, in the above expression cassette, the nucleic acid molecule encoding the GLP-1 receptor agonist backbone has a sequence as shown in any one of SEQ ID NO: 4 to SEQ ID NO: 9;

SEQ ID NO:4的序列为：GAAGGCACGTTTACCTCTGATGTGAGCTCTTATTTAGAAGGCCAGGCGGCTAAAGAATTTATTGCGTGGCTTGTGCGCGGCCGCGGC；The sequence of SEQ ID NO:4 is: GAAGGCACGTTTACCTCTGATGTGAGCTCTTATTTAGAAGGCCAGGCGGCTAAAGAATTTATTGCGTGGCTTGTGCGCGGCCGCGGC;

SEQ ID NO:5的序列为：GAAGGCACCTTTACCAGCGATGTGAGCAGCTATCTGGAAGGCCAGGCGGCGAAAGAGTTTATTGCGTGGTTAGTGCGCGGTCGCGGT；The sequence of SEQ ID NO:5 is: GAAGGCACCTTTACCAGCGATGTGAGCAGCTATCTGGAAGGCCAGGCGGCGAAAGAGTTTATTGCGTGGTTAGTGCGCGGTCGCGGT;

SEQ ID NO:6的序列为：GAAGGCACCTTTACGAGCGATGTGAGCAGCTATTTAGAAGGTCAGGCGGCGAAAGAATTCATTGCGTGGTTAGTGCGTGGTCGTGGT；The sequence of SEQ ID NO: 6 is: GAAGGCACCTTTACGAGCGATGTGAGCAGCTATTTAGAAGGTCAGGCGGCGAAAGAATTCATTGCGTGGTTAGTGCGTGGTCGTGGT;

SEQ ID NO:7的序列为：GAGGGCACCTTTACCTCCGATGTGAGCAGCTATTTGGAAGGCCAGGCGGCGAAGGAATTTATTGCGTGGCTGGTTCGTGGTCGCGGT；The sequence of SEQ ID NO:7 is: GAGGGCACCTTTACCTCCGATGTGAGCAGCTATTTGGAAGGCCAGGCGGCGAAGGAATTTATTGCGTGGCTGGTTCGTGGTCGCGGT;

SEQ ID NO:8的序列为：GAAGGCACCTTTACCTCTGATGTGAGCAGCTATCTCGAAGGCCAGGCGGCCAAAGAATTTATTGCATGGCTGGTTCGTGGCCGAGGT；The sequence of SEQ ID NO: 8 is: GAAGGCACCTTTACCTCTGATGTGAGCAGCTATCTCGAAGGCCAGGCGGCCAAAGAATTTATTGCATGGCTGGTTCGTGGCCGAGGT;

SEQ ID NO:9的序列为：GAAGGCACGTTTACCAGCGATGTGAGCTCTTACCTGGAAGGCCAGGCGGCAAAAGAATTTATCGCGTGGTTGGTGCGTGGTCGTGGC。The sequence of SEQ ID NO: 9 is: GAAGGCACGTTTACCAGCGATGTGAGCTCTTACCTGGAAGGCCAGGCGGCAAAAGAATTTATCGCGTGGTTGGTGCGTGGTCGTGGC.
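The relationship between the coding sequences above and SEQ ID NO: 1 can be checked by translating them with the standard genetic code. The sketch below is illustrative only: the helper `translate` and the dictionary `CODING_VARIANTS` are names introduced here, and only SEQ ID NO: 4 and SEQ ID NO: 5 are included to keep it short.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

BACKBONE = "EGTFTSDVSSYLEGQAAKEFIAWLVRGRG"  # SEQ ID NO: 1

# Two of the six coding variants listed in this specification.
CODING_VARIANTS = {
    "SEQ ID NO: 4": "GAAGGCACGTTTACCTCTGATGTGAGCTCTTATTTAGAAGGCCAGGCGGCTAAAGAATTTATTGCGTGGCTTGTGCGCGGCCGCGGC",
    "SEQ ID NO: 5": "GAAGGCACCTTTACCAGCGATGTGAGCAGCTATCTGGAAGGCCAGGCGGCGAAAGAGTTTATTGCGTGGTTAGTGCGCGGTCGCGGT",
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string codon by codon ('*' marks a stop)."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


if __name__ == "__main__":
    for name, dna in CODING_VARIANTS.items():
        print(name, translate(dna) == BACKBONE)
```

The same helper can be applied to the other coding sequences in this specification, for example the linker or the EK site, to confirm which peptide each one encodes.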

In some embodiments of the present invention, in the above expression cassette, the number of the GLP-1 receptor agonists is no fewer than 6.

In some embodiments of the present invention, in the above expression cassette, the number of the GLP-1 receptor agonists is 6, 7, 8 or 9.

In some embodiments of the present invention, in the above expression cassette, the enzyme cleavage site comprises one or more of an EK (enterokinase) cleavage site, a KEX2 cleavage site and a CPB (carboxypeptidase B) cleavage site.

In some embodiments of the present invention, in the above expression cassette, the EK cleavage site is located at the N-terminus of the fourth expression cassette.

In some embodiments of the present invention, in the above expression cassette, the KEX2 cleavage site and the CPB cleavage site are located at the N-terminus of the GLP-1 receptor agonist backbone.

In some embodiments of the present invention, in the above expression cassette, the amino acid sequence of the EK cleavage site is: DDDDK.

In some embodiments of the present invention, in the above expression cassette, the sequence of the nucleic acid molecule encoding the EK cleavage site is as shown in SEQ ID NO: 10 or SEQ ID NO: 11: GATGATGATGACAAG (SEQ ID NO: 10) or GACGACGACGACAAG (SEQ ID NO: 11).

In some embodiments of the present invention, in the above expression cassette, the amino acid sequence of the KEX2 cleavage site is: RR.

In some embodiments of the present invention, in the above expression cassette, the sequence of the nucleic acid molecule encoding the KEX2 cleavage site is as shown in any one of SEQ ID NO: 12 to SEQ ID NO: 15: CGCCGT (SEQ ID NO: 12), AGACGT (SEQ ID NO: 13), GCCGCT (SEQ ID NO: 14) or CGTCGC (SEQ ID NO: 15).

In some embodiments of the present invention, in the above expression cassette, the amino acid sequence of the CPB cleavage site is: R.

In some embodiments of the present invention, in the above expression cassette, the sequence of the nucleic acid molecule encoding the CPB cleavage site is as shown in any one of SEQ ID NO: 16 to SEQ ID NO: 18: CGC (SEQ ID NO: 16), CGT (SEQ ID NO: 17) or AGA (SEQ ID NO: 18).

In some embodiments of the present invention, in the above expression cassette, the leader peptide has:

(4) the amino acid sequence shown in SEQ ID NO: 2; or

(5) a sequence obtained by substitution, deletion and/or addition of one or more amino acids in the amino acid sequence of (4); or

(6) a sequence having 90% or more homology with the amino acid sequence of (4).

In some embodiments of the present invention, in the above expression cassette, the sequence of SEQ ID NO: 2 is: LVPRGSGMKETAAAKFERQHMDSPDLGTDDDDKAMADIGSMRLNSA.

在本发明的一些实施方案中，上述表达盒中，编码所述前导肽的核酸分子具有如SEQ ID NO:22所示的序列：CTGGTGCCACGCGGTTCTGGTATGAAAGAAACCGCTGCTGCTAAATTCGAACGCCAGCACATGGACAGCCCAGATCTGGGTACCGACGACGACGACAAGGCCATGGCTGATATCGGATCCATGCGCCTGAACAGCGCG。In some embodiments of the present invention, in the above-mentioned expression cassette, the nucleic acid molecule encoding the leader peptide has a sequence as shown in SEQ ID NO:22: CTGGTGCCACGCGGTTCTGGTATGAAAGAAACCGCTGCTGCTAAATTCGAACGCCAGCACATGGACAGCCCAGATCTGGGTACCGACGACGACGACAAGGCCATGGCTGATATCGGATCCATGCGCCTGAACAGCGCG.

In some embodiments of the present invention, in the above expression cassette, the linker has:

(7) the amino acid sequence shown in SEQ ID NO: 3; or

(8) a sequence obtained by substitution, deletion and/or addition of one or more amino acids in the amino acid sequence of (7); or

(9) a sequence having 90% or more homology with the amino acid sequence of (7).

In some embodiments of the present invention, in the above expression cassette, the sequence of SEQ ID NO: 3 is: GSGSEEGSGS.

In some embodiments of the present invention, in the above expression cassette, the nucleotide sequence of the nucleic acid molecule encoding the linker is as shown in SEQ ID NO: 19: GGTAGCGGTTCTGAGGAAGGTTCTGGAAGC.

In some embodiments of the present invention, the above expression cassette further comprises a Trx-6x His tag.

在本发明的一些实施方案中，上述表达盒中，所述Trx-6x His标签的氨基酸序列如SEQ ID NO:20所示：MSDKIIHLTDDSFDTDVLKADGAILVDFWAEWCGPCKMIAPILDEIADEYQGKLTVAKLNIDQNPGTAPKYGIRGIPTLLLFKNGEVAATKVGALSKGQLKEFLDANLAGSGSGHMHHHHHHSSG。In some embodiments of the present invention, in the above expression cassette, the amino acid sequence of the Trx-6x His tag is as shown in SEQ ID NO: 20: MSDKIIHLTDDSFDTDVLKADGAILVDFWAEWCGPCKMIAPILDEIADEYQGKLTVAKLNIDQNPGTAPKYGIRGIPTLLLFKNGEVAATKVGALSKGQLKEFLDANLAGSGSGHMHHHHHHSSG.

在本发明的一些实施方案中，上述表达盒中，编码所述Trx-6x His标签的核苷酸序列如SEQ ID NO:21所示：ATGAGCGATAAAATTATTCACCTGACTGACGACAGTTTTGACACGGATGTACTCAAAGCGGACGGGGCGATCCTCGTCGATTTCTGGGCAGAGTGGTGCGGTCCGTGCAAAATGATCGCCCCGATTCTGGATGAAATCGCTGACGAATATCAGGGCAAACTGACCGTTGCAAAACTGAACATCGATCAAAACCCTGGCACTGCGCCGAAATATGGCATCCGTGGTATCCCGACTCTGCTGCTGTTCAAAAACGGTGAAGTGGCGGCAACCAAAGTGGGTGCACTGTCTAAAGGTCAGTTGAAAGAGTTCCTCGACGCTAACCTGGCCGGTTCTGGTTCTGGCCATATGCACCATCATCATCATCATTCTTCTGGT。In some embodiments of the present invention, in the above expression cassette, the nucleotide sequence encoding the Trx-6x His tag is as shown in SEQ ID NO:21: ATGAGCGATAAAATTATTCACCTGACTGACGACAGTTTTGACACGGATGTACTCAAAGCGGACGGGGCGATCCTCGTCGATTTCTGGGCAGAGTGGTGCGGTCCGTGCAAAATGATCGCCCCGATTCTGGATGAAATCGCTGACGAATATCAGGGCAAACTGACCGTTGCAAAACTGAACATCGATCAAAACCCTGGCACTGCGCCGAAATATGGCATCCGTGGTATCCCGACTCTGCTGCTGTTCAAAAACGGTGAAGTGGCGGCAACCAAAGTGGGTGCACTGTCTAAAGGTCAGTTGAAAGAGTTCCTCGACGCTAACCTGGCCGGTTCTGGTTCTGGCCATATGCACCATCATCATCATCATTCTTCTGGT.

In some embodiments of the present invention, in the above expression cassette, the Trx-6x His tag is located at the N-terminus of the leader peptide.

The present invention also provides a recombinant vector comprising the above expression cassette.

In some embodiments of the present invention, in the above recombinant vector, the number of expression cassettes is greater than or equal to 1.

In some embodiments of the present invention, in the above recombinant vector, the number of expression cassettes is 1, 2 or 3.

The present invention also provides a host transformed and/or transfected with the above recombinant vector.

The present invention also provides a recombinant protein obtained by expression in the above host followed by denaturation and renaturation.

The present invention also provides the use of the above expression cassette, the above recombinant vector, the above host and/or the above protein in increasing the yield and/or expression level of a GLP-1 receptor agonist.

The present invention optimizes the design of the multiple tandem repeat sequences and, relative to existing patents, greatly increases the number of tandem repeats, up to 24. It provides a feasible scheme for the multi-tandem expression of a protein of a single structure and solves the problem that multiple tandem repeats of such a protein cannot be expressed and assembled normally. By introducing specific protease cleavage sites, the complete target protein sequence can be obtained by enzymatic cleavage at a later stage.

Brief Description of the Drawings

In order to illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below.

Figure 1 shows a schematic diagram of the structure of semaglutide;

Figure 2 shows the design principle of the multiple tandem repeat sequences;

Figure 3 shows the numbers of tandem repeats designed in the present invention;

Figure 4 shows the SDS-PAGE image of protein purification;

Figure 5 shows the SDS-PAGE image of protein purification for the multiple expression cassettes.

Detailed Description

The present invention discloses an expression cassette, a recombinant vector, a recombinant protein and applications thereof.

It should be understood that the expression "one or more of ..." includes each of the items recited after the expression individually, as well as the various combinations of two or more of the recited items, unless otherwise understood from the context and usage. The expression "and/or" used with three or more recited items should be understood to have the same meaning, unless otherwise understood from the context.

The terms "comprising", "having" and "containing", including their grammatical equivalents, should generally be understood as open-ended and non-limiting, for example as not excluding other unrecited elements or steps, unless otherwise specifically stated or otherwise understood from the context.

It should be understood that the order of steps, or the order in which certain actions are performed, is not important as long as the present invention remains operable. In addition, two or more steps or actions may be performed simultaneously.

The use of any and all examples or exemplary language herein, such as "for example" or "including", is intended only to better illustrate the invention and does not limit the scope of the invention unless otherwise claimed. No language in this specification should be construed as indicating that any non-claimed element is essential to the practice of the invention.

In addition, the numerical ranges and parameters used to define the present invention are approximate, and the relevant values in the specific embodiments are presented as precisely as possible. Any numerical value, however, inherently contains a standard deviation arising from the individual test method. Therefore, unless otherwise expressly stated, all ranges, quantities, values and percentages used in this disclosure should be understood as modified by "about". Here, "about" generally means that the actual value is within plus or minus 10%, 5%, 1% or 0.5% of a specific value or range.

In Examples 1 to 6 and the effect examples of the present invention, the media and buffers used in the experiments have the following formulations:

LB medium (1 L): 10 g peptone, 10 g sodium chloride, 5 g yeast extract;

TB medium/fermentation medium (1 L): 18 g peptone, 12 g yeast extract, 10 g sodium chloride;

Feed medium (1 L): 25-65% glucose, 20 g peptone, 10 g yeast extract;

Buffer A: 50 mM Tris·HCl pH 6.5-8.5, 300 mM sodium chloride;

Buffer B: 50 mM Tris·HCl pH 6.5-8.5, 300 mM sodium chloride, 2 M urea;

Buffer C: 50 mM Tris·HCl pH 6.5-8.5, 300 mM sodium chloride, 8 M urea;

Buffer D: 50 mM Tris·HCl pH 6.5-8.5, 300 mM sodium chloride, with a stepwise urea gradient (4, 2, 1 and 0 M).
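As a convenience for preparing these buffers, the molar concentrations can be converted into masses to weigh out. The following Python sketch is illustrative and rests on assumptions that are not part of the specification: Tris is weighed as Tris base (121.14 g/mol) and titrated to the target pH with HCl, and the names `MOLAR_MASS`, `BUFFERS` and `grams_needed` are introduced here only for the example.

```python
# Molar masses in g/mol. Weighing Tris as the free base is an assumption;
# the specification only states "Tris·HCl" at a given pH.
MOLAR_MASS = {"Tris base": 121.14, "NaCl": 58.44, "urea": 60.06}

# Concentrations in mol/L, taken from the buffer list above.
BUFFERS = {
    "Buffer A": {"Tris base": 0.050, "NaCl": 0.300},
    "Buffer B": {"Tris base": 0.050, "NaCl": 0.300, "urea": 2.0},
    "Buffer C": {"Tris base": 0.050, "NaCl": 0.300, "urea": 8.0},
}


def grams_needed(molar: float, volume_l: float, compound: str) -> float:
    """Mass in grams = concentration (mol/L) x volume (L) x molar mass (g/mol)."""
    return molar * volume_l * MOLAR_MASS[compound]


if __name__ == "__main__":
    volume_l = 1.0  # batch size, chosen for illustration
    for name, recipe in BUFFERS.items():
        parts = ", ".join(
            f"{grams_needed(conc, volume_l, cpd):.2f} g {cpd}"
            for cpd, conc in recipe.items()
        )
        print(f"{name} ({volume_l} L): {parts}")
```

Buffer D can be handled the same way by evaluating the urea entry at each step of the gradient.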

In Examples 1 to 6 and the effect examples of the present invention, all raw materials and reagents used are commercially available.

The present invention is further described below with reference to the examples:

Example 1 Design principle and route of the tandem repeat sequences

The design principle of this patent is shown in Figure 2. pQLinkN/pET-32a is used as the expression vector. The sequence is designed as multiple semaglutide backbones repeated in tandem. As shown in Figure 2, a Trx tag is added at the N-terminus to promote protein translation, followed by an EK (enterokinase) cleavage site (DDDDK-, Asp-Asp-Asp-Asp-Lys-). Multiple copies of the backbone are then repeated in tandem, and every two adjacent semaglutide backbone repeats are connected by a linker with the sequence "-RR-GSGSEEGSGS-RR-" or "-RR-GSGSEEGSGS-DDDDK-".

The sequence "DDDDK-" is specifically recognized by EK (enterokinase), which cleaves on its C-terminal side. The sequence "RR-" is cleaved by KEX2 (a serine protease), because KEX2 specifically recognizes the dibasic residue pairs RR or KR and cleaves on the C-terminal side of the pair. The remaining "-R-R" is removed by CPB (carboxypeptidase B), an exopeptidase that specifically recognizes the basic amino acids K and R and removes these two extra residues one by one from the C-terminus, finally leaving the complete semaglutide backbone. Figure 2 shows the design principle of the multiple tandem repeat sequences, and Figure 3 shows the multiple tandem sequences involved in the present invention.
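The cleavage logic described above can be sketched as a small in-silico digest. This Python example is a simplified model under stated assumptions, not the experimental procedure: digestion is treated as complete, KEX2 is modeled as cutting after every RR or KR pair, CPB is modeled as stripping all C-terminal K/R residues, and the construct omits the Trx tag, leader peptide and EK sites. The function names are hypothetical, and the zero-width split requires Python 3.7 or later.

```python
import re

BACKBONE = "EGTFTSDVSSYLEGQAAKEFIAWLVRGRG"  # SEQ ID NO: 1
LINKER = "GSGSEEGSGS"                       # SEQ ID NO: 3


def build_construct(n_repeats: int) -> str:
    """Simplified layout: RR-backbone-RR units joined by the linker."""
    unit = "RR" + BACKBONE + "RR"
    return LINKER.join([unit] * n_repeats)


def kex2_digest(protein: str) -> list:
    """Idealized KEX2: cut on the C-terminal side of every RR or KR pair."""
    return [frag for frag in re.split(r"(?<=[KR]R)", protein) if frag]


def cpb_trim(fragment: str) -> str:
    """Idealized CPB: remove basic residues (K/R) from the C-terminus."""
    return fragment.rstrip("KR")


def release_backbones(protein: str) -> list:
    """Digest with KEX2, trim with CPB, and keep the intact backbones."""
    trimmed = (cpb_trim(frag) for frag in kex2_digest(protein))
    return [frag for frag in trimmed if frag == BACKBONE]


if __name__ == "__main__":
    for n in (6, 7, 8, 9):
        print(n, "repeats ->", len(release_backbones(build_construct(n))), "backbones")
```

In this model the backbone itself survives both steps because SEQ ID NO: 1 contains no RR or KR pair and ends in glycine, so neither enzyme acts inside it.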

Specifically, the reading frame of the expression cassette begins at the start codon (ATG), which marks the N-terminus, and ends at the stop codon (TGA), which marks the C-terminus. Unless otherwise specified, the constructs in this patent are described in order from the N-terminus to the C-terminus.

Example 2 Vector selection and construction

pQLinkN was selected as the expression vector. Its distinguishing feature is that multiple independently operating multiple-cloning expression cassettes can be joined in series on a single vector by recombination. In theory the vector can carry N such expression cassettes, and previous studies have confirmed that at least 5 cassettes can each complete protein expression independently, that is, a single vector can express 5 proteins.

Therefore, in addition to constructing the multiple tandem repeats of the semaglutide backbone, the present invention uses the tandem sequence as one copy unit and builds multiple gene expression cassettes from it. The theoretical expression of the backbone then scales with the number of cassettes. Taking 5 tandem repeats as an example, the 5× repeat is one copy unit: 2 gene expression cassettes give twice as many backbones, that is 10 repeats (5×2), 3 gene expression cassettes give three times as many, that is 15 repeats (5×3), and so on.

The vector used in the present invention is the commercial vector pET-32a, and the NcoI (CCATGG) and XhoI (CTCGAG) sites are used for sequence insertion. Single-expression-cassette constructs with 6 to 9 tandem repeats were designed first, denoted 6 X to 9 X below. Multi-cassette constructs were then designed to obtain more than 10 tandem repeats. The designs comprise up to 9 tandem repeats in a single expression cassette and up to 3 expression cassettes, that is 24 tandem repeats in total.
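When an insert is cloned between NcoI and XhoI, it is common practice to confirm that the insert itself contains no additional copies of either site. The sketch below shows one way to run that check in Python. It is an illustration rather than part of the described method: `SITES`, `find_sites` and `internal_sites` are names introduced here, and the example input is the linker coding sequence (SEQ ID NO: 19). Both recognition sites are palindromic, so scanning one strand is sufficient.

```python
import re

# Recognition sequences of the two enzymes used for insertion.
SITES = {"NcoI": "CCATGG", "XhoI": "CTCGAG"}


def find_sites(dna: str, site: str) -> list:
    """Return 0-based start positions of every (possibly overlapping) match."""
    return [m.start() for m in re.finditer(f"(?={site})", dna.upper())]


def internal_sites(insert: str) -> dict:
    """Map each enzyme name to the positions of its site inside the insert."""
    return {name: find_sites(insert, seq) for name, seq in SITES.items()}


if __name__ == "__main__":
    linker_dna = "GGTAGCGGTTCTGAGGAAGGTTCTGGAAGC"  # SEQ ID NO: 19
    print(internal_sites(linker_dna))
```

An empty list for both enzymes means the fragment can be placed between the two sites without being cut internally.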

The present invention relates to the following combinations:

6 X：N-leading peptide-KEX2/CPB-GLP-1主链-KEX2/CPB-linker-KEX2/CPB-GLP-1主链-KEX2/CPB-linker-KEX2/CPB-GLP-1主链-KEX2/CPB-linker-EK-KEX2/CPB-GLP-1主链-KEX2/CPB-linker-KEX2/CPB-GLP-1主链-KEX2/CPB-linker-KEX2/CPB-GLP-1主链-KEX2/CPB；（CPB-GLP-1主链序列的顺序：SEQ ID NO:4-SEQ ID NO:5-SEQ ID NO:6-SEQ IDNO:7-SEQ ID NO:8-SEQ ID NO:9）6 X: N-leading peptide-KEX2/CPB-GLP-1 backbone-KEX2/CPB-linker-KEX2/CPB-GLP-1 backbone-KEX2/CPB-linker-KEX2/CPB-GLP-1 backbone-KEX2/CPB-linker-EK-KEX2/CPB-GLP-1 backbone-KEX2/CPB-linker-KEX2/CPB-GLP-1 backbone-KEX2/CPB-linker-KEX2/CPB-GLP-1 backbone-KEX2/CPB; (the order of the CPB-GLP-1 backbone sequence: SEQ ID NO:4-SEQ ID NO:5-SEQ ID NO:6-SEQ ID NO:7-SEQ ID NO:8-SEQ ID NO:9)

7 X：N-leading peptide-KEX2/CPB-GLP-1主链-KEX2/CPB-linker-KEX2/CPB-GLP-1主链-KEX2/CPB-linker-KEX2/CPB-GLP-1主链-KEX2/CPB-linker-EK-KEX2/CPB-GLP-1主链-KEX2/CPB-linker-KEX2/CPB-GLP-1主链-KEX2/CPB-linker-KEX2/CPB-GLP-1主链-KEX2/CPB-linker-KEX2/CPB-GLP-1主链-KEX2/CPB；（CPB-GLP-1主链序列的顺序：SEQ IDNO:5-SEQ ID NO:6-SEQ ID NO:7-SEQ ID NO:4-SEQ ID NO:6-SEQ ID NO:9-SEQ ID NO:5）7 X: N-leading peptide-KEX2/CPB-GLP-1 backbone-KEX2/CPB-linker-KEX2/CPB-GLP-1 backbone-KEX2/CPB-linker-KEX2/CPB-GLP-1 backbone-KEX2/CPB-linker-EK-KEX2/CPB-GLP-1 backbone-KEX2/CPB-linker-KEX2/CPB-GLP-1 backbone-KEX2/CPB-linker-KEX2/CPB-GLP-1 backbone-KEX2/CPB-linker-KEX2/CPB-GLP-1 backbone-KEX2/CPB; (the order of the CPB-GLP-1 backbone sequence: SEQ ID NO: 5-SEQ ID NO: 6-SEQ ID NO: 7-SEQ ID NO: 4-SEQ ID NO: 6-SEQ ID NO: 9-SEQ ID NO: 5)

8 X: N-leading peptide-KEX2/CPB-GLP-1 backbone-KEX2/CPB-linker-KEX2/CPB-GLP-1 backbone-KEX2/CPB-linker-KEX2/CPB-GLP-1 backbone-KEX2/CPB-linker-EK-KEX2/CPB-GLP-1 backbone-KEX2/CPB-linker-KEX2/CPB-GLP-1 backbone-KEX2/CPB-linker-KEX2/CPB-GLP-1 backbone-KEX2/CPB-linker-EK-KEX2/CPB-GLP-1 backbone-KEX2/CPB-linker-KEX2/CPB-GLP-1 backbone-KEX2/CPB; (the order of the CPB-GLP-1 backbone sequences: SEQ ID NO: 5-SEQ ID NO: 6-SEQ ID NO: 7-SEQ ID NO: 4-SEQ ID NO: 6-SEQ ID NO: 9-SEQ ID NO: 5-SEQ ID NO: 6)

9 X: N-leading peptide-KEX2/CPB-GLP-1 backbone-KEX2/CPB-linker-KEX2/CPB-GLP-1 backbone-KEX2/CPB-linker-KEX2/CPB-GLP-1 backbone-KEX2/CPB-linker-EK-KEX2/CPB-GLP-1 backbone-KEX2/CPB-linker-KEX2/CPB-GLP-1 backbone-KEX2/CPB-linker-KEX2/CPB-GLP-1 backbone-EK-KEX2/CPB-linker-EK-KEX2/CPB-GLP-1 backbone-KEX2/CPB-linker-KEX2/CPB-GLP-1 backbone-KEX2/CPB-linker-KEX2/CPB-GLP-1 backbone-KEX2/CPB; (the order of the CPB-GLP-1 backbone sequences: SEQ ID NO: 5-SEQ ID NO: 6-SEQ ID NO: 7-SEQ ID NO: 4-SEQ ID NO: 5-SEQ ID NO: 6-SEQ ID NO: 7-SEQ ID NO: 8-SEQ ID NO: 9)

16 X: one 8 X serves as the repeating unit, and 2 repeating units are connected in tandem to give 16 X;

24 X: one 8 X serves as the repeating unit, and 3 repeating units are connected in tandem to give 24 X;
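The 7 X and 8 X layouts listed above follow one pattern: a leader peptide, then backbone copies that are each flanked by KEX2/CPB sites, joined by linkers, with an EK site in front of selected copies. The following is a minimal Python sketch of that pattern. The function name `build_cassette` and the parameter `ek_before` are illustrative and do not appear in the patent. The 9 X layout is left out because it also lists an EK site directly after one backbone, which this simple pattern does not model.

```python
def build_cassette(n_units, ek_before):
    """Return the domain layout of an n-unit tandem cassette as a string.

    n_units   -- number of GLP-1 backbone copies
    ek_before -- 1-based positions of copies preceded by an EK site
                 (illustrative parameter, not from the patent)
    """
    parts = ["N-leading peptide"]
    for unit in range(1, n_units + 1):
        if unit in ek_before:
            parts.append("EK")
        # Each backbone copy is flanked by KEX2/CPB processing sites.
        parts += ["KEX2/CPB", "GLP-1 backbone", "KEX2/CPB"]
        if unit < n_units:
            parts.append("linker")  # no linker after the last copy
    return "-".join(parts)


layout_7x = build_cassette(7, ek_before={4})
layout_8x = build_cassette(8, ek_before={4, 7})

# 16 X and 24 X are described only as 2 or 3 tandem 8 X units, so only
# the backbone count is derived here, not the junction between units.
backbones_16x = 2 * layout_8x.count("GLP-1 backbone")
backbones_24x = 3 * layout_8x.count("GLP-1 backbone")

print(layout_8x)
print(backbones_16x, backbones_24x)
```

Writing the layout as a token list makes it easy to check that the number of backbone copies and the EK positions match the intended design before a sequence is sent for gene synthesis.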

Once the sequence designs were complete, the codon-optimized sequences of the tandem proteins were produced by gene synthesis and cloned into the expression vector pET-32a. After the constructs were confirmed by sequencing, the proteins were expressed.

Finally, we designed tandem repeat sequences containing 6-9 repeats, raising the expression level and yield of the semaglutide backbone protein by increasing the number of tandem repeats expressed.

Example 3 Biosynthesis technology of recombinant protein

Unless otherwise specified, E. coli transformation, the standard media (LB/TB, fermentation medium), the feed medium, the fermentation protocol and the protein purification protocol all follow conventional laboratory practice.

The frozen plasmid was transformed into BL-21(DE3) by the conventional method, plated and cultured overnight at 37°C. Several single colonies were picked and each was cultured overnight in LB medium; this is the primary seed culture. The overnight primary seed culture was added to an appropriate volume of TB medium and grown to an OD₆₀₀ of 2-7; this is the secondary seed culture, which can be used to inoculate the fermenter.

The secondary seed culture was inoculated at an inoculation rate of 0.1-1%. Fermentation was carried out at 37°C, with the stirring speed set to 300-1200 r/min, the aeration rate set to 500 mL/min and the pH kept at 6.5-7.5. Feeding started once the initial sugar in the fermentation broth was exhausted and was then adjusted according to the fermentation parameters (such as dissolved oxygen and pH) until the end of the fermentation. When the OD₆₀₀ of the broth reached 100, 0.1-1.5 mM IPTG was added for induction and the fermentation temperature was lowered as appropriate for the protein; the induction temperature was set within 15-35°C and can be determined in preliminary small-scale trials. OD₆₀₀ was measured every two hours during fermentation. Once the OD reached a plateau, the fermentation could be stopped and the tank harvested. The cells were collected in a high-speed centrifuge and, depending on the experimental schedule, either disrupted directly or stored at -80°C until use.
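The inoculation rate and the IPTG concentration above translate into volumes once a working volume is fixed. The sketch below shows the arithmetic only. It assumes that the inoculation rate is a volume percentage (v/v) and that IPTG is added from a 1 M stock; neither is stated in the text, and the 5 L working volume is a hypothetical example. The function names are illustrative.

```python
def inoculum_volume_ml(working_volume_l, inoculation_pct):
    """Seed-culture volume (mL) for a given inoculation rate.

    Assumes the 0.1-1% inoculation rate is volume/volume.
    """
    if not 0.1 <= inoculation_pct <= 1.0:
        raise ValueError("inoculation rate outside the 0.1-1% range in the text")
    return working_volume_l * 1000.0 * inoculation_pct / 100.0


def iptg_stock_volume_ml(working_volume_l, final_mm, stock_mm=1000.0):
    """IPTG stock volume (mL) from C_stock * V_stock = C_final * V_broth.

    stock_mm defaults to a 1 M (1000 mM) stock, which is an assumption.
    The small volume added by the stock itself is neglected.
    """
    if not 0.1 <= final_mm <= 1.5:
        raise ValueError("IPTG concentration outside the 0.1-1.5 mM range in the text")
    return working_volume_l * 1000.0 * final_mm / stock_mm


# Hypothetical 5 L working volume, 1% inoculum, 0.5 mM IPTG.
seed_ml = inoculum_volume_ml(5.0, 1.0)
iptg_ml = iptg_stock_volume_ml(5.0, 0.5)
print(f"seed culture: {seed_ml:.1f} mL, IPTG stock: {iptg_ml:.2f} mL")
```

The range checks simply mirror the limits given in the protocol, so a value outside them is flagged rather than silently used.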

The multi-tandem-repeat semaglutide backbone proteins of the present invention are all produced as inclusion bodies, so the inclusion bodies must be denatured and renatured. An appropriate amount of cells is resuspended in buffer A at 1:5 to 1:20 (W/V) and disrupted by high-pressure homogenization (large volumes) or sonication (small volumes). The lysate is centrifuged in a pre-cooled high-speed centrifuge at 18000 rpm for 40-60 min and the pellet is collected; this is the crude inclusion body fraction. The crude inclusion bodies are resuspended in buffer B at 1:10 (W/V) and centrifuged at 18000 rpm for 40-60 min, and the pellet is collected. This wash is repeated 3 times to give washed inclusion bodies, which can be used for the subsequent denaturation and renaturation.

An appropriate amount of washed inclusion bodies is resuspended in buffer C at 1:20 (W/V) and incubated at room temperature until the inclusion bodies are fully dissolved (denatured). The suspension is centrifuged in a pre-cooled high-speed centrifuge at 18000 rpm for 40-60 min, the pellet is discarded and the supernatant is collected; this is the denatured inclusion body protein. Renaturation can be carried out as the experiment requires, by dilution, by dialysis or on an affinity column; the present invention uses dialysis. A dialysis bag with a molecular weight cut-off suited to the molecular weight of the protein is selected and loaded with an appropriate amount of denatured protein. Renaturation is carried out against buffer D, with the buffer changed every 6-8 h, so that the denaturant (urea or guanidine hydrochloride) is removed slowly. The renatured protein obtained is the multi-tandem-repeat semaglutide backbone protein.
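The resuspension steps above are all specified as weight-to-volume ratios, so the buffer volume follows directly from the pellet weight. The sketch below assumes that 1:n (W/V) means 1 g of wet material per n mL of buffer, which is the usual reading but is not spelled out in the text. The dictionary keys and the function name are illustrative, and the 10 g pellet is a hypothetical example.

```python
# (lowest, highest) mL of buffer per gram of wet material, from the text:
# buffer A 1:5 to 1:20, buffer B 1:10, buffer C 1:20.
WV_RATIOS = {
    "A": (5, 20),   # cell resuspension before disruption
    "B": (10, 10),  # wash of crude inclusion bodies
    "C": (20, 20),  # solubilization (denaturation) of washed inclusion bodies
}


def buffer_range_ml(wet_weight_g, buffer_name):
    """Return the (min, max) buffer volume in mL for one resuspension.

    Assumes 1:n (W/V) means 1 g wet weight per n mL of buffer.
    """
    if wet_weight_g <= 0:
        raise ValueError("wet weight must be positive")
    low, high = WV_RATIOS[buffer_name]
    return wet_weight_g * low, wet_weight_g * high


# Hypothetical 10 g pellet.
for name in ("A", "B", "C"):
    low_ml, high_ml = buffer_range_ml(10, name)
    print(f"buffer {name}: {low_ml}-{high_ml} mL")
```

Where the text gives a single ratio, the minimum and maximum coincide, so the same function covers both the fixed and the ranged steps.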

Example 4

The vector carrying a single 8 X expression cassette was selected for protein expression and purification. The experimental details are as described in "Example 3 Biosynthesis technology of recombinant protein". After denaturation and renaturation of the inclusion bodies, protein of good purity was obtained.

Example 5

The vector carrying the 8 X 2 expression cassette was selected for protein expression and purification; the details are as in Examples 1 to 3.

Example 6

The vector carrying the 8 X 3 expression cassette was selected for protein expression and purification; the details are as in Examples 1 to 3.

Effect example

Through the design and optimization of the protein expression and purification method, the target protein was obtained from all of the designed multi-tandem-repeat polypeptide sequences. All of these multi-tandem-repeat proteins are present as inclusion bodies in the prokaryotic host (E. coli), and soluble target protein has to be obtained by denaturing and renaturing the inclusion bodies. Protein yield rises as the number of expression cassettes increases: compared with 8 X, the 8 X 3 expression cassette gave a 64% higher protein yield. The results are shown in Table 1:

Table 1

The above describes only preferred embodiments of the present invention. It should be noted that a person of ordinary skill in the art can make further improvements and refinements without departing from the principle of the present invention, and these improvements and refinements shall also be regarded as falling within the scope of protection of the present invention.

Claims

1. An expression cassette, characterized by including but not limited to: a leader peptide, a restriction site, a linker and a GLP-1 receptor agonist backbone;

The GLP-1 receptor agonist backbone has:

(1) the amino acid sequence shown in SEQ ID NO: 1; or

(2) A sequence in which one or more amino acids are substituted, deleted, added and/or replaced based on the amino acid sequence shown in (1); or

(3) A sequence having a homology of more than 90% with the amino acid sequence shown in (1).

2. The expression cassette of claim 1, wherein the number of the GLP-1 receptor agonists is no less than 6.

3. The expression cassette according to claim 1 or 2, characterized in that the restriction sites include: one or more of an EK enzyme restriction site, a KEX2 enzyme restriction site and a CPB enzyme restriction site.

4. The expression cassette according to any one of claims 1 to 3, wherein the leader peptide has:

(4) the amino acid sequence shown in SEQ ID NO: 2; or

(5) A sequence in which one or more amino acids are substituted, deleted, added and/or replaced based on the amino acid sequence shown in (4); or

(6) A sequence having a homology of more than 90% with the amino acid sequence shown in (4).

5. The expression cassette according to any one of claims 1 to 4, wherein the Linker has:

(7) the amino acid sequence shown in SEQ ID NO: 3; or

(8) A sequence in which one or more amino acids are substituted, deleted, added and/or replaced based on the amino acid sequence shown in (7); or

(9) A sequence having a homology of more than 90% with the amino acid sequence shown in (7).

6. A recombinant vector, characterized in that it comprises: an expression cassette according to any one of claims 1 to 5.

7. The recombinant vector according to claim 6, wherein the number of the expression cassette is greater than or equal to 1.

8. A host, characterized in that it is transformed and/or transfected with the recombinant vector according to claim 6 or 7.

9. A recombinant protein, characterized in that it is obtained by expression in the host according to claim 8 followed by renaturation.

10. Use of the expression cassette according to any one of claims 1 to 5, the recombinant vector according to claim 6 or 7, the host according to claim 8 and/or the protein according to claim 9 in increasing the production and/or expression of a GLP-1 receptor agonist.