CN105985938B

CN105985938B - Glycosyltransferase mutant protein and its application

Info

Publication number: CN105985938B
Application number: CN201510051992.5A
Authority: CN
Inventors: 周志华; 魏维; 严兴; 王平平; 魏勇军; 杨成帅
Original assignee: Shenghe Everything Shanghai Biotechnology Co ltd
Current assignee: Shenghe Everything Suzhou Biotechnology Co ltd
Priority date: 2015-01-30
Filing date: 2015-01-30
Publication date: 2024-11-05
Anticipated expiration: 2035-01-30
Also published as: WO2016119756A1; CN105985938A

Abstract

The invention discloses a glycosyltransferase mutant protein and its application. Specifically, using homology alignment and site-directed mutagenesis, the inventors identified novel glycosyltransferase mutant proteins. The mutant protein is a non-natural protein, has glycosyltransferase catalytic activity, and contains the following core amino acids related to enzymatic activity: amino acid 142 is A; amino acid 144 is H or F; and amino acid 339 is G, where the amino acid positions are numbered according to the sequence set forth in SEQ ID NO.: 13. Experiments show that modifying key sites in a glycosyltransferase can alter an enzyme that was originally inactive or poorly active, or can confer a new enzymatic activity. Furthermore, the inventors found that C-terminal truncation of the glycosyltransferase of the invention significantly increases its expression level.

Description

Glycosyltransferase mutant protein and its application

Technical Field

The present invention relates to the fields of biotechnology and plant biology, and in particular to a glycosyltransferase mutant protein and its application.

Background Art

Ginsenoside is the general term for the saponins isolated from ginseng and related Panax species (such as Panax notoginseng and American ginseng). Ginsenosides are triterpenoid saponins and are the main active ingredients of ginseng. At least 100 saponins have been isolated from ginseng to date. Structurally, ginsenosides are bioactive small molecules formed by glycosylation of sapogenins. Ginsenosides have only a few sapogenins, mainly the dammarane-type protopanaxadiol and protopanaxatriol, and oleanolic acid. Glycosylation increases the water solubility of the sapogenins and gives rise to different physiological activities. Protopanaxatriol has hydroxyl groups at C3, C6 and C20. All protopanaxatriol-type saponins discovered so far are glycosylated on the C6 and/or C20 hydroxyl groups; protopanaxatriol-type saponins glycosylated at C3 have not been reported. The glycosyl group can be glucose, rhamnose, xylose or arabinose.

Some protopanaxatriol-type saponins have been shown to have a wide range of physiological functions and medicinal value, including anti-allergic, anti-inflammatory, immunomodulatory, anti-fatigue and cardioprotective effects. Ginsenoside F1 (20-O-β-D-glucopyranosyl-20(S)-protopanaxatriol) is a protopanaxatriol-type saponin; its content in ginseng is very low, so it is a rare ginsenoside. Its structure is very similar to that of compound K (CK): both carry a glucosyl group on the C-20 hydroxyl group of the sapogenin. Ginsenoside F1 also has unique medicinal value, with anti-aging and antioxidant activities. Ginsenoside Rh1 (6-O-β-D-glucopyranosyl-20(S)-protopanaxatriol) is likewise a protopanaxatriol-type saponin present in ginseng at very low levels, and is also a rare ginsenoside. Its structure is very similar to that of F1, but it is glycosylated at the C6 hydroxyl group. Ginsenoside Rh1 also has distinctive physiological activities, including anti-allergic and anti-inflammatory effects. Ginsenoside Rg1 (6-O-β-(D-glucopyranosyl)-20-O-β-(D-glucopyranosyl)-protopanaxatriol) is a protopanaxatriol-type saponin with anti-fatigue and cardioprotective functions.

Rare ginsenosides generally carry few glycosyl groups (only 1-2), whereas abundant saponins are more heavily glycosylated (carrying sugar chains composed of multiple sugar units). Rare ginsenoside monomers are traditionally produced by hydrolyzing the total saponins or the abundant saponins of ginseng or Panax notoginseng by chemical, enzymatic or microbial fermentation methods. Because wild ginseng resources are nearly exhausted, ginsenosides currently come from cultivated ginseng or Panax notoginseng. Cultivation takes a long time (generally more than 5-7 years), is geographically restricted, and is frequently affected by pests and diseases that require large amounts of pesticides. Cultivation of ginseng or Panax notoginseng also suffers from severe continuous-cropping obstacles (planting fields must lie fallow for 5-15 years or more before they can be replanted). Consequently, the yield, quality and safety of ginsenosides all face challenges.

The development of synthetic biology offers new opportunities for the heterologous synthesis of plant natural products. Using yeast as the chassis and assembling and optimizing the metabolic pathway, artemisinic acid or dihydroartemisinic acid has been produced by fermentation from cheap monosaccharides and then converted into artemisinin in a single chemical step, which demonstrates the great potential of synthetic biology for producing natural-product drugs. When rare ginsenoside monomers are synthesized heterologously in yeast chassis cells, the raw materials are cheap monosaccharides and the process is a fermentation whose safety can be controlled, avoiding external contaminants such as the pesticides used in cultivating the source plants. Producing rare ginsenoside monomers by synthetic biology therefore offers a cost advantage and also ensures the quality and safety of the final product. Synthetic biology can supply sufficient quantities of various high-purity rare ginsenoside monomers for activity assays and clinical trials, promoting the development of innovative drugs based on rare ginsenosides.

Synthesizing these medicinally active triol-type ginsenosides by synthetic biology requires not only constructing a metabolic pathway to protopanaxatriol but also identifying the glycosyltransferases that catalyze ginsenoside glycosylation. Glycosyltransferases transfer the glycosyl group of a glycosyl donor (a nucleoside diphosphate sugar, such as UDP-glucose) to various glycosyl acceptors. Based on amino acid sequence, glycosyltransferases are currently classified into 94 families. More than a hundred different glycosyltransferases have been found in the plant genomes sequenced to date. Their glycosyl acceptors include sugars, lipids, proteins, nucleic acids, antibiotics and other small molecules.

Therefore, further research and development of glycosyltransferases, and in particular of more productive glycosyltransferases, is needed in this field to obtain rare ginsenosides economically and efficiently.

Summary of the invention

The invention provides mutant proteins of glycosyltransferase; using these mutant proteins, the enzymatic activity can be improved or the expression level increased.

In a first aspect, the present invention provides a mutant protein of a glycosyltransferase, wherein the mutant protein is a non-natural protein, has glycosyltransferase catalytic activity, and contains the following core amino acids related to the catalytic activity:

amino acid 142 is A;

amino acid 144 is H or F; and

amino acid 339 is G;

wherein the amino acid positions are numbered according to the sequence shown in SEQ ID NO.: 13.
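As a minimal sketch of what the core-residue definition requires, the Python check below tests a candidate sequence for A at position 142, H or F at 144, and G at 339. It assumes the candidate is already expressed in SEQ ID NO.: 13 coordinates (reference position p is string index p-1); the function names and the 350-residue toy sequence are hypothetical and do not reproduce SEQ ID NO.: 13.

```python
# Core residues of the first aspect; 1-based positions on SEQ ID NO.: 13.
CORE_RESIDUES: dict[int, set[str]] = {142: {"A"}, 144: {"H", "F"}, 339: {"G"}}


def has_core_residues(seq: str, core: dict[int, set[str]] = CORE_RESIDUES) -> bool:
    """Return True if every core position of `seq` holds an allowed residue.

    Assumes `seq` is already numbered on SEQ ID NO.: 13 coordinates.
    """
    # The length test short-circuits before indexing, so short sequences fail cleanly.
    return all(pos <= len(seq) and seq[pos - 1] in allowed
               for pos, allowed in core.items())


def set_residue(seq: str, pos: int, aa: str) -> str:
    """Return a copy of `seq` with 1-based position `pos` replaced by `aa`."""
    return seq[:pos - 1] + aa + seq[pos:]


# Hypothetical placeholder sequence (not SEQ ID NO.: 13).
toy = "S" * 350
for pos, aa in ((142, "A"), (144, "F"), (339, "G")):
    toy = set_residue(toy, pos, aa)

print(has_core_residues(toy))                         # True
print(has_core_residues(set_residue(toy, 144, "Y")))  # False
```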

In another preferred embodiment, apart from amino acids 142, 144 and 339, the remaining amino acids of the mutant protein are identical or substantially identical to the sequence shown in SEQ ID NO.: 13.

In another preferred embodiment, "substantially identical" means that at most 50 (preferably 1-20, more preferably 1-10) amino acids differ, where the differences include amino acid substitutions, deletions or additions, and the mutant protein still has glycosyltransferase activity.
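The text does not say how the number of differing amino acids is counted. One reasonable reading, sketched below as an assumption, is the minimum number of single-residue substitutions, deletions and additions (unit-cost Levenshtein edit distance). The sketch does not exempt positions 142, 144 and 339, and the function names and toy strings are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of single-residue substitutions, insertions and deletions
    needed to turn sequence `a` into sequence `b` (unit-cost Levenshtein distance)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of `a`
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute; free when residues match
            ))
        prev = curr
    return prev[-1]


def substantially_identical(variant: str, reference: str, max_differences: int = 50) -> bool:
    """True if `variant` differs from `reference` by at most `max_differences` edits."""
    return edit_distance(variant, reference) <= max_differences


print(edit_distance("MKTAYIAK", "MKSAYIA"))  # 2: one substitution (T->S), one deletion (K)
print(substantially_identical("MKTAYIAK", "MKSAYIA", max_differences=1))  # False
```

The preferred limits of 20 or 10 differences are obtained by passing a smaller `max_differences`.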

In another preferred embodiment, the mutant protein has at least 80%, preferably at least 85%-90%, more preferably at least 95%, and most preferably at least 98% homology with the sequence shown in SEQ ID NO.: 13.
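"Homology" is not defined numerically here. The sketch below assumes it is percent identity computed from an existing pairwise alignment (gaps written as '-'), normalized to the ungapped reference length; other normalizations are common, so treat this as one convention among several. It then reports the strictest tier of this embodiment that a value meets. The function names are hypothetical.

```python
from typing import Optional


def percent_identity(aligned_query: str, aligned_ref: str) -> float:
    """Identical aligned columns as a percentage of the ungapped reference length.

    Both inputs are rows of the same pairwise alignment, with '-' marking gaps.
    """
    if len(aligned_query) != len(aligned_ref):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(q == r != "-" for q, r in zip(aligned_query, aligned_ref))
    ref_length = sum(r != "-" for r in aligned_ref)
    return 100.0 * matches / ref_length


# Thresholds of this embodiment, strictest first (85 is the lower end of "85%-90%").
TIERS = ((98.0, "most preferred"), (95.0, "more preferred"),
         (85.0, "preferred"), (80.0, "minimum"))


def identity_tier(pct: float) -> Optional[str]:
    """Return the strictest tier met by `pct`, or None below 80%."""
    for threshold, label in TIERS:
        if pct >= threshold:
            return label
    return None


pct = percent_identity("MK-AYIAK", "MKTAYIAK")  # 7 of 8 reference residues match
print(pct, identity_tier(pct))                  # 87.5 preferred
```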

In another preferred embodiment, the mutant protein is as shown in SEQ ID NO.: 13 or 16.

In another preferred embodiment, the mutant protein is not a sequence shown in SEQ ID NO.: 1, 2 or 4.

In another preferred embodiment, the glycosyltransferase catalytic activity includes catalyzing one or more of the following reactions:

(a) transferring a glycosyl group from a glycosyl donor to the hydroxyl group at the C-20 position of a tetracyclic triterpenoid compound;

(b) transferring a glycosyl group from a glycosyl donor to the hydroxyl group at the C-6 position of a tetracyclic triterpenoid compound.

In another preferred embodiment, the mutant protein is as shown in SEQ ID NO.: 6 or 11-12, and the glycosyltransferase catalytic activity includes transferring glycosyl groups from a glycosyl donor to both the C-20 and the C-6 hydroxyl groups of a tetracyclic triterpenoid compound.

In another preferred embodiment, the mutant protein is as shown in SEQ ID NO.: 6, 8, 11-12 or 16, and the glycosyltransferase activity includes transferring a glycosyl group from a glycosyl donor to the C-6 hydroxyl group of a tetracyclic triterpenoid compound.

In another preferred embodiment, the mutant protein is as shown in SEQ ID NO.: 6-7 or 9-13, and the glycosyltransferase activity includes transferring a glycosyl group from a glycosyl donor to the C-20 hydroxyl group of a tetracyclic triterpenoid compound.
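The three embodiments above can be cross-checked: the proteins listed as having both C-20 and C-6 activity should be exactly those appearing in both single-site lists. The sketch below assumes ranges such as "11-12" are inclusive; the helper name is hypothetical.

```python
def expand_ids(spec: str) -> set[int]:
    """Expand a SEQ ID list such as "6-7, 9-13" into {6, 7, 9, 10, 11, 12, 13}."""
    ids: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            low, high = (int(x) for x in part.split("-"))
            ids.update(range(low, high + 1))  # inclusive range
        else:
            ids.add(int(part))
    return ids


both_sites = expand_ids("6, 11-12")         # C-20 and C-6 activity
c6_sites = expand_ids("6, 8, 11-12, 16")    # C-6 activity
c20_sites = expand_ids("6-7, 9-13")         # C-20 activity

print(sorted(c6_sites & c20_sites))         # [6, 11, 12]
print(c6_sites & c20_sites == both_sites)   # True
```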

In another preferred embodiment, the glycosyltransferase catalytic activity includes catalyzing one or more of the following reactions:

(A)

wherein the compound of formula (I) is protopanaxatriol, the compound of formula (II) is ginsenoside F1 (20-O-β-D-glucopyranosyl-20(S)-protopanaxatriol), and the mutant protein includes SEQ ID NO.: 6-7 and 9-13;

(B)

wherein the compound of formula (I) is protopanaxatriol and the compound of formula (III) is ginsenoside Rh1 (6-O-β-D-glucopyranosyl-20(S)-protopanaxatriol); or

the compound of formula (II) is ginsenoside F1 (20-O-β-D-glucopyranosyl-20(S)-protopanaxatriol) and the compound of formula (IV) is ginsenoside Rg1 (6-O-β-(D-glucopyranosyl)-20-O-β-(D-glucopyranosyl)-protopanaxatriol);

and the mutant protein includes SEQ ID NO.: 6, 8, 11-12 and 16;

(C)

wherein the compound of formula (I) is protopanaxatriol, the compound of formula (II) is ginsenoside F1 (20-O-β-D-glucopyranosyl-20(S)-protopanaxatriol), the compound of formula (IV) is ginsenoside Rg1 (6-O-β-(D-glucopyranosyl)-20-O-β-(D-glucopyranosyl)-protopanaxatriol), and the mutant protein includes SEQ ID NO.: 6 and 11-12;

(D)

wherein the compound of formula (I) is protopanaxatriol, the compound of formula (II) is ginsenoside F1 (20-O-β-D-glucopyranosyl-20(S)-protopanaxatriol), the compound of formula (III) is ginsenoside Rh1 (6-O-β-D-glucopyranosyl-20(S)-protopanaxatriol), and the mutant protein includes SEQ ID NO.: 11-12.
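The reaction schemes for formulas (I)-(IV) are not reproduced in this text, but the compound names fix their relationship: F1, Rh1 and Rg1 are protopanaxatriol carrying a β-D-glucopyranosyl group at C-20, at C-6, or at both. The sketch below models reactions (A)-(C) as adding glucosylated positions to a set, assuming a glucose donor and one glucosyl per site; the names are hypothetical.

```python
# Products named in reactions (A)-(D), keyed by the set of glucosylated carbons on PPT.
PRODUCTS = {
    frozenset(): "protopanaxatriol (PPT)",
    frozenset({20}): "ginsenoside F1",
    frozenset({6}): "ginsenoside Rh1",
    frozenset({6, 20}): "ginsenoside Rg1",
}


def glucosylate(sites: frozenset[int], position: int) -> frozenset[int]:
    """Add one glucosyl group at C-6 or C-20 (the only positions modelled here)."""
    if position not in (6, 20):
        raise ValueError("only C-6 and C-20 glucosylation are modelled")
    if position in sites:
        raise ValueError(f"C-{position} is already glucosylated")
    return sites | {position}


ppt: frozenset[int] = frozenset()
f1 = glucosylate(ppt, 20)   # reaction (A): PPT -> F1
rh1 = glucosylate(ppt, 6)   # reaction (B): PPT -> Rh1
rg1 = glucosylate(f1, 6)    # reactions (B)/(C): F1 -> Rg1
print(PRODUCTS[f1], PRODUCTS[rh1], PRODUCTS[rg1], sep=" | ")
# ginsenoside F1 | ginsenoside Rh1 | ginsenoside Rg1
```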

In another preferred embodiment, the glycosyl donor includes a nucleoside diphosphate sugar. Preferably, the glycosyl donor is selected from the group consisting of UDP-glucose, ADP-glucose, TDP-glucose, CDP-glucose, GDP-glucose, UDP-galacturonic acid, ADP-galacturonic acid, TDP-galacturonic acid, CDP-galacturonic acid, GDP-galacturonic acid, UDP-galactose, ADP-galactose, TDP-galactose, CDP-galactose, GDP-galactose, UDP-arabinose, ADP-arabinose, TDP-arabinose, CDP-arabinose, GDP-arabinose, UDP-rhamnose, ADP-rhamnose, TDP-rhamnose, CDP-rhamnose, GDP-rhamnose, other nucleoside diphosphate hexoses or nucleoside diphosphate pentoses, and combinations thereof.

In another preferred embodiment, the glycosyl donor includes a uridine diphosphate (UDP) sugar. Preferably, the glycosyl donor is selected from the group consisting of UDP-glucose, UDP-galacturonic acid, UDP-galactose, UDP-arabinose, UDP-rhamnose, other uridine diphosphate hexoses or uridine diphosphate pentoses, and combinations thereof.
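The explicitly named donors of the broader embodiment form a grid of five nucleotides by five sugars, and the narrower embodiment keeps only the UDP row. The sketch below regenerates both lists with `itertools.product`; it covers only the named donors, not the "other nucleoside diphosphate hexoses or pentoses".

```python
from itertools import product

NUCLEOTIDES = ["UDP", "ADP", "TDP", "CDP", "GDP"]
SUGARS = ["glucose", "galacturonic acid", "galactose", "arabinose", "rhamnose"]

# Sugar-major order reproduces the order in which the donors are listed above.
DONORS = [f"{nuc}-{sugar}" for sugar, nuc in product(SUGARS, NUCLEOTIDES)]
UDP_DONORS = [d for d in DONORS if d.startswith("UDP-")]

print(len(DONORS))  # 25
print(UDP_DONORS)
# ['UDP-glucose', 'UDP-galacturonic acid', 'UDP-galactose', 'UDP-arabinose', 'UDP-rhamnose']
```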

In another preferred embodiment, the pH of the reaction system is 4.0-10.0, preferably 6.0-8.5.

In another preferred embodiment, the temperature of the reaction system is 10°C-105°C, preferably 25°C-35°C.
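The stated pH and temperature ranges can be encoded as a small configuration check. The sketch below treats each range as a closed interval (an assumption; the text does not say whether endpoints are included) and only compares numbers against the stated ranges. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """Closed interval [low, high]."""
    low: float
    high: float

    def __contains__(self, value: float) -> bool:
        return self.low <= value <= self.high


PH_BROAD, PH_PREFERRED = Window(4.0, 10.0), Window(6.0, 8.5)
TEMP_BROAD_C, TEMP_PREFERRED_C = Window(10.0, 105.0), Window(25.0, 35.0)


def classify_conditions(ph: float, temp_c: float) -> str:
    """Place a planned reaction pH and temperature (°C) against the stated ranges."""
    if ph in PH_PREFERRED and temp_c in TEMP_PREFERRED_C:
        return "preferred"
    if ph in PH_BROAD and temp_c in TEMP_BROAD_C:
        return "within broad range"
    return "outside stated ranges"


print(classify_conditions(7.5, 30.0))  # preferred
print(classify_conditions(9.0, 40.0))  # within broad range
```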

In another preferred embodiment, the mutant protein is selected from:

(a1) the sequence shown in SEQ ID NO.: 13; or

(b1) a mutant protein derived from the sequence shown in SEQ ID NO.: 13 by substitution, addition and/or deletion of one or more (preferably 1-50, more preferably 1-20, still more preferably 1-10, such as 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10) amino acids, which retains glycosyltransferase catalytic activity.

In another preferred embodiment, the substitutions are 1-50 conservative amino acid substitutions, preferably 1-20, more preferably 1-10, such as 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10.

In another preferred embodiment, the deletion is a deletion of 1-5 amino acids at the C-terminus, preferably of 2-4 amino acids.
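Generating the C-terminal deletion variants is a matter of dropping the last n residues for n in the chosen range. The sketch below labels variants "C<n>", matching the C2/C4/C5 labels used in the Figure 6 description; the function name and the 10-residue toy sequence are hypothetical.

```python
def c_terminal_truncations(seq: str, min_del: int = 1, max_del: int = 5) -> dict[str, str]:
    """Return variants of `seq` lacking n C-terminal residues, keyed "C<n>"."""
    if not 1 <= min_del <= max_del < len(seq):
        raise ValueError("need 1 <= min_del <= max_del < len(seq)")
    return {f"C{n}": seq[:-n] for n in range(min_del, max_del + 1)}


# Hypothetical 10-residue stand-in for a full-length glycosyltransferase.
variants = c_terminal_truncations("MKTAYIAKQR")
print(variants["C2"], variants["C5"])   # MKTAYIAK MKTAY
preferred = c_terminal_truncations("MKTAYIAKQR", 2, 4)
print(sorted(preferred))                # ['C2', 'C3', 'C4']
```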

In another preferred embodiment, the mutant protein sequence is as shown in SEQ ID NO.: 6-8.

In another preferred embodiment, when 1-5 amino acids are deleted from the C-terminus of the mutant protein, its expression level is at least 50% higher, more preferably 70%-100% higher, than that of the protein shown in SEQ ID NO.: 13.
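The expression criterion is a relative increase over the full-length protein. This passage does not state how expression is quantified (the Figure 6 description uses Western blots of the soluble fraction), so the sketch below assumes any consistent positive measure, for example a band intensity in arbitrary units. The values shown are hypothetical, not measured data.

```python
def expression_increase_pct(truncated: float, full_length: float) -> float:
    """Percentage increase of the truncated protein's expression over the full-length protein."""
    if full_length <= 0:
        raise ValueError("full-length expression must be positive")
    return 100.0 * (truncated - full_length) / full_length


def meets_claimed_increase(truncated: float, full_length: float, minimum_pct: float = 50.0) -> bool:
    """True if the increase reaches `minimum_pct` (50% by default)."""
    return expression_increase_pct(truncated, full_length) >= minimum_pct


# Hypothetical relative expression values (arbitrary units), not measured data.
print(expression_increase_pct(150.0, 100.0))  # 50.0
print(meets_claimed_increase(140.0, 100.0))   # False
```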

In another preferred embodiment, the addition is the addition of a tag sequence, a signal sequence or a secretion signal sequence.

In another preferred embodiment, the mutant protein further contains one or more of the following amino acids:

amino acid 10 is V;

amino acid 13 is F;

amino acid 82 is C; and

amino acid 186 is L.

In a second aspect, the present invention provides a polynucleotide encoding the mutant protein according to the first aspect of the present invention.

In another preferred embodiment, the polynucleotide sequences are shown in SEQ ID NO.: 19 and 21, encoding the sequences shown in SEQ ID NO.: 13 and 16, respectively.

In a third aspect, the present invention provides a vector containing the polynucleotide according to the second aspect of the present invention.

In another preferred embodiment, the vector is an expression vector, a shuttle vector or an integration vector.

In a fourth aspect, the present invention provides a host cell that contains the vector according to the third aspect of the present invention, or has the polynucleotide according to the second aspect of the present invention integrated into its genome.

In another preferred embodiment, the host cell is a eukaryotic cell, such as a yeast cell or a plant cell.

In another preferred embodiment, the host cell is a prokaryotic cell, such as Escherichia coli.

In another preferred embodiment, the host cell is a ginseng cell.

In a fifth aspect, the present invention provides a method for modifying a glycosyltransferase, comprising the steps of:

(i) aligning the glycosyltransferase with the sequence shown in SEQ ID NO.: 13 to determine their homology;

(ii) selecting, from step (i), sequences having at least 80%, preferably at least 85%-90%, more preferably at least 95%, and most preferably at least 98% homology with the sequence shown in SEQ ID NO.: 13;

(iii) modifying the sequences selected in step (ii), so that the modified glycosyltransferase contains the following core amino acids:

amino acid 142 is A;

amino acid 144 is H or F; and

amino acid 339 is G;

wherein the amino acid positions are numbered according to the sequence shown in SEQ ID NO.: 13,

and at least one of the core amino acids has been introduced by artificial modification.
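Steps (i) and (iii) hinge on translating SEQ ID NO.: 13 positions into the candidate's own numbering. The text does not name an alignment program or scoring scheme, so the sketch below assumes a plain Needleman-Wunsch global alignment with unit match, mismatch and gap scores; practical work would more likely use a substitution matrix such as BLOSUM62 with affine gaps in an established aligner. The step (ii) filter would reuse a percent-identity measure and is not repeated. All function names, the toy sequences and the toy target positions are hypothetical.

```python
CORE_TARGETS = {142: "A", 144: "H", 339: "G"}  # position 144 may instead be "F"


def global_align(ref: str, qry: str, match: int = 1, mismatch: int = -1,
                 gap: int = -1) -> tuple[str, str]:
    """Needleman-Wunsch global alignment with a linear gap penalty; returns gapped rows."""
    n, m = len(ref), len(qry)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(i: int, j: int) -> int:
        return match if ref[i - 1] == qry[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell, preferring diagonal moves.
    row_ref, row_qry, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            row_ref.append(ref[i - 1])
            row_qry.append(qry[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            row_ref.append(ref[i - 1])
            row_qry.append("-")
            i -= 1
        else:
            row_ref.append("-")
            row_qry.append(qry[j - 1])
            j -= 1
    return "".join(reversed(row_ref)), "".join(reversed(row_qry))


def reference_to_query(row_ref: str, row_qry: str) -> dict[int, int]:
    """Map 1-based reference positions to 1-based query positions (aligned columns only)."""
    mapping, i, j = {}, 0, 0
    for r, q in zip(row_ref, row_qry):
        i += int(r != "-")
        j += int(q != "-")
        if r != "-" and q != "-":
            mapping[i] = j
    return mapping


def install_core_residues(qry: str, ref: str, targets: dict[int, str]) -> tuple[str, list[str]]:
    """Step (iii): set each target residue (keyed by reference position) in the query."""
    mapping = reference_to_query(*global_align(ref, qry))
    residues, changes = list(qry), []
    for ref_pos, aa in sorted(targets.items()):
        q_pos = mapping.get(ref_pos)
        if q_pos is None:
            raise ValueError(f"reference position {ref_pos} has no aligned query residue")
        if residues[q_pos - 1] != aa:
            changes.append(f"{residues[q_pos - 1]}{q_pos}{aa}")
            residues[q_pos - 1] = aa
    if not changes:  # the method requires at least one artificially introduced core residue
        raise ValueError("no core residue was changed")
    return "".join(residues), changes


# Toy example with hypothetical sequences and targets (real use: CORE_TARGETS).
print(global_align("MKTAYIAKQR", "MKTYIAKQR"))  # ('MKTAYIAKQR', 'MKT-YIAKQR')
print(install_core_residues("MKTYIAKQR", "MKTAYIAKQR", {5: "F", 9: "Q"}))
# ('MKTFIAKQR', ['Y4F'])
```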

In another preferred embodiment, the modified glycosyltransferase further contains the following core amino acids:

amino acid 10 is V;

amino acid 13 is F;

amino acid 82 is C; and/or

amino acid 186 is L.

In another preferred embodiment, 1-5 amino acids are deleted from the C-terminus of the modified glycosyltransferase.

In a sixth aspect, the present invention provides a use of the mutant protein according to the first aspect of the present invention, for catalyzing one or more of the following reactions, or for preparing a catalytic preparation for catalyzing one or more of the following reactions:

In another preferred embodiment, the mutant protein is used to catalyze one or more of the following reactions, or to prepare a catalytic preparation for catalyzing one or more of the following reactions:

In a seventh aspect, the present invention provides a method for carrying out a glycosylation reaction, comprising the step of performing the glycosylation reaction in the presence of the mutant protein according to the first aspect of the present invention.

In another preferred embodiment, the glycosylation reaction comprises transferring the glycosyl group of a glycosyl donor, in the presence of the mutant protein according to the first aspect of the present invention, to the following positions of a tetracyclic triterpenoid compound:

C-20 and/or C-6, thereby forming a glycosylated tetracyclic triterpenoid compound.

In another preferred embodiment, the substrates of the glycosylation reaction include dammarane-type, lanostane-type, tirucallane-type, cycloartane-type, cucurbitane-type or meliacane-type tetracyclic triterpenoid compounds.

In an eighth aspect, the present invention provides a method for increasing the expression level of a glycosyltransferase, comprising the step of:

(iii) deleting 1-5 amino acids at the C-terminus of the sequence selected in step (ii).

In a ninth aspect, the present invention provides a use of the host cell according to the fourth aspect of the present invention for preparing a glycosyltransferase, as a catalytic cell, or for producing glycosylated tetracyclic triterpenoid compounds.

In a tenth aspect, the present invention provides a method for producing a transgenic plant, comprising the step of regenerating the host cell according to the fourth aspect of the present invention into a plant, wherein the host cell is a plant cell.

In an eleventh aspect, the present invention provides a mutant protein of a glycosyltransferase, wherein the mutant protein is a non-natural protein, has glycosyltransferase catalytic activity, and contains the following core amino acids related to the catalytic activity:

amino acid 82 is C; and/or

amino acid 144 is F;

wherein the amino acid positions are numbered according to the sequence shown in SEQ ID NO.: 11, and the glycosyltransferase catalytic activity includes transferring glycosyl groups from a glycosyl donor to both the C-20 and the C-6 hydroxyl groups of a tetracyclic triterpenoid compound.

In another preferred embodiment, the mutant protein according to the eleventh aspect of the present invention has at least 90%, preferably at least 95%, and more preferably at least 98% homology with the sequence shown in SEQ ID NO.: 11.

In another preferred embodiment, apart from amino acid 82 and/or 144, the remaining amino acids of the mutant protein according to the eleventh aspect of the present invention are identical or substantially identical to the sequence shown in SEQ ID NO.: 11.

In another preferred embodiment, "substantially identical" means that at most 50 (preferably 1-20, more preferably 1-10) amino acids differ, where the differences include amino acid substitutions, deletions or additions, and the mutant protein has glycosyltransferase activity for transferring a glycosyl group from a glycosyl donor to the C-20 hydroxyl group of a tetracyclic triterpenoid compound.

In a twelfth aspect, the present invention provides a mutant protein of a glycosyltransferase, wherein the mutant protein is a non-natural protein, has glycosyltransferase catalytic activity, and contains the following core amino acids related to the catalytic activity:

amino acid 142 is A;

amino acid 186 is L; and

amino acid 338 is G;

wherein the amino acid positions are numbered according to the sequence shown in SEQ ID NO.: 16, and the glycosyltransferase catalytic activity includes transferring a glycosyl group from a glycosyl donor to the C-6 hydroxyl group of a tetracyclic triterpenoid compound.

In another preferred embodiment, the mutant protein according to the twelfth aspect of the present invention has at least 90%, preferably at least 95%, and more preferably at least 98% homology with the sequence shown in SEQ ID NO.: 16.

In another preferred embodiment, apart from amino acids 142, 186 and 338, the remaining amino acids of the mutant protein according to the twelfth aspect of the present invention are identical or substantially identical to the sequence shown in SEQ ID NO.: 16.

In another preferred embodiment, "substantially identical" means that at most 50 (preferably 1-20, more preferably 1-10) amino acids differ, where the differences include amino acid substitutions, deletions or additions, and the mutant protein has glycosyltransferase activity for transferring a glycosyl group from a glycosyl donor to the C-6 hydroxyl group of a tetracyclic triterpenoid compound.

It should be understood that, within the scope of the present invention, the technical features described above and those specifically described below (for example, in the Examples) can be combined with each other to form new or preferred technical solutions. For brevity, these combinations are not described one by one here.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1. Amino acid sequence alignment of five glycosyltransferases.

Figure 2. HPLC chromatograms of in vitro reactions catalyzed by five glycosyltransferases. 1, UGTPg1 catalyzing PPT; 2, UGTPg100 catalyzing PPT; 3, UGTPg101 catalyzing PPT; 4, UGTPg102 catalyzing PPT; 5, UGTPg103 catalyzing PPT; 6, pET28a empty vector as control; 7, UGTPg101 catalyzing F1; 8, UGTPg101 catalyzing Rh1.

Figure 3. Key amino acid sites that determine C20-OH glycosylation catalyzed by the glycosyltransferases UGTPg1 and UGTPg102. HPLC chromatograms: 1, PPT catalyzed by UGTPg1; 2, PPT catalyzed by UGTPg102; 3, PPT catalyzed by UGTPg1-L38F; 4, PPT catalyzed by UGTPg1-A40S; 5, PPT catalyzed by UGTPg1-F85L; 6, PPT catalyzed by UGTPg1-T134I; 7, PPT catalyzed by UGTPg1-H144Y; 8, PPT catalyzed by UGTPg1-Q388H; 9, PPT catalyzed by UGTPg102-Y144H.

Figure 4. Key amino acid sites that determine C6-OH glycosylation catalyzed by the glycosyltransferases UGTPg100 and UGTPg103. HPLC chromatograms: 1, PPT catalyzed by UGTPg100; 2, PPT catalyzed by UGTPg103; 3, PPT catalyzed by UGTPg100-A142T; 4, PPT catalyzed by UGTPg100-L186S; 5, PPT catalyzed by UGTPg100-W205R; 6, PPT catalyzed by UGTPg100-G338R; 7, PPT catalyzed by UGTPg103-T142A; 8, PPT catalyzed by UGTPg103-T142A/S186L; 9, PPT catalyzed by UGTPg103-T142A/S186L/G338R.

Figure 5. Key amino acid sites that change the substrate regiospecificity of UGTPg1. HPLC chromatograms: 1, PPT catalyzed by UGTPg1; 2, PPT catalyzed by UGTPg100; 3, PPT catalyzed by UGTPg1-A10V; 4, PPT catalyzed by UGTPg1-I13F; 5, PPT catalyzed by UGTPg1-H82C; 6, PPT catalyzed by UGTPg1-H144F.

Figure 6. Truncating C-terminal amino acids of UGTPg1, UGTPg100 and UGTPg101 enhances their soluble expression in E. coli. A, Western blot of cell lysate supernatants after expression of wild-type and C-terminally truncated UGTs in E. coli; C2, C4 and C5 denote truncation of 2, 4 and 5 C-terminal amino acids, respectively. B, TLC of PPT reactions catalyzed by wild-type and C-terminally truncated UGTs.
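The variant names in Figures 3-5 (for example UGTPg1-H144Y or UGTPg103-T142A/S186L/G338R) appear to follow the common substitution notation: wild-type residue, position, replacement residue, with "/" joining multiple substitutions. The sketch below parses and applies that notation under this assumption, treating positions as 1-based indices into whatever sequence is passed in; the function names and toy sequence are hypothetical.

```python
import re

# One substitution: wild-type residue, 1-based position, replacement residue (e.g. "T142A").
TOKEN = re.compile(r"([A-Z])(\d+)([A-Z])")


def parse_mutations(spec: str) -> list[tuple[str, int, str]]:
    """Parse "T142A/S186L/G338R" into [("T", 142, "A"), ("S", 186, "L"), ("G", 338, "R")]."""
    parsed = []
    for token in spec.split("/"):
        match = TOKEN.fullmatch(token.strip())
        if match is None:
            raise ValueError(f"unrecognised mutation token: {token!r}")
        wild_type, position, new = match.groups()
        parsed.append((wild_type, int(position), new))
    return parsed


def apply_mutations(seq: str, spec: str) -> str:
    """Apply the substitutions in `spec`, checking each wild-type residue first."""
    residues = list(seq)
    for wild_type, position, new in parse_mutations(spec):
        if not 1 <= position <= len(residues):
            raise ValueError(f"position {position} is outside the sequence")
        if residues[position - 1] != wild_type:
            raise ValueError(f"expected {wild_type} at {position}, found {residues[position - 1]}")
        residues[position - 1] = new
    return "".join(residues)


print(parse_mutations("T142A/S186L/G338R"))
# [('T', 142, 'A'), ('S', 186, 'L'), ('G', 338, 'R')]
print(apply_mutations("MKTAY", "T3A/Y5F"))  # MKAAF (hypothetical toy sequence)
```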

DETAILED DESCRIPTION

Through extensive and in-depth research, the inventors used homology alignment, site-directed mutagenesis and related approaches to identify the key amino acid sites through which glycosyltransferases catalyze glycosylation of the C-6 or C-20 hydroxyl group. The inventors found that modifying these key sites can convert an originally inactive or poorly active enzyme into an active one, or confer a new enzymatic activity. The inventors also found that truncating the terminus of the glycosyltransferase of the present invention markedly increases its expression level. The present invention was completed on this basis.

Mutant protein of the present invention and its encoding nucleic acid

As used herein, the terms "mutant protein", "mutant protein of the present invention" and "glycosyltransferase of the present invention" all refer to a non-naturally occurring glycosyltransferase mutant protein, which is either the protein shown in SEQ ID NO.: 13 (or SEQ ID NO.: 16) or a protein artificially engineered on the basis of that protein. The mutant protein contains core amino acids associated with catalytic activity, at least one of which has been artificially modified, and it has glycosyltransferase activity catalyzing one or more of the following reactions:

Unless otherwise specified, the ginsenosides and sapogenins referred to herein are those with the S configuration at C20.

The term "core amino acids" means that, in a sequence based on SEQ ID NO.: 13 (or SEQ ID NO.: 16) and having at least 80% (for example 84%, 85%, 90%, 92%, 95% or 98%) homology to SEQ ID NO.: 13 (or SEQ ID NO.: 16), the residues at the corresponding positions are the specific amino acids described herein. For example, based on the sequence shown in SEQ ID NO.: 13, the core amino acids are:

Amino acid at position 142 is A;

Amino acid at position 144 is H or F; and

Amino acid at position 339 is G. A mutant protein containing these core amino acids catalyzes the transfer of a glycosyl group from a glycosyl donor to the C-20 hydroxyl group of a tetracyclic triterpenoid.

As another example, based on the sequence shown in SEQ ID NO.: 16, the core amino acids are:

Amino acid at position 142 is A;

Amino acid at position 186 is L; and

Amino acid at position 338 is G. A mutant protein containing these core amino acids catalyzes the transfer of a glycosyl group from a glycosyl donor to the C-6 hydroxyl group of a tetracyclic triterpenoid.
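The two core-residue profiles above can be checked mechanically once a candidate sequence is numbered in the reference frame of SEQ ID NO.: 13 or SEQ ID NO.: 16. A minimal Python sketch, assuming the candidate has already been brought into that numbering (for a homolog this normally means reading positions off an alignment, since numbering can shift); the profile names are illustrative and the test sequence is synthetic, not a real UGT:

```python
# 1-based position -> allowed residues, taken from the core-amino-acid lists above.
C20_PROFILE = {142: "A", 144: "HF", 339: "G"}  # numbering of SEQ ID NO.: 13
C6_PROFILE = {142: "A", 186: "L", 338: "G"}    # numbering of SEQ ID NO.: 16


def core_residue_report(sequence: str, profile: dict[int, str]) -> dict[int, bool]:
    """For each core position, report whether the residue found there is allowed."""
    report = {}
    for pos, allowed in profile.items():
        residue = sequence[pos - 1] if 1 <= pos <= len(sequence) else None
        report[pos] = residue is not None and residue in allowed
    return report


def has_core_residues(sequence: str, profile: dict[int, str]) -> bool:
    """True only if every core position carries an allowed residue."""
    return all(core_residue_report(sequence, profile).values())


# Synthetic 400-residue sequence with A142, F144 and G339 placed by hand.
residues = ["S"] * 400
residues[141], residues[143], residues[338] = "A", "F", "G"
candidate = "".join(residues)
print(has_core_residues(candidate, C20_PROFILE))   # True
print(core_residue_report(candidate, C6_PROFILE))  # {142: True, 186: False, 338: False}
```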

It should be understood that the amino acid numbering of the mutant proteins of the present invention is based on SEQ ID NO.: 13 (or SEQ ID NO.: 16). When a particular mutant protein shares 80% or higher homology with SEQ ID NO.: 13 (or SEQ ID NO.: 16), its numbering may be offset relative to that of SEQ ID NO.: 13 (or SEQ ID NO.: 16), for example shifted by 1-5 positions toward the N-terminus or C-terminus. Using conventional sequence alignment techniques, those skilled in the art will generally recognize such offsets as falling within a reasonable range, and a mutant protein with 80% (for example 90%, 95% or 98%) homology and the same or similar glycosyltransferase activity should not be excluded from the scope of the mutant proteins of the present invention merely because of such a numbering offset.
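Numbering offsets of this kind are usually resolved by reading positions off a pairwise alignment. A minimal Python sketch, assuming the alignment has already been produced by a standard alignment tool and is supplied as two equal-length gapped strings with "-" for gaps; the toy alignment is hypothetical:

```python
from typing import Optional


def map_position(ref_aln: str, query_aln: str, ref_pos: int) -> Optional[int]:
    """Map a 1-based reference position to the query's own 1-based numbering.

    Returns None when the reference residue is aligned to a gap in the query.
    """
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned sequences must have equal length")
    ref_count = query_count = 0
    for ref_char, query_char in zip(ref_aln, query_aln):
        if ref_char != "-":
            ref_count += 1
        if query_char != "-":
            query_count += 1
        if ref_char != "-" and ref_count == ref_pos:
            return query_count if query_char != "-" else None
    raise ValueError(f"reference has fewer than {ref_pos} residues")


# Toy alignment: the query carries a one-residue insertion after reference position 3.
print(map_position("MKT-AGH", "MKTSAGH", 4))  # 5
print(map_position("MKTAG", "MK-AG", 3))      # None (deleted in the query)
```

In this toy case reference position 4 corresponds to query position 5, a one-position shift of the kind the paragraph above treats as equivalent numbering.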

The mutant protein of the present invention is a synthetic or recombinant protein; that is, it may be chemically synthesized or produced from a prokaryotic or eukaryotic host (for example, bacteria, yeast or plants) using recombinant techniques. Depending on the host used in the recombinant production scheme, the mutant protein may be glycosylated or non-glycosylated. It may also include or lack the initial methionine residue.

The present invention also includes fragments, derivatives and analogs of the mutant protein. As used herein, the terms "fragment", "derivative" and "analog" refer to proteins that retain substantially the same biological function or activity as the mutant protein.

A fragment, derivative or analog of the mutant protein of the present invention may be: (i) a mutant protein in which one or more amino acid residues are replaced by conservative or non-conservative substitutions (preferably conservative ones), where the substituting residue may or may not be encoded by the genetic code; (ii) a mutant protein bearing a substituent group on one or more amino acid residues; (iii) a mutant protein formed by fusing the mature mutant protein with another compound (for example, a compound that prolongs its half-life, such as polyethylene glycol); or (iv) a mutant protein formed by fusing an additional amino acid sequence to the mutant protein sequence (for example, a leader sequence, a secretory sequence, a sequence used to purify the mutant protein, a proprotein sequence, or a fusion protein with an antigen IgG fragment). Based on the teachings herein, such fragments, derivatives and analogs are within the knowledge of those skilled in the art. In the present invention, conservative substitutions are preferably made according to Table I.

Table I

Initial residue | Representative substitutions | Preferred substitution
Ala (A) | Val; Leu; Ile | Val
Arg (R) | Lys; Gln; Asn | Lys
Asn (N) | Gln; His; Lys; Arg | Gln
Asp (D) | Glu | Glu
Cys (C) | Ser | Ser
Gln (Q) | Asn | Asn
Glu (E) | Asp | Asp
Gly (G) | Pro; Ala | Ala
His (H) | Asn; Gln; Lys; Arg | Arg
Ile (I) | Leu; Val; Met; Ala; Phe | Leu
Leu (L) | Ile; Val; Met; Ala; Phe | Ile
Lys (K) | Arg; Gln; Asn | Arg
Met (M) | Leu; Phe; Ile | Leu
Phe (F) | Leu; Val; Ile; Ala; Tyr | Leu
Pro (P) | Ala | Ala
Ser (S) | Thr | Thr
Thr (T) | Ser | Ser
Trp (W) | Tyr; Phe | Tyr
Tyr (Y) | Trp; Phe; Thr; Ser | Phe
Val (V) | Ile; Leu; Met; Phe; Ala | Leu
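Table I can be encoded as a lookup so that a proposed substitution is classified against it. A small Python sketch, with residues written as one-letter codes; the dictionary simply transcribes the table, and the function name is illustrative:

```python
# Table I: residue -> (representative substitutions, preferred substitution), one-letter codes.
TABLE_I = {
    "A": ("VLI", "V"), "R": ("KQN", "K"), "N": ("QHKR", "Q"), "D": ("E", "E"),
    "C": ("S", "S"), "Q": ("N", "N"), "E": ("D", "D"), "G": ("PA", "A"),
    "H": ("NQKR", "R"), "I": ("LVMAF", "L"), "L": ("IVMAF", "I"), "K": ("RQN", "R"),
    "M": ("LFI", "L"), "F": ("LVIAY", "L"), "P": ("A", "A"), "S": ("T", "T"),
    "T": ("S", "S"), "W": ("YF", "Y"), "Y": ("WFTS", "F"), "V": ("ILMFA", "L"),
}


def classify_substitution(wild_type: str, new: str) -> str:
    """Return 'preferred', 'representative' or 'not listed' according to Table I."""
    if len(new) != 1:
        raise ValueError("expected a single one-letter residue code")
    representative, preferred = TABLE_I[wild_type]
    if new == preferred:
        return "preferred"
    if new in representative:
        return "representative"
    return "not listed"


print(classify_substitution("A", "V"))  # preferred
print(classify_substitution("A", "L"))  # representative
print(classify_substitution("T", "A"))  # not listed
```

By this table, several of the key substitutions examined in the figures, such as T142A and H144F, are not conservative replacements, which is consistent with their being able to change activity.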

The active mutant protein of the present invention has glycosyltransferase activity and can catalyze one or more of the following reactions:

(A)

(B)

(C)

(D)

Generally, in addition to the core amino acids above, the mutant protein of the present invention may also have one or more of the following amino acids:

Amino acid at position 10 is V;

Amino acid at position 13 is F;

Amino acid at position 82 is C; and

Amino acid at position 186 is L.

Preferably, the mutant protein is as shown in SEQ ID NO.: 9-13 or 16, and is not a sequence shown in SEQ ID NO.: 1-3.

The mutant proteins of the present invention also include C-terminally deleted (truncated) forms of mutant proteins having glycosyltransferase activity, for example the sequences shown in SEQ ID NO.: 6-8.

It should be understood that the mutant protein of the present invention generally shares high homology (identity) with the sequence shown in SEQ ID NO.: 13. Preferably, its homology to SEQ ID NO.: 13 is at least 80%, more preferably at least 85%-90%, still more preferably at least 95%, and most preferably at least 98%.
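The text does not say how identity is computed, and conventions differ (which gaps count, which length is the denominator). As one common convention, assumed here rather than taken from the source, identity can be taken as identical aligned positions divided by the number of alignment columns, gaps included. A Python sketch over an already-computed gapped alignment, with toy sequences:

```python
def percent_identity(aln_a: str, aln_b: str) -> float:
    """Identical residues / alignment columns (gap columns included), as a percentage.

    Columns where both sequences have a gap are ignored.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aln_a, aln_b) if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("empty alignment")
    identical = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * identical / len(columns)


print(percent_identity("MKTAG", "MKSAG"))  # 80.0
print(percent_identity("MK-AG", "MKTAG"))  # 80.0
```

Because the denominator choice changes the result, any comparison against the 80%, 90%, 95% or 98% thresholds should state which convention was used.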

The mutant protein of the present invention may also be modified. Modified forms (which usually do not alter the primary structure) include chemically derivatized forms of the mutant protein produced in vivo or in vitro, such as acetylated or carboxylated forms. Modifications also include glycosylation, for example mutant proteins glycosylated during their synthesis and processing or in further processing steps; such modification can be achieved by exposing the mutant protein to glycosylation-related enzymes (such as mammalian glycosylating or deglycosylating enzymes). Modified forms also include sequences with phosphorylated amino acid residues (such as phosphotyrosine, phosphoserine or phosphothreonine), as well as mutant proteins modified to improve their resistance to proteolysis or to optimize their solubility.

The term "polynucleotide encoding a mutant protein" may refer to a polynucleotide that encodes the mutant protein of the present invention, or to a polynucleotide that additionally includes further coding and/or non-coding sequences.

The present invention also relates to variants of the above polynucleotides that encode polypeptides having the same amino acid sequence as those of the present invention, or fragments, analogs and derivatives of the mutant protein. These nucleotide variants include substitution, deletion and insertion variants. As is known in the art, an allelic variant is an alternative form of a polynucleotide that may contain substitutions, deletions or insertions of one or more nucleotides without substantially changing the function of the encoded mutant protein.

The present invention also relates to polynucleotides that hybridize to the above sequences and share at least 50%, preferably at least 70%, and more preferably at least 80% identity with them. In particular, the present invention relates to polynucleotides that hybridize to the polynucleotides of the present invention under stringent conditions. In the present invention, "stringent conditions" means: (1) hybridization and washing at relatively low ionic strength and relatively high temperature, such as 0.2×SSC, 0.1% SDS, 60°C; or (2) hybridization in the presence of a denaturant, such as 50% (v/v) formamide, 0.1% calf serum/0.1% Ficoll, 42°C; or (3) hybridization that occurs only when the identity between the two sequences is at least 90%, preferably at least 95%.
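For the 0.2×SSC wash in condition (1), the working concentration follows from the standard definition of SSC (1× = 150 mM NaCl and 15 mM trisodium citrate) and a simple C1V1 = C2V2 dilution. Both the SSC definition and the 20× stock strength are assumptions supplied here, not stated in the text; a Python sketch of the arithmetic:

```python
# Standard 1x SSC composition in mM (assumed, not given in the source).
SSC_1X_MM = {"NaCl": 150.0, "trisodium citrate": 15.0}


def ssc_composition_mm(strength: float) -> dict[str, float]:
    """Component concentrations (mM) of an SSC solution of the given strength."""
    return {component: conc * strength for component, conc in SSC_1X_MM.items()}


def stock_volume_ml(target_strength: float, final_volume_ml: float,
                    stock_strength: float = 20.0) -> float:
    """C1*V1 = C2*V2 solved for V1: volume of stock needed for the final solution."""
    return target_strength * final_volume_ml / stock_strength


print(ssc_composition_mm(0.2))       # about 30 mM NaCl and 3 mM citrate
print(stock_volume_ml(0.2, 1000.0))  # 10 mL of 20x SSC per litre
```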

The mutant proteins and polynucleotides of the present invention are preferably provided in isolated form, and more preferably purified to homogeneity.

The full-length polynucleotide sequence of the present invention can usually be obtained by PCR amplification, recombinant methods or chemical synthesis. For PCR amplification, primers can be designed from the nucleotide sequences disclosed herein, especially the open reading frame sequences, and a commercially available cDNA library, or one prepared by conventional methods known to those skilled in the art, can be used as the template. When the sequence is long, two or more PCR amplifications are often needed, after which the amplified fragments are joined together in the correct order.

Once the relevant sequence has been obtained, it can be produced in bulk by recombinant methods: typically it is cloned into a vector, transferred into cells, and then isolated from the propagated host cells by conventional methods.

Relevant sequences can also be produced by chemical synthesis, especially when the fragments are short. Typically, a long sequence is obtained by first synthesizing several short fragments and then ligating them.

DNA sequences encoding the protein of the present invention (or a fragment or derivative thereof) can now be obtained entirely by chemical synthesis. The DNA sequence can then be introduced into various DNA molecules (such as vectors) and cells known in the art. Mutations can also be introduced into the protein sequence of the present invention by chemical synthesis.

PCR amplification of DNA/RNA is the preferred method for obtaining the polynucleotides of the present invention. In particular, when full-length cDNA is difficult to obtain from a library, RACE (rapid amplification of cDNA ends) is preferably used. PCR primers can be selected appropriately based on the sequence information disclosed herein and synthesized by conventional methods. The amplified DNA/RNA fragments can be separated and purified by conventional methods such as gel electrophoresis.

Wild-type glycosyltransferases

In the present invention, a series of glycosyltransferase genes with C20-OH and C6-OH glycosylation activity were selected for study, as follows:

The UGTPg101 gene has the nucleotide sequence of SEQ ID NO: 17 in the Sequence Listing. Nucleotides 1-1428 from the 5' end of SEQ ID NO: 17 form the open reading frame of UGTPg101; nucleotides 1-3 are its start codon ATG and nucleotides 1426-1428 are its stop codon TAA. The gene encodes the 475-amino-acid protein UGTPg101, with the amino acid sequence of SEQ ID NO: 1; its theoretical molecular weight predicted by software is 53.4 kDa and its isoelectric point (pI) is 5.01. Amino acids 344 to 387 from the N-terminus of SEQ ID NO: 1 constitute the conserved PSPG motif of plant glycosyltransferases that glycosylate small molecules.

The UGTPg1 gene has the nucleotide sequence of SEQ ID NO: 18 in the Sequence Listing. Nucleotides 1-1428 from the 5' end of SEQ ID NO: 18 form the open reading frame (ORF) of UGTPg1; nucleotides 1-3 are its start codon ATG and nucleotides 1426-1428 are its stop codon TAA. The gene encodes the 475-amino-acid protein UGTPg1, with the amino acid sequence of SEQ ID NO:, and its theoretical molecular weight predicted by software is 53.4 kDa and its isoelectric point (pI) is 5.14. Amino acids 344 to 387 from the N-terminus of SEQ ID NO: 2 constitute the conserved PSPG (Plant Secondary Product Glycosyltransferase) motif of plant glycosyltransferases that glycosylate small molecules.

The UGTPg102 gene has the nucleotide sequence of SEQ ID NO: 19 in the Sequence Listing. Nucleotides 1-1428 from the 5' end of SEQ ID NO: 19 form the open reading frame of UGTPg102; nucleotides 1-3 are its start codon ATG and nucleotides 1426-1428 are its stop codon TAA. The gene encodes the 475-amino-acid protein UGTPg102, with the amino acid sequence of SEQ ID NO: 3; its theoretical molecular weight predicted by software is 53.6 kDa and its isoelectric point (pI) is 5.08. Amino acids 344 to 387 from the N-terminus of SEQ ID NO: 3 constitute the conserved PSPG motif of plant glycosyltransferases that glycosylate small molecules.

The UGTPg100 gene has the nucleotide sequence of SEQ ID NO: 20 in the Sequence Listing. Nucleotides 1-1419 from the 5' end of SEQ ID NO: 20 form the open reading frame of UGTPg100; nucleotides 1-3 are its start codon ATG and nucleotides 1417-1419 are its stop codon TAA. The gene encodes the 472-amino-acid protein UGTPg100, with the amino acid sequence of SEQ ID NO: 4; its theoretical molecular weight predicted by software is 53.1 kDa and its isoelectric point (pI) is 5.08. Amino acids 343 to 386 from the N-terminus of SEQ ID NO.: 4 constitute the conserved PSPG motif of plant glycosyltransferases that glycosylate small molecules.

The UGTPg103 gene has the nucleotide sequence of SEQ ID NO: 21 in the Sequence Listing. Nucleotides 1-1419 from the 5' end of SEQ ID NO: 21 form the open reading frame of UGTPg103; nucleotides 1-3 are its start codon ATG and nucleotides 1417-1419 are its stop codon TAA. The gene encodes the 472-amino-acid protein UGTPg103, with the amino acid sequence of SEQ ID NO: 5; its theoretical molecular weight predicted by software is 53.2 kDa and its isoelectric point (pI) is 5.23. Amino acids 343 to 386 from the N-terminus of SEQ ID NO.: 5 constitute the conserved PSPG motif of plant glycosyltransferases that glycosylate small molecules.
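The coordinates in the five gene descriptions are internally consistent: a 1428-nt ORF that includes its stop codon encodes (1428 - 3) / 3 = 475 residues, and a 1419-nt ORF encodes 472. A small Python sketch of that arithmetic plus a basic ORF sanity check, assuming the standard genetic code's stop codons; the helper names are illustrative and the short test ORFs are toys:

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def protein_length(orf_start: int, orf_end: int) -> int:
    """Residues encoded by an ORF given as 1-based inclusive coordinates, stop codon included."""
    length_nt = orf_end - orf_start + 1
    if length_nt % 3 != 0:
        raise ValueError(f"ORF length {length_nt} is not a multiple of 3")
    return length_nt // 3 - 1  # the final codon is the stop codon


def stop_codon_span(orf_end: int) -> tuple[int, int]:
    """1-based coordinates of the stop codon at the end of an ORF."""
    return orf_end - 2, orf_end


def is_clean_orf(nucleotides: str) -> bool:
    """ATG start, a single in-frame stop at the very end, length a multiple of 3."""
    seq = nucleotides.upper()
    if len(seq) < 6 or len(seq) % 3 != 0:
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    return (codons[0] == "ATG" and codons[-1] in STOP_CODONS
            and not any(codon in STOP_CODONS for codon in codons[1:-1]))


for genes, end in [("UGTPg1/UGTPg101/UGTPg102", 1428), ("UGTPg100/UGTPg103", 1419)]:
    print(genes, protein_length(1, end), stop_codon_span(end))
# UGTPg1/UGTPg101/UGTPg102 475 (1426, 1428)
# UGTPg100/UGTPg103 472 (1417, 1419)
```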

Sequence information for the wild-type proteins, the mutant proteins of the present invention and their derived proteins described above is shown in Table 2:

Table 2

Expression vector

The present invention also relates to vectors comprising the polynucleotide of the present invention, host cells genetically engineered with the vector or with the coding sequence of the mutant protein of the present invention, and methods for producing the polypeptide of the present invention by recombinant techniques.

Using conventional recombinant DNA techniques, the polynucleotide sequences of the present invention can be used to express or produce recombinant mutant proteins. This generally involves the following steps:

(1) transforming or transducing a suitable host cell with a polynucleotide (or variant) encoding the mutant protein of the present invention, or with a recombinant expression vector containing the polynucleotide;

(2) culturing the host cell in a suitable medium;

(3) isolating and purifying the protein from the medium or the cells.

In the present invention, the polynucleotide sequence encoding the mutant protein can be inserted into a recombinant expression vector. The term "recombinant expression vector" refers to bacterial plasmids, bacteriophages, yeast plasmids, plant cell viruses, mammalian cell viruses such as adenoviruses and retroviruses, or other vectors well known in the art. Any plasmid or vector can be used as long as it can replicate and remain stable in the host. An important feature of expression vectors is that they usually contain an origin of replication, a promoter, a marker gene and translational control elements.

Methods well known to those skilled in the art can be used to construct expression vectors containing the DNA sequence encoding the mutant protein of the present invention and appropriate transcriptional/translational control signals. These methods include in vitro recombinant DNA techniques, DNA synthesis techniques and in vivo recombination. The DNA sequence can be operably linked to an appropriate promoter in the expression vector to direct mRNA synthesis. Representative promoters include the E. coli lac or trp promoter; the bacteriophage λ PL promoter; and eukaryotic promoters, including the CMV immediate-early promoter, the HSV thymidine kinase promoter, the early and late SV40 promoters, retroviral LTRs, and other known promoters that control gene expression in prokaryotic or eukaryotic cells or their viruses. The expression vector also contains a ribosome-binding site for translation initiation and a transcription terminator.

In addition, the expression vector preferably contains one or more selectable marker genes that provide a phenotypic trait for selecting transformed host cells, such as dihydrofolate reductase, neomycin resistance or green fluorescent protein (GFP) for eukaryotic cell culture, or tetracycline or ampicillin resistance for E. coli.

A vector containing the appropriate DNA sequence described above together with an appropriate promoter or control sequence can be used to transform a suitable host cell so that it expresses the protein.

The host cell can be a prokaryotic cell, such as a bacterial cell; a lower eukaryotic cell, such as a yeast cell; or a higher eukaryotic cell, such as a mammalian cell. Representative examples include bacterial cells of E. coli, Streptomyces and Salmonella typhimurium; fungal cells such as yeast; and plant cells (such as ginseng cells).

When the polynucleotide of the present invention is expressed in higher eukaryotic cells, inserting an enhancer sequence into the vector will enhance transcription. Enhancers are cis-acting DNA elements, usually about 10 to 300 bp long, that act on a promoter to increase gene transcription. Examples include the 100 to 270 bp SV40 enhancer on the late side of the replication origin, the polyoma enhancer on the late side of the replication origin, and adenovirus enhancers.

Those of ordinary skill in the art know how to select appropriate vectors, promoters, enhancers and host cells.

Host cells can be transformed with recombinant DNA using conventional techniques well known to those skilled in the art. When the host is a prokaryote such as E. coli, competent cells capable of taking up DNA can be harvested after the exponential growth phase and treated by the CaCl2 method, the steps of which are well known in the art; alternatively, MgCl2 can be used. If desired, transformation can also be performed by electroporation. When the host is a eukaryote, DNA transfection methods such as calcium phosphate co-precipitation, or conventional mechanical methods such as microinjection, electroporation and liposome packaging, can be used.

The resulting transformants can be cultured by conventional methods to express the polypeptide encoded by the gene of the present invention. Depending on the host cell, the medium can be selected from various conventional media, and culture is carried out under conditions suitable for host growth. Once the host cells have grown to an appropriate density, the selected promoter is induced by a suitable method (such as a temperature shift or chemical induction), and the cells are cultured for a further period.

In the above method, the recombinant polypeptide may be expressed intracellularly, on the cell membrane, or secreted outside the cell. If necessary, the recombinant protein can be isolated and purified by various separation methods that exploit its physical, chemical and other properties. These methods are well known to those skilled in the art and include, but are not limited to: conventional renaturation treatment, treatment with a protein precipitant (salting out), centrifugation, osmotic lysis, sonication, ultracentrifugation, molecular sieve chromatography (gel filtration), adsorption chromatography, ion-exchange chromatography, high-performance liquid chromatography (HPLC), other liquid chromatography techniques, and combinations of these methods.

Beneficial effects of the present invention:

Through extensive screening and engineering, the present invention identified the catalytically important sites of the glycosyltransferases. Modifying these sites can convert a glycosyltransferase that originally lacked catalytic activity into an active new protein, and modifying key sites in highly homologous proteins can deliberately alter their substrate specificity. In addition, further deleting several amino acids at the C-terminus effectively increases the expression level of the mutant protein.

The present invention is further described below with reference to specific examples. It should be understood that these examples only illustrate the present invention and are not intended to limit its scope. Experimental methods in the following examples for which specific conditions are not stated are generally carried out under conventional conditions, such as those described in Sambrook et al., Molecular Cloning: A Laboratory Manual (New York: Cold Spring Harbor Laboratory Press, 1989), or under conditions recommended by the manufacturer. Unless otherwise indicated, percentages and parts are by weight.

Example 1. Expression in Escherichia coli BL21(DE3) and catalytic function identification of the glycosyltransferases UGTPg1, UGTPg101, UGTPg102, UGTPg100 and UGTPg103

Two primers with the nucleotide sequences of SEQ ID NO: 22 and SEQ ID NO: 23 were synthesized, carrying a BamHI site and an XhoI site at their ends, respectively. PCR was performed using pMDT-UGTPg1, pMDT-UGTPg101, pMDT-UGTPg102, pMDT-UGTPg100 and pMDT-UGTPg103 as templates.
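The primer sequences themselves (SEQ ID NO: 22 and 23) are not reproduced here. A routine check before this kind of directional BamHI/XhoI cloning, not described in the text but standard practice, is that the insert carries no internal BamHI (GGATCC) or XhoI (CTCGAG) site; otherwise the double digestion would also cut inside the gene. Both recognition sequences are palindromic, so scanning one strand is enough. A Python sketch with a hypothetical insert:

```python
# Recognition sequences; both are palindromic, so one strand covers both orientations.
SITES = {"BamHI": "GGATCC", "XhoI": "CTCGAG"}


def find_site(sequence: str, site: str) -> list[int]:
    """1-based start positions of every occurrence of `site` in `sequence`."""
    seq = sequence.upper()
    width = len(site)
    return [i + 1 for i in range(len(seq) - width + 1) if seq[i:i + width] == site]


def site_report(insert: str) -> dict[str, list[int]]:
    """Map each enzyme name to the positions of its sites in the insert."""
    return {name: find_site(insert, motif) for name, motif in SITES.items()}


toy_insert = "ATGGCTGGATCCAAACTCGAGTAA"  # hypothetical sequence containing both sites
print(site_report(toy_insert))  # {'BamHI': [7], 'XhoI': [16]}
```

For a real ORF, an empty list for both enzymes means the sites added by the primers are the only ones that will be cut.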

PCR amplification program: 94°C for 2 min; 35 cycles of 94°C for 15 s, 58°C for 30 s and 68°C for 1.5 min; 68°C for 10 min; then hold at 10°C. The PCR products were separated and recovered by agarose gel electrophoresis, double-digested with BamHI and XhoI, and ligated with T4 DNA ligase (NEB) into the pET28a vector (Novagen) that had been double-digested with the same two enzymes.
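As a quick check on the cycling arithmetic, the program above can be encoded as data and its nominal run time summed. This is a minimal Python sketch; it assumes instantaneous ramps between temperatures and treats the final 10°C step as an open-ended hold, neither of which the text specifies.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Step:
    temp_c: float  # block temperature in °C
    seconds: int   # hold time in seconds

def program_seconds(initial, cycle, n_cycles, final):
    """Sum hold times: initial steps + n_cycles * cycle steps + final steps.
    Ramp times and the open-ended 10 °C hold are deliberately ignored."""
    total = sum(s.seconds for s in initial)
    total += n_cycles * sum(s.seconds for s in cycle)
    total += sum(s.seconds for s in final)
    return total

initial = [Step(94, 2 * 60)]                        # initial denaturation, 2 min
cycle = [Step(94, 15), Step(58, 30), Step(68, 90)]  # denature, anneal, extend
final = [Step(68, 10 * 60)]                         # final extension, 10 min

total = program_seconds(initial, cycle, 35, final)
print(total, total / 60)  # 5445 90.75
```

Under those assumptions the program comes to 5445 s, about 91 min of hold time; an actual run will take longer once ramping is included.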

The resulting recombinant plasmids were named pET28a-UGTPg1, pET28a-UGTPg101, pET28a-UGTPg102, pET28a-UGTPg100 and pET28a-UGTPg103. Each plasmid was transformed into Escherichia coli BL21(DE3) (Novagen) to construct the recombinant strains pET28a-UGTPg1-BL21, pET28a-UGTPg101-BL21, pET28a-UGTPg102-BL21, pET28a-UGTPg100-BL21 and pET28a-UGTPg103-BL21.

Single colonies of the five recombinant strains were picked from plates, inoculated into LB tubes containing 50 μg/mL kanamycin, and cultured overnight at 37°C with shaking at 200 rpm. The cultures were then inoculated at 1% into 50 mL LB medium and grown at 37°C and 200 rpm to an OD600 of 0.6-0.8. The cultures were chilled in an ice-water bath, IPTG was added to a final concentration of 0.1 mM, and expression was induced at 16°C and 110 rpm for 18 h. Cells were harvested by centrifugation at 12000 g for 3 min, resuspended in 10 mL PBS buffer (pH 8.0) per gram of wet cell weight, and lysed; the lysate was centrifuged at 12000 g for 20 min, and the supernatant was taken as the crude enzyme solution. In parallel, strain pET28a-BL21, a BL21 strain carrying the empty pET28a vector, was induced to prepare a crude enzyme solution that served as the control in the subsequent examples.
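The culture and induction quantities follow from simple proportions (C1V1 = C2V2). The sketch below assumes a 100 mM IPTG stock, which is a hypothetical value; the text gives only the 0.1 mM final concentration, the 1% inoculum and the 10 mL-per-gram lysis ratio.

```python
def dilution_volume(stock_conc, final_conc, final_volume):
    """Stock volume V1 such that stock_conc * V1 = final_conc * final_volume.
    Both concentrations must share units; V1 has the units of final_volume."""
    if stock_conc <= final_conc:
        raise ValueError("stock must be more concentrated than the target")
    return final_conc * final_volume / stock_conc

def lysis_buffer_ml(wet_weight_g, ml_per_g=10.0):
    """PBS (pH 8.0) volume at 10 mL per gram of wet cell weight."""
    return wet_weight_g * ml_per_g

culture_ml = 50.0
inoculum_ml = 0.01 * culture_ml  # 1 % (v/v) inoculum
# Hypothetical 100 mM IPTG stock; mM, mM, mL -> mL, then converted to µL.
iptg_ul = dilution_volume(100.0, 0.1, culture_ml) * 1000

print(inoculum_ml, iptg_ul, lysis_buffer_ml(1.0))
```

With a 100 mM stock, a 50 mL culture needs 50 μL of IPTG and a 0.5 mL inoculum, and each gram of wet cells is resuspended in 10 mL of buffer.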

The reaction system for glycosyltransferase-catalyzed synthesis of rare ginsenosides from ginsenoside aglycones was as follows: 200 μL of reaction solution containing PBS buffer (pH 8.0), 5 mM UDP-glucose, 0.5 mM protopanaxatriol (PPT), 1% Tween-20 and 150 μL of crude enzyme solution. The reaction was carried out in a 40°C water bath for 4 h and terminated by adding an equal volume of n-butanol, which also served to extract the products. The n-butanol phase was concentrated in vacuo, and the residue was dissolved in methanol for HPLC analysis; the results are shown in Figure 2.
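Setting up the 200 μL reaction means converting each final concentration into a pipetting volume. The Python sketch below assumes stock concentrations of 100 mM UDP-glucose, 10 mM PPT and 10% (v/v) Tween-20, and assumes PBS simply makes up the remaining volume; none of these stocks are stated in the text.

```python
TOTAL_UL = 200.0
ENZYME_UL = 150.0  # crude enzyme solution, fixed volume from the text

# component: (stock concentration, final concentration), same units per row.
# The stock concentrations are hypothetical.
COMPONENTS = {
    "UDP-glucose (mM)": (100.0, 5.0),
    "PPT (mM)": (10.0, 0.5),
    "Tween-20 (% v/v)": (10.0, 1.0),
}

def reaction_volumes(total_ul, enzyme_ul, components):
    """Return µL of each stock, the enzyme, and the PBS needed to reach total_ul."""
    vols = {name: final / stock * total_ul
            for name, (stock, final) in components.items()}
    vols["crude enzyme"] = enzyme_ul
    filler = total_ul - sum(vols.values())
    if filler < 0:
        raise ValueError("stocks too dilute: components exceed the total volume")
    vols["PBS pH 8.0 (to volume)"] = filler
    return vols

for name, ul in reaction_volumes(TOTAL_UL, ENZYME_UL, COMPONENTS).items():
    print(f"{name:24s} {ul:6.1f} µL")
```

Independently of the stocks chosen, 5 mM UDP-glucose against 0.5 mM PPT supplies the sugar donor in 10-fold molar excess over the aglycone acceptor.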

The results showed that UGTPg1 catalyzed the conversion of PPT to F1, i.e., the addition of one glycosyl group at C20-OH; UGTPg101 first glycosylated the C20-OH of PPT to produce F1 and then glycosylated the C6-OH of F1 to produce a small amount of Rg1; UGTPg100 catalyzed the conversion of PPT to Rh1, i.e., the addition of one glucose molecule at the C6-OH of PPT; and no enzyme activity was detected for UGTPg102 or UGTPg103.
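The Example 1 results can be read as a small reaction network in which each enzyme contributes substrate-to-product edges. The Python sketch below uses an illustrative data structure (not from the source) that encodes only the reactions reported above and traces which products are reachable from PPT; for UGTPg101 it recovers the two-step route PPT → F1 → Rg1.

```python
from collections import deque

# (substrate, hydroxyl glycosylated, product) edges reported in Example 1.
# A missing edge means "not reported", not "shown to be impossible".
ACTIVITIES = {
    "UGTPg1": [("PPT", "C20-OH", "F1")],
    "UGTPg101": [("PPT", "C20-OH", "F1"), ("F1", "C6-OH", "Rg1")],
    "UGTPg100": [("PPT", "C6-OH", "Rh1")],
    "UGTPg102": [],  # no activity detected
    "UGTPg103": [],  # no activity detected
}

def products_from(start, edges):
    """Breadth-first search over substrate -> product edges; returns every
    compound reachable from `start`, excluding `start` itself."""
    seen, queue = set(), deque([start])
    while queue:
        current = queue.popleft()
        for substrate, _site, product in edges:
            if substrate == current and product not in seen:
                seen.add(product)
                queue.append(product)
    seen.discard(start)
    return seen

for enzyme, edges in ACTIVITIES.items():
    print(enzyme, sorted(products_from("PPT", edges)))
```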

Example 2. Key amino acid sites determining C20-OH glycosylation by glycosyltransferases UGTPg1 and UGTPg102

Homology modeling of UGTPg1 was performed with SWISS-MODEL, and the three-dimensional structure was analyzed with PyMOL (DeLano Scientific). H15 and D117 are the conserved catalytic residues of UGTPg1. Near these two residues, six positions at which UGTPg1 and UGTPg102 differ were found: L38, A40, F85, T134, H144 and Q388 of UGTPg1, corresponding to F38, S40, L85, I134, Y144 and H388 of UGTPg102, respectively. Each of these six residues of UGTPg1 was mutated to the corresponding UGTPg102 residue, the mutants were expressed in Escherichia coli BL21(DE3), and their enzyme activity was measured.
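The mutation names used throughout (for example L38F) list the wild-type residue, its position, and the replacement residue taken from the homolog at the same aligned position. The Python sketch below derives such names from two pre-aligned, gap-free sequences of equal length; the toy sequences are hypothetical, and the structural filter the text applies (proximity to H15 and D117) is not modeled.

```python
def substitutions(reference, variant):
    """Return labels (wild-type residue, 1-based position, new residue) for
    every aligned position where `variant` differs from `reference`."""
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned to the same length")
    return [
        f"{wt}{pos}{new}"
        for pos, (wt, new) in enumerate(zip(reference, variant), start=1)
        if wt != new
    ]

# Hypothetical toy sequences, not the real UGTPg1/UGTPg102 sequences.
print(substitutions("MKHAELVS", "MKYAEFVS"))  # ['H3Y', 'L6F']
```

If the real alignment contained gaps, alignment columns would no longer equal residue numbers, so positions would have to be mapped back to the reference sequence first.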

Two primers having the nucleotide sequences of SEQ ID NO: 24 and SEQ ID NO: 25 in the sequence listing were synthesized. Using plasmid pET28a-UGTPg1 as the template, site-directed mutagenesis at position L38 of UGTPg1 was performed with the QuickChange II site-directed mutagenesis kit (Stratagene). The PCR program was: 95°C for 30 s; 16 cycles of 95°C for 30 s, 55°C for 1 min and 68°C for 7 min; then hold at 10°C. The PCR product was digested with DpnI in a 37°C water bath for 2 h, and 10 μL was used to transform Escherichia coli TOP10 competent cells.
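Mutagenic primers for QuickChange-type mutagenesis are usually screened by an estimated melting temperature. The function below implements the estimate commonly cited from the QuickChange II manual, Tm = 81.5 + 0.41(%GC) − 675/N − %mismatch, where N is the primer length in bases and both percentages are expressed as whole-number percents; the kit guidelines recommend a Tm of at least 78°C. The primer in the example is hypothetical and is not SEQ ID NO: 24 or 25.

```python
def quikchange_tm(primer, n_mismatched_bases):
    """Estimated Tm (°C) = 81.5 + 0.41*%GC - 675/N - %mismatch.
    %GC and %mismatch are percentages (e.g. 50, not 0.50)."""
    seq = primer.upper()
    n = len(seq)
    if n == 0 or set(seq) - set("ACGT"):
        raise ValueError("primer must be a non-empty A/C/G/T string")
    pct_gc = 100.0 * (seq.count("G") + seq.count("C")) / n
    pct_mismatch = 100.0 * n_mismatched_bases / n
    return 81.5 + 0.41 * pct_gc - 675.0 / n - pct_mismatch

# Hypothetical 32-mer with 50 % GC carrying one mismatched base.
print(round(quikchange_tm("ATGC" * 8, 1), 2))
```

The hypothetical 32-mer scores about 77.8°C, just below the 78°C guideline, which would argue for lengthening it.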

Single clones were picked, plasmids were extracted, and the mutation was verified by sequencing; the resulting plasmid was named pET28a-UGTPg1-L38F. Plasmid pET28a-UGTPg1-L38F was transformed into Escherichia coli BL21(DE3) to construct the recombinant strain pET28a-UGTPg1-L38F-BL21, and expression was induced as in Example 1. In the same way, the recombinant plasmids pET28a-UGTPg1-A40S, pET28a-UGTPg1-F85L, pET28a-UGTPg1-T143I, pET28a-UGTPg1-H144Y and pET28a-UGTPg1-Q388H were constructed, together with the recombinant strains pET28a-UGTPg1-A40S-BL21, pET28a-UGTPg1-F85L-BL21, pET28a-UGTPg1-T143I-BL21, pET28a-UGTPg1-H144Y-BL21 and pET28a-UGTPg1-Q388H-BL21. Site-directed mutagenesis was performed as above (the point-mutation primer sequences are SEQ ID NO: 26 to SEQ ID NO: 35), and expression induction and enzyme activity assays were performed as in Example 1.

The results showed that UGTPg1-L38F, UGTPg1-A40S, UGTPg1-F85L, UGTPg1-T143I and UGTPg1-Q388H retained the ability to glycosylate C20-OH, i.e., to convert PPT to F1, whereas no enzyme activity was detected for UGTPg1-H144Y (Figure 3). This indicates that H144 is critical for the catalytic function of UGTPg1.

Two primers having the nucleotide sequences of SEQ ID NO: 36 and SEQ ID NO: 37 in the sequence listing were synthesized. Using plasmid pET28a-UGTPg102 as the template, position Y144 of UGTPg102 was subjected to site-directed mutagenesis with the QuickChange II site-directed mutagenesis kit (Stratagene), yielding the recombinant plasmid pET28a-UGTPg102-Y144H and the recombinant strain pET28a-UGTPg102-Y144H-BL21. Site-directed mutagenesis was performed as above, and expression induction and enzyme activity assays were performed as in Example 2. UGTPg102-Y144H acquired the ability to glycosylate C20-OH, i.e., it catalyzes the conversion of PPT to F1 (Figure 3).

Conclusion: The amino acid sequences of UGTPg1 and UGTPg102 share 95.4% identity, yet UGTPg1 has C20-OH glycosyltransferase activity while UGTPg102 has none. This example shows that the substitution of key amino acids is responsible for the loss of catalytic function in UGTPg102.
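The identity figures quoted in these conclusions are the fraction of aligned positions carrying the same residue. A minimal Python sketch, assuming a gap-free alignment of equal length; the source does not state which alignment tool was used or how gaps were counted, and the toy sequences are hypothetical.

```python
def percent_identity(a, b):
    """Identical positions / aligned length * 100 for a gap-free alignment."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty sequences of equal aligned length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)

# Hypothetical toy pair: 6 of 8 positions identical.
print(percent_identity("MKHAELVS", "MKYAEFVS"))  # 75.0
```

Equivalently, for L aligned positions and d differences the identity is 100 × (L − d)/L, which is why a single key substitution barely moves the identity figure while still abolishing activity.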

Example 3. Key amino acid sites determining C6-OH glycosylation by glycosyltransferases UGTPg100 and UGTPg103

The amino acid sequences of UGTPg100 and UGTPg103 share 98.7% identity, differing at only six positions. Homology modeling of UGTPg100 was performed with SWISS-MODEL, and the three-dimensional structure was analyzed with PyMOL (DeLano Scientific). H15 and D117 are the conserved catalytic residues of UGTPg100. Near these two residues, two positions at which UGTPg100 and UGTPg103 differ were found: A142 and L186 of UGTPg100, corresponding to T142 and S186 of UGTPg103, respectively. Two further differing positions lie in the C-terminal and N-terminal domains, respectively: G338 and W205 in UGTPg100, corresponding to R338 and R205 in UGTPg103. Each of these four residues of UGTPg100 was mutated to the corresponding UGTPg103 residue, and enzyme activity was measured after expression in Escherichia coli BL21(DE3). The remaining two differences, V111 and I349 in UGTPg100 versus I111 and V349 in UGTPg103, were not studied in detail: their side chains are very similar in structure and properties, so exchanging them is likely to have little effect on enzyme activity.

Using plasmid pET28a-UGTPg100 as the template, positions A142, L186, W205 and G338 of UGTPg100 were subjected to site-directed mutagenesis with the QuickChange II site-directed mutagenesis kit (Stratagene), yielding the recombinant plasmids pET28a-UGTPg100-A142T, pET28a-UGTPg100-L186S, pET28a-UGTPg100-W205R and pET28a-UGTPg100-G338R (the point-mutation primer sequences are SEQ ID NO: 38 to SEQ ID NO: 45) and the recombinant strains pET28a-UGTPg100-A142T-BL21, pET28a-UGTPg100-L186S-BL21, pET28a-UGTPg100-W205R-BL21 and pET28a-UGTPg100-G338R-BL21. Site-directed mutagenesis was performed as above, and expression induction and enzyme activity assays were performed as in Example 1.

As shown in Figure 4, UGTPg100-A142T, UGTPg100-L186S and UGTPg100-G338R all lost C6-OH glycosyltransferase activity, whereas UGTPg100-W205R retained enzyme activity.

Using plasmid pET28a-UGTPg103 as the template, positions T142, S186 and R338 of UGTPg103 were subjected to site-directed mutagenesis with the QuickChange II site-directed mutagenesis kit (Stratagene) (the point-mutation primer sequences are SEQ ID NO: 46 and SEQ ID NO: 47), yielding the recombinant plasmids pET28a-UGTPg103-T142A and pET28a-UGTPg103-S186L and the recombinant strain pET28a-UGTPg103-T142A-BL21. Site-directed mutagenesis was performed as above, and expression induction and enzyme activity assays were performed as in Example 2. As shown in Figure 4, no enzyme activity was detected for UGTPg103-T142A.

Using plasmid pET28a-UGTPg103-T142A as the template and the nucleotide sequences SEQ ID NO: 48 and SEQ ID NO: 49 as primers, position S186 of pET28a-UGTPg103-T142A was subjected to site-directed mutagenesis with the QuickChange II site-directed mutagenesis kit (Stratagene), yielding the double-mutant recombinant plasmid pET28a-UGTPg103-T142A-S186L and the recombinant strain pET28a-UGTPg103-T142A-S186L-BL21. Similarly, the triple-mutant recombinant plasmid pET28a-UGTPg103-T142A-S186L-R338G and the recombinant strain pET28a-UGTPg103-T142A-S186L-R338G-BL21 were constructed. Site-directed mutagenesis was performed as above, and expression induction and enzyme activity assays were performed as in Example 1.
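The sequential construction of the double (T142A-S186L) and triple (T142A-S186L-R338G) UGTPg103 mutants can be mirrored in silico by applying a stacked mutation label to a sequence. This Python sketch is an illustration only: it assumes 1-based positions on the unmutated template, as in the document's notation, uses a hypothetical toy sequence, and checks each wild-type residue before substituting so that numbering slips are caught.

```python
import re

_TOKEN = re.compile(r"([A-Z])(\d+)([A-Z])")  # e.g. "T142A"

def apply_mutations(seq, label):
    """Apply hyphen-separated substitutions such as "T142A-S186L-R338G".
    Raises ValueError if a token is malformed, out of range, or its stated
    wild-type residue does not match the sequence at that position."""
    residues = list(seq)
    for token in label.split("-"):
        m = _TOKEN.fullmatch(token)
        if not m:
            raise ValueError(f"unrecognised mutation token: {token!r}")
        wt, pos, new = m.group(1), int(m.group(2)), m.group(3)
        if not 1 <= pos <= len(residues):
            raise ValueError(f"{token}: position outside the sequence")
        if residues[pos - 1] != wt:
            raise ValueError(f"{token}: sequence has {residues[pos - 1]} at {pos}, not {wt}")
        residues[pos - 1] = new
    return "".join(residues)

# Hypothetical 5-residue toy sequence with T2, S3 and R4.
print(apply_mutations("MTSRK", "T2A-S3L-R4G"))  # MALGK
```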

As shown in Figure 4, UGTPg103-T142A-S186L likewise could not glycosylate C6-OH, whereas UGTPg103-T142A-S186L-R338G acquired C6-OH glycosylation activity, i.e., it catalyzes the conversion of PPT to Rh1.

Conclusion: The amino acid sequences of UGTPg100 and UGTPg103 share 98.7% identity, differing at only six amino acids, yet UGTPg100 has C6-OH glycosyltransferase activity while UGTPg103 has none. This indicates that the substitution of key amino acids is responsible for the loss of catalytic function in UGTPg103.

Example 4. Key amino acid positions and regions determining the substrate regiospecificity of glycosyltransferases UGTPg1 and UGTPg100

The amino acid sequences of UGTPg1 and UGTPg100 share 84.1% identity, yet UGTPg1 has C20-OH glycosyltransferase activity whereas UGTPg100 has C6-OH glycosyltransferase activity, indicating that the substitution of key amino acids has changed their substrate specificity. Homology models of UGTPg1 and UGTPg100 were built with SWISS-MODEL, and their three-dimensional structures were analyzed with PyMOL (DeLano Scientific), as shown in the figure. H15 and D117 are the conserved catalytic residues of UGTPg1 and UGTPg100. Near these two residues, four positions at which UGTPg1 and UGTPg100 differ were found: A10, I13, H82 and H144 of UGTPg1, corresponding to V10, F13, C82 and F144 of UGTPg100, respectively. Each of these four residues of UGTPg1 was mutated to the corresponding UGTPg100 residue, the mutants were expressed in Escherichia coli BL21(DE3), and their enzyme activity was measured.

Primer pairs having the nucleotide sequences SEQ ID NO: 52 and SEQ ID NO: 53, SEQ ID NO: 54 and SEQ ID NO: 55, SEQ ID NO: 56 and SEQ ID NO: 57, and SEQ ID NO: 58 and SEQ ID NO: 59 in the sequence listing were synthesized. Using plasmid pET28a-UGTPg1 as the template, positions A10, I13, H82 and H144 of UGTPg1 were subjected to site-directed mutagenesis with the QuickChange II site-directed mutagenesis kit (Stratagene), yielding the recombinant plasmids pET28a-UGTPg1-A10V, pET28a-UGTPg1-I13F, pET28a-UGTPg1-H82C and pET28a-UGTPg1-H144F and the recombinant strains pET28a-UGTPg1-A10V-BL21, pET28a-UGTPg1-I13F-BL21, pET28a-UGTPg1-H82C-BL21 and pET28a-UGTPg1-H144F-BL21. Site-directed mutagenesis was performed as above, and expression induction and enzyme activity assays were performed as in Example 2. As shown in Figure 5, the C20-OH glycosyltransferase activity of UGTPg1-A10V, UGTPg1-I13F and UGTPg1-H82C was reduced, and UGTPg1-H82C and UGTPg1-H144F acquired C6-OH glycosylation activity, although the C6-OH glycosylation activity of UGTPg1-H144F was very weak.

Example 5. C-terminal truncation of glycosyltransferases UGTPg1, UGTPg101 and UGTPg100 promotes soluble expression

When expressed in Escherichia coli, glycosyltransferases UGTPg1, UGTPg101 and UGTPg100 exist mainly as inclusion bodies, and their soluble expression levels are low. After truncation of 5 amino acids at the N-terminus, none of UGTPg1, UGTPg101 and UGTPg100 showed enzyme activity toward the substrate PPT. After truncation of 5 amino acids at the C-terminus, the activity of all three enzymes toward PPT decreased, but their soluble expression levels increased significantly. After truncation of 2 or 4 amino acids at the C-terminus, the soluble expression levels of UGTPg1 and UGTPg101 increased by at least 50% relative to the wild type (Figure 6A) with no significant change in enzyme activity (Figure 6B). The same 2- or 4-residue C-terminal truncations of UGTPg100 also increased its soluble expression by at least 50% relative to the wild type, but significantly reduced its enzyme activity (Figure 6).
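The truncation series and the "at least 50% increase" criterion can be expressed with two small helpers. This Python sketch uses a hypothetical toy sequence and hypothetical expression readings in arbitrary units; it only illustrates how truncated variants are defined and how relative soluble expression is computed.

```python
def truncate(seq, n, end="C"):
    """Remove n residues from the C-terminus (end="C") or N-terminus (end="N")."""
    if not 0 <= n < len(seq):
        raise ValueError("n must be between 0 and len(seq) - 1")
    if end == "C":
        return seq[:len(seq) - n]  # avoids the seq[:-0] == "" pitfall when n == 0
    if end == "N":
        return seq[n:]
    raise ValueError("end must be 'N' or 'C'")

def percent_increase(mutant_level, wild_type_level):
    """Soluble expression of a variant relative to wild type, as % change."""
    if wild_type_level <= 0:
        raise ValueError("wild-type level must be positive")
    return 100.0 * (mutant_level - wild_type_level) / wild_type_level

toy = "MKHAELVSQE"  # hypothetical sequence, not a UGTPg protein
for n in (2, 4, 5):
    print(n, truncate(toy, n))
# Hypothetical readings: 1.5 units for a truncation vs 1.0 for wild type.
print(percent_increase(1.5, 1.0) >= 50.0)  # True
```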

All documents mentioned in the present invention are incorporated by reference in this application as if each document were individually incorporated by reference. In addition, it should be understood that, after reading the above teachings of the present invention, those skilled in the art can make various changes or modifications to the present invention, and such equivalent forms likewise fall within the scope defined by the claims appended to this application.

Claims

1. A mutant protein of a glycosyltransferase, wherein the mutant protein is a non-natural protein having glycosyltransferase catalytic activity, and the amino acid sequence of the mutant protein is set forth in SEQ ID NO. 13 or 16.

2. The mutein of claim 1, wherein the glycosyltransferase catalytic activity comprises catalyzing one or more of the following reactions:

(a) transferring a glycosyl group from a glycosyl donor to the hydroxyl group at the C-20 position of a tetracyclic triterpene compound;

(b) transferring a glycosyl group from a glycosyl donor to the hydroxyl group at the C-6 position of a tetracyclic triterpene compound.

3. The mutein of claim 1 or 2, wherein the glycosyltransferase catalytic activity comprises catalyzing one or more of the following reactions:

(A)

wherein the compound of formula (I) is protopanaxatriol, the compound of formula (II) is ginsenoside F1 (20-O-beta-D-glucopyranosyl-20(S)-protopanaxatriol), and the mutein comprises SEQ ID NO. 13;

(B)

wherein the compound of formula (I) is protopanaxatriol and the compound of formula (III) is ginsenoside Rh1 (6-O-beta-D-glucopyranosyl-20(S)-protopanaxatriol); or (b)

wherein the compound of formula (II) is ginsenoside F1 (20-O-beta-D-glucopyranosyl-20(S)-protopanaxatriol) and the compound of formula (IV) is ginsenoside Rg1 (6-O-beta-(D-glucopyranosyl)-20-O-beta-(D-glucopyranosyl)-protopanaxatriol);

the mutant protein comprises SEQ ID NO. 16.

4. The protein of claim 1, wherein the mutein is set forth in SEQ ID NO. 16 and the glycosyltransferase activity comprises transferring a glycosyl group from a glycosyl donor to the hydroxyl group at the C-6 position of a tetracyclic triterpene compound; and/or

the mutein is set forth in SEQ ID NO. 13 and the glycosyltransferase activity comprises transferring a glycosyl group from a glycosyl donor to the hydroxyl group at the C-20 position of a tetracyclic triterpene compound.

5. A polynucleotide encoding the mutein of any one of claims 1-4.

6. A vector, characterized in that the vector contains the polynucleotide of claim 5.

7. A host cell comprising the vector of claim 6 or having the polynucleotide of claim 5 integrated into its genome, wherein the host cell is not a plant cell.

8. Use of a mutein according to claim 1, characterized in that the mutein is used for catalyzing one or more of the following reactions or is used for the preparation of a catalytic preparation for catalyzing one or more of the following reactions:

9. A method of performing a glycosylation catalytic reaction, comprising the step of performing the glycosylation reaction in the presence of the mutein of claim 1.

10. The use of a host cell according to claim 7 for the preparation of a glycosyltransferase, or as a catalytic cell, or for the production of glycosylated tetracyclic triterpenes.

11. A method of producing a transgenic plant comprising the steps of: introducing the vector of claim 6 into a plant cell and regenerating the plant cell into a plant.