CN108866020A

CN108866020A - Glycosyl transferase, mutant and its application

Info

Publication number: CN108866020A
Application number: CN201710344730.7A
Authority: CN
Inventors: 周志华; 严兴; 王平平; 魏维; 李晓东
Original assignee: Shanghai Institutes for Biological Sciences SIBS of CAS
Current assignee: Shanghai Institutes for Biological Sciences SIBS of CAS
Priority date: 2017-05-16
Filing date: 2017-05-16
Publication date: 2018-11-23
Also published as: JP2020520244A; KR102418138B1; CN110462033A; JP7086107B2; KR20200016268A; WO2018210208A1

Abstract

The present invention relates to glycosyltransferases, mutants thereof, and their applications. Specifically, an in vitro glycosylation method is provided, comprising the step of: in the presence of a glycosyltransferase, transferring the glycosyl group of a glycosyl donor to the C-3 hydroxyl group of a tetracyclic triterpenoid compound, thereby forming a glycosylated tetracyclic triterpenoid compound; wherein the glycosyltransferase is the glycosyltransferase shown in SEQ ID NO.4 or a derivative polypeptide thereof, or the glycosyltransferase shown in SEQ ID NO.21 or a derivative polypeptide thereof.

Description

Glycosyltransferases, mutants and applications thereof

Technical field

The present invention relates to the fields of biotechnology and plant biology. Specifically, it relates to glycosyltransferases for the synthesis of ginsenoside Rh2, mutants of such glycosyltransferases, and applications thereof.

Background art

Ginsenosides are the main active substances in plants of the genus Panax, family Araliaceae (such as ginseng, Panax notoginseng and American ginseng); in recent years some ginsenosides have also been found in plants of the family Cucurbitaceae. To date, scientists in China and abroad have isolated at least 100 saponins from ginseng, Panax notoginseng and related plants. Ginsenosides are triterpenoid saponins. Some of them have been shown to have a wide range of physiological functions and medicinal value, including anti-tumor, immunomodulatory, anti-fatigue, cardioprotective and hepatoprotective activity. Several saponins are already in clinical use. For example, Shenyi Capsule, a drug whose main ingredient is ginsenoside Rg3 monomer, can relieve symptoms of Qi deficiency in tumor patients and improve immune function. Jinxing Capsule, whose main ingredient is ginsenoside Rh2 monomer, is a health product used to improve immunity and strengthen resistance to disease.

Structurally, ginsenosides are bioactive small molecules formed by glycosylation of sapogenins. Ginsenosides have only a limited number of sapogenins, mainly the dammarane-type protopanaxadiol and protopanaxatriol and the oleanane-type amyrin. Apart from differences in the sapogenin, the structural differences between ginsenosides lie mainly in the glycosylation of the sapogenin. The sugar chains of ginsenosides are generally attached to the hydroxyl group at C3, C6 or C20 of the sapogenin, and the sugar moieties can be glucose, rhamnose, xylose and arabinose.

Differences in glycosylation site and in the composition and length of the sugar chain give ginsenosides very different physiological functions and medicinal value. For example, ginsenosides Rb1, Rd and Rc all have protopanaxadiol as their sapogenin and differ only in their sugar modifications, yet their physiological functions differ considerably. Rb1 stabilizes the central nervous system, whereas Rc inhibits it; Rb1 has a very broad range of physiological functions, whereas Rd has only a few.

Rare ginsenosides are saponins present in ginseng at extremely low levels. Ginsenoside Rh2 (3-O-β-(D-glucopyranosyl)-20(S)-protopanaxadiol) is a protopanaxadiol-type saponin carrying a single glucosyl group on the C-3 hydroxyl group of the sapogenin. Ginsenoside Rh2 accounts for only about one ten-thousandth of the dry weight of ginseng. Nevertheless, it has good anti-tumor activity and is one of the most important anti-tumor components of ginseng: it inhibits tumor cell growth, induces tumor cell apoptosis and counteracts tumor metastasis. Studies have shown that ginsenoside Rh2 inhibits the proliferation of 3LL lung cancer cells (mouse), Morris liver cancer cells (rat), B-16 melanoma cells (mouse) and HeLa cells (human). Clinically, combining ginsenoside Rh2 with radiotherapy or chemotherapy can enhance the effect of those treatments. In addition, ginsenoside Rh2 has anti-allergic activity, improves immunity, and inhibits inflammation arising from the production of NO and PGE.

Glycosyltransferases transfer the glycosyl group of a glycosyl donor (a nucleoside diphosphate sugar, for example UDP-glucose) to various glycosyl acceptors. On the basis of amino acid sequence, glycosyltransferases are currently classified into 94 families. More than a hundred different glycosyltransferases have been found in the plant genomes sequenced so far. The glycosyl acceptors of these enzymes include sugars, lipids, proteins, nucleic acids, antibiotics and other small molecules. In ginseng, the glycosyltransferases involved in saponin glycosylation transfer the glycosyl group of a glycosyl donor to the hydroxyl group at C3, C6 or C20 of a sapogenin or aglycone, thereby forming saponins of differing medicinal value.

At present the field still lacks an effective method for producing the rare ginsenoside Rh2 and ginsenoside F2, so there is an urgent need to develop a range of specific and efficient glycosyltransferases.

Summary of the invention

The object of the present invention is to provide a class of glycosyltransferases and their use in synthesizing the rare ginsenoside Rh2 and ginsenoside F2.

In a first aspect, the present invention provides an isolated polypeptide in whose amino acid sequence the residue at the position corresponding to position 222 of the amino acid sequence shown in SEQ ID NO: 19 is not Gln, and/or the residue at the position corresponding to position 322 of the amino acid sequence shown in SEQ ID NO: 19 is not Ala.

In another preferred example, the isolated polypeptide:

i). has the amino acid sequence shown in SEQ ID NO: 19, wherein the amino acid residue at position 222 is not Gln and/or the amino acid residue at position 322 is not Ala, or

ii). is an isolated polypeptide derived from i), having a sequence formed from the sequence defined in i) by substitution, deletion or addition of one or several amino acid residues (preferably 1-20, more preferably 1-15, more preferably 1-10, more preferably 1-3, most preferably 1), and substantially retaining the function of the isolated polypeptide defined in i).

In another preferred example, the isolated polypeptide is an isolated polypeptide derived from i), having a sequence formed from the sequence defined in i) by addition of one or several amino acid residues (preferably 1-20, more preferably 1-15, more preferably 1-10, more preferably 1-3, most preferably 1), and substantially retaining the function of the isolated polypeptide defined in i).

In another preferred example, the amino acid residue at the position of the isolated polypeptide corresponding to position 222 of the amino acid sequence shown in SEQ ID NO: 19 is selected from at least one of the following amino acids: His, Asn, Gln, Lys and Arg.

In another preferred example, the amino acid residue at the position of the isolated polypeptide corresponding to position 222 of the amino acid sequence shown in SEQ ID NO: 19 is His.

In another preferred example, the amino acid residue at the position of the isolated polypeptide corresponding to position 322 of the amino acid sequence shown in SEQ ID NO: 19 is selected from at least one of the following amino acids: Val, Ile, Leu, Met and Phe.

In another preferred example, the amino acid residue at the position of the isolated polypeptide corresponding to position 322 of the amino acid sequence shown in SEQ ID NO: 19 is Val.

In another preferred example, the amino acid residue at the position of the isolated polypeptide corresponding to position 222 of the amino acid sequence shown in SEQ ID NO: 19 is His, and the amino acid residue at the position corresponding to position 322 of the amino acid sequence shown in SEQ ID NO: 19 is Val.

iii). has the amino acid sequence shown in SEQ ID NO: 19, wherein the amino acid residue at position 222 is not Gln and/or the amino acid residue at position 322 is not Ala, or

iv). is an isolated polypeptide derived from iii), having a sequence formed from the sequence defined in iii) by substitution, deletion or addition of one or several amino acid residues (preferably 1-20, more preferably 1-15, more preferably 1-10, more preferably 1-3, most preferably 1), and substantially retaining the function of the isolated polypeptide defined in iii).

In another preferred example, the isolated polypeptide is an isolated polypeptide derived from iii), having a sequence formed from the sequence defined in iii) by addition of one or several amino acid residues (preferably 1-20, more preferably 1-15, more preferably 1-10, more preferably 1-3, most preferably 1), and substantially retaining the function of the isolated polypeptide defined in iii).

In another preferred example, the amino acid residue of the isolated polypeptide at position 222 of the amino acid sequence shown in SEQ ID NO: 19 is selected from at least one of the following amino acids: His, Asn, Gln, Lys and Arg.

In another preferred example, the amino acid residue of the isolated polypeptide at position 222 of the amino acid sequence shown in SEQ ID NO: 19 is His.

In another preferred example, the amino acid residue of the isolated polypeptide at position 322 of the amino acid sequence shown in SEQ ID NO: 19 is selected from at least one of the following amino acids: Val, Ile, Leu, Met and Phe.

In another preferred example, the amino acid residue of the isolated polypeptide at position 322 of the amino acid sequence shown in SEQ ID NO: 19 is Val.

In another preferred example, the amino acid residue of the isolated polypeptide at position 222 of the amino acid sequence shown in SEQ ID NO: 19 is His, and the amino acid residue at position 322 of the amino acid sequence shown in SEQ ID NO: 19 is Val.
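The positional conditions above (position 222 not Gln and/or position 322 not Ala, with His and Val as the preferred residues) can be illustrated with a minimal sketch. SEQ ID NO: 19 is not reproduced in this section, so the parent sequence below is a toy placeholder, and the helper names `residue_at`, `substitute` and `meets_first_aspect` are hypothetical. The sketch also assumes the query sequence is already numbered like SEQ ID NO: 19; in practice an alignment would be needed to find the corresponding positions.

```python
def residue_at(seq: str, position: int) -> str:
    """Return the one-letter residue at a 1-based position."""
    if not 1 <= position <= len(seq):
        raise ValueError(f"position {position} is outside the sequence")
    return seq[position - 1]


def substitute(seq: str, position: int, new_residue: str) -> str:
    """Return a copy of seq with the residue at a 1-based position replaced."""
    residue_at(seq, position)  # validates the position
    return seq[:position - 1] + new_residue + seq[position:]


def meets_first_aspect(seq: str) -> bool:
    """True if position 222 is not Gln (Q) and/or position 322 is not Ala (A)."""
    return residue_at(seq, 222) != "Q" or residue_at(seq, 322) != "A"


# Toy placeholder, NOT SEQ ID NO: 19: 400 glycines with Q at 222 and A at 322.
toy_parent = substitute(substitute("G" * 400, 222, "Q"), 322, "A")

# Preferred double variant from the text: His at 222 and Val at 322.
toy_double = substitute(substitute(toy_parent, 222, "H"), 322, "V")

print(meets_first_aspect(toy_parent), meets_first_aspect(toy_double))
```

Because the condition is "and/or", a single substitution at either position is enough for the check to pass.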

In another preferred example, the isolated polypeptide is a glycosyltransferase.

In another preferred example, the glycosyltransferase is derived from a plant of the genus Panax.

In another preferred example, the glycosyltransferase is derived from ginseng, American ginseng and/or Panax notoginseng.

In another preferred example, the polypeptide is selected from the group consisting of:

(a) a polypeptide having the amino acid sequence shown in SEQ ID NO.: 4 or SEQ ID NO.: 21;

(b) a derivative polypeptide that has glycosyltransferase activity and is formed from the polypeptide having the amino acid sequence shown in SEQ ID NO.: 4 or SEQ ID NO.: 21 by substitution, deletion or addition of one or several amino acid residues (preferably 1-20, more preferably 1-15, more preferably 1-10, more preferably 1-3, most preferably 1), or by addition of a signal peptide sequence;

(c) a derivative polypeptide whose sequence contains the polypeptide sequence described in (a) or (b);

(d) a derivative polypeptide that has glycosyltransferase activity and whose amino acid sequence has ≥85% homology (preferably ≥90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99%) with the amino acid sequence shown in SEQ ID NO.: 4 or SEQ ID NO.: 21.
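The ≥85% threshold in item (d) can be made concrete with a short sketch. The text does not say how homology is calculated, so the example assumes, purely for illustration, that it is measured as percent identity over the columns of a pairwise alignment prepared beforehand (gaps written as `-`). The function names and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns; inputs must be pre-aligned."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns in which both sequences carry a gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("no comparable columns")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def passes_threshold(aligned_a: str, aligned_b: str,
                     threshold: float = 85.0) -> bool:
    """True if identity reaches the threshold (85% is the broadest tier)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Toy aligned fragments, not taken from any SEQ ID of the document.
print(percent_identity("ACDEFGHIKL", "ACDEFGHIKV"))
print(passes_threshold("ACDE", "AC-E"))
```

Other conventions (for example dividing by the length of the shorter sequence, or scoring similarity with a substitution matrix) would give different values, which is why the convention is stated as an assumption here.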

In another preferred example, the sequence (c) is a fusion protein formed by adding a tag sequence, a signal sequence or a secretion signal sequence to (a) or (b).

In another preferred example, the glycosyltransferase activity refers to the activity of transferring the glycosyl group of a glycosyl donor to the C-3 hydroxyl group of a tetracyclic triterpenoid compound.

In another preferred example, the glycosyltransferase can increase the yield of ginsenoside Rh2 and/or ginsenoside F2.

In another preferred example, in an artificially constructed strain, the glycosyltransferase can increase the yield of ginsenoside Rh2, preferably by 5-150%, more preferably by 10-100%, more preferably by 20-80%, and most preferably by 28-70%.

In another preferred example, the artificially constructed strain is selected from the group consisting of: Saccharomyces cerevisiae strains, Escherichia coli strains, Pichia pastoris strains, Schizosaccharomyces strains and Kluyveromyces strains.

In a second aspect, the present invention provides an isolated polynucleotide, the polynucleotide being a sequence selected from the group consisting of:

(A) a nucleotide sequence encoding the polypeptide of the first aspect of the present invention;

(B) a nucleotide sequence encoding the polypeptide shown in SEQ ID NO.: 4 or SEQ ID NO.: 21 or a derivative polypeptide thereof;

(C) the nucleotide sequence shown in SEQ ID NO.: 3 or SEQ ID NO.: 22;

(D) a nucleotide sequence having ≥90% homology (preferably ≥91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99%) with the sequence shown in SEQ ID NO.: 3 or SEQ ID NO.: 22;

(E) a nucleotide sequence formed by truncating or adding 1-60 (preferably 1-30, more preferably 1-10) nucleotides at the 5' end and/or 3' end of the nucleotide sequence shown in SEQ ID NO.: 3 or SEQ ID NO.: 22;

(F) a nucleotide sequence complementary to the nucleotide sequence of any one of (A)-(E).
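Items (E) and (F) describe simple sequence operations: trimming nucleotides from the 5' and/or 3' end, and taking the complementary strand. The sketch below illustrates both on a toy DNA string, since SEQ ID NO.: 3 and SEQ ID NO.: 22 are not reproduced in this section. The function names are hypothetical, and applying the 60-nucleotide limit to each end separately is an assumption; the text does not say whether the limit is per end or in total.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")
MAX_END_CHANGE = 60  # assumption: the 1-60 limit applies to each end separately


def reverse_complement(dna: str) -> str:
    """Return the complementary strand, written 5'->3'."""
    return dna.translate(_COMPLEMENT)[::-1]


def truncate_ends(dna: str, five_prime: int = 0, three_prime: int = 0) -> str:
    """Remove nucleotides from the 5' and/or 3' end of a sequence."""
    for n in (five_prime, three_prime):
        if not 0 <= n <= MAX_END_CHANGE:
            raise ValueError("each end may lose between 0 and 60 nucleotides")
    if five_prime + three_prime >= len(dna):
        raise ValueError("truncation would remove the whole sequence")
    return dna[five_prime:len(dna) - three_prime]


toy_gene = "ATGCCGTA"  # toy placeholder, not a sequence from the document
print(reverse_complement(toy_gene))
print(truncate_ends(toy_gene, five_prime=2, three_prime=1))
```

Note that truncating by a number of nucleotides that is not a multiple of three shifts the reading frame, so a truncated sequence does not necessarily still encode a functional enzyme.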

In a third aspect, the present invention provides a vector containing the polynucleotide of the second aspect of the present invention.

In another preferred example, the vector includes an expression vector, a shuttle vector or an integration vector.

In a fourth aspect, the present invention provides a use of the isolated polypeptide of the first aspect of the present invention for catalyzing the following reaction, or for preparing a catalytic preparation that catalyzes the following reaction:

(i) transferring a glycosyl group from a glycosyl donor to the C-3 hydroxyl group of a tetracyclic triterpenoid compound.

In another preferred example, the glycosyl donor includes a nucleoside diphosphate sugar selected from the group consisting of: UDP-glucose, ADP-glucose, TDP-glucose, CDP-glucose, GDP-glucose, UDP-acetylglucose, ADP-acetylglucose, TDP-acetylglucose, CDP-acetylglucose, GDP-acetylglucose, UDP-xylose, ADP-xylose, TDP-xylose, CDP-xylose, GDP-xylose, UDP-galacturonic acid, ADP-galacturonic acid, TDP-galacturonic acid, CDP-galacturonic acid, GDP-galacturonic acid, UDP-galactose, ADP-galactose, TDP-galactose, CDP-galactose, GDP-galactose, UDP-arabinose, ADP-arabinose, TDP-arabinose, CDP-arabinose, GDP-arabinose, UDP-rhamnose, ADP-rhamnose, TDP-rhamnose, CDP-rhamnose, GDP-rhamnose, other nucleoside diphosphate hexoses or nucleoside diphosphate pentoses, or combinations thereof.

In another preferred example, the glycosyl donor includes a uridine diphosphate (UDP) sugar selected from the group consisting of: UDP-glucose, UDP-xylose, UDP-galacturonic acid, UDP-galactose, UDP-arabinose, UDP-rhamnose, other uridine diphosphate hexoses or uridine diphosphate pentoses, or combinations thereof.
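The named nucleoside diphosphate sugars above follow a regular pattern: five nucleoside diphosphates combined with seven sugars. As a reading aid, the sketch below regenerates that list; the constant and function names are hypothetical, and the open-ended "other hexoses or pentoses" category is deliberately left out because the text does not enumerate it.

```python
from itertools import product

# Nucleoside diphosphate carriers and sugar moieties named in the text.
NUCLEOTIDES = ("UDP", "ADP", "TDP", "CDP", "GDP")
SUGARS = (
    "glucose",
    "acetylglucose",
    "xylose",
    "galacturonic acid",
    "galactose",
    "arabinose",
    "rhamnose",
)


def enumerate_donors(nucleotides=NUCLEOTIDES, sugars=SUGARS):
    """List donors sugar by sugar, in the same order as the text."""
    return [f"{n}-{s}" for s, n in product(sugars, nucleotides)]


donors = enumerate_donors()
print(len(donors))
print(donors[:5])
```

The narrower UDP-only preference names six of these sugars (it omits acetylglucose), so filtering this list on the `UDP-` prefix would yield one more entry than that preference contains.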

In another preferred example, the isolated polypeptide is used to catalyze the following reaction, or to prepare a catalytic preparation that catalyzes the following reaction:

wherein R1 is H or OH; R2 is H or OH; R3 is H or a glycosyl group; and R4 is a glycosyl group.

The polypeptide is selected from the polypeptides shown in SEQ ID NO.: 4 or SEQ ID NO.: 21, or derivative polypeptides thereof.

In another preferred example, the glycosyl group is selected from: glucosyl, galacturonic acid group, xylosyl, galactosyl, arabinosyl, rhamnosyl, and other hexosyl or pentosyl groups.

In another preferred example, the reaction products of reactions (A) and/or (B) include (but are not limited to): S-configuration or R-configuration dammarane-type tetracyclic triterpenoids, lanostane-type tetracyclic triterpenoids, apotirucallane-type tetracyclic triterpenoids, tirucallane-type tetracyclic triterpenoids, cycloartane-type tetracyclic triterpenoids, cucurbitane-type tetracyclic triterpenoids, or meliacane-type tetracyclic triterpenoids.

In a fifth aspect, the present invention provides an in vitro glycosylation method, comprising the step of:

in the presence of a glycosyltransferase, transferring the glycosyl group of a glycosyl donor to the C-3 hydroxyl group of a tetracyclic triterpenoid compound, thereby forming a glycosylated tetracyclic triterpenoid compound;

wherein the glycosyltransferase is the polypeptide of the first aspect of the present invention or a derivative polypeptide thereof.

In another preferred example, the derivative polypeptide is selected from:

a derivative polypeptide that has glycosyltransferase activity and is formed from the polypeptide having the amino acid sequence shown in SEQ ID NO.: 4 or SEQ ID NO.: 21 by substitution, deletion or addition of one or several amino acid residues, or by addition of a signal peptide sequence; or

a derivative polypeptide that has glycosyltransferase activity and whose amino acid sequence has ≥85% homology (preferably ≥90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99%) with the amino acid sequence shown in SEQ ID NO.: 4 or SEQ ID NO.: 21;

wherein the glycosyltransferase activity refers to the activity of transferring the glycosyl group of a glycosyl donor to the C-3 hydroxyl group of a tetracyclic triterpenoid compound.

In a sixth aspect, the present invention provides a method for carrying out a glycosylation reaction, comprising the step of: carrying out the glycosylation reaction in the presence of the polypeptide of the first aspect of the present invention or a derivative polypeptide thereof.

In another preferred example, the method further comprises the step of:

converting the compound of formula (I) into the compound of formula (II) in the presence of a glycosyl donor and the polypeptide of the first aspect of the present invention or a derivative polypeptide thereof.

In another preferred example, the compound of formula (I) is protopanaxadiol (PPD) and the compound of formula (II) is ginsenoside Rh2; or

the compound of formula (I) is Compound K and the compound of formula (II) is ginsenoside F2.

In another preferred example, the method further comprises adding the polypeptide and its derivative polypeptide to the catalytic reaction separately; and/or

adding the polypeptide and its derivative polypeptide to the catalytic reaction simultaneously.

In another preferred example, the method further comprises co-expressing, in a host cell, the nucleotide sequence encoding the glycosyltransferase together with key genes of the dammarenediol and/or protopanaxadiol biosynthetic pathway and/or other glycosyltransferase genes, thereby obtaining the compound of formula (II).

In another preferred example, the compound of formula (II) is ginsenoside Rh2 or ginsenoside F2.

In another preferred example, the host cell is a yeast or Escherichia coli.

In another preferred example, the method further comprises: providing the reaction system with an additive for regulating enzyme activity.

In another preferred example, the additive for regulating enzyme activity is an additive that increases or inhibits enzyme activity.

In another preferred example, the additive for regulating enzyme activity is selected from the group consisting of: Ca²⁺, Co²⁺, Mn²⁺, Ba²⁺, Al³⁺, Ni²⁺, Zn²⁺ and Fe²⁺.

In another preferred example, the additive for regulating enzyme activity is a substance capable of generating Ca²⁺, Co²⁺, Mn²⁺, Ba²⁺, Al³⁺, Ni²⁺, Zn²⁺ or Fe²⁺.

In another preferred example, the glycosyl donor is a nucleoside diphosphate sugar selected from the group consisting of: UDP-glucose, ADP-glucose, TDP-glucose, CDP-glucose, GDP-glucose, UDP-acetylglucose, ADP-acetylglucose, TDP-acetylglucose, CDP-acetylglucose, GDP-acetylglucose, UDP-xylose, ADP-xylose, TDP-xylose, CDP-xylose, GDP-xylose, UDP-galacturonic acid, ADP-galacturonic acid, TDP-galacturonic acid, CDP-galacturonic acid, GDP-galacturonic acid, UDP-galactose, ADP-galactose, TDP-galactose, CDP-galactose, GDP-galactose, UDP-arabinose, ADP-arabinose, TDP-arabinose, CDP-arabinose, GDP-arabinose, UDP-rhamnose, ADP-rhamnose, TDP-rhamnose, CDP-rhamnose, GDP-rhamnose, other nucleoside diphosphate hexoses or nucleoside diphosphate pentoses, or combinations thereof.

In another preferred example, the glycosyl donor is a uridine diphosphate sugar selected from the group consisting of: UDP-glucose, UDP-xylose, UDP-galacturonic acid, UDP-galactose, UDP-arabinose, UDP-rhamnose, other uridine diphosphate hexoses or uridine diphosphate pentoses, or combinations thereof.

In another preferred example, the pH of the reaction system is 4.0-10.0, preferably 5.5-9.0.

In another preferred example, the temperature of the reaction system is 10°C-105°C, preferably 20°C-50°C.
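The pH and temperature windows above each have a broad range and a narrower preferred range. A minimal sketch of how a planned reaction condition could be checked against them follows; the function name `classify_conditions` and the three labels are hypothetical, and treating the range endpoints as inclusive is an assumption.

```python
# Ranges quoted in the text; endpoints are treated as inclusive (assumption).
PH_ALLOWED = (4.0, 10.0)
PH_PREFERRED = (5.5, 9.0)
TEMP_ALLOWED_C = (10.0, 105.0)
TEMP_PREFERRED_C = (20.0, 50.0)


def _within(value: float, bounds: tuple) -> bool:
    low, high = bounds
    return low <= value <= high


def classify_conditions(ph: float, temp_c: float) -> str:
    """Return 'preferred', 'allowed' or 'outside' for a pH/temperature pair."""
    if not (_within(ph, PH_ALLOWED) and _within(temp_c, TEMP_ALLOWED_C)):
        return "outside"
    if _within(ph, PH_PREFERRED) and _within(temp_c, TEMP_PREFERRED_C):
        return "preferred"
    return "allowed"


print(classify_conditions(7.5, 30.0))
print(classify_conditions(4.5, 30.0))
print(classify_conditions(7.5, 110.0))
```

The check only reflects the stated ranges; it says nothing about which conditions within them give the best conversion.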

In another preferred example, the key genes of the dammarenediol biosynthetic pathway include (but are not limited to): the dammarenediol synthase gene.

In another preferred example, the key genes of the protopanaxadiol biosynthetic pathway include (but are not limited to): the dammarenediol synthase gene, the cytochrome P450 gene CYP716A47 for protopanaxadiol synthesis and its reductase gene, or combinations thereof.

In another preferred example, the substrate of the glycosylation reaction is the compound of formula (I), and the product is the compound of formula (II).

In a seventh aspect, the present invention provides a genetically engineered host cell that contains the vector of the third aspect of the present invention, or has the polynucleotide of the second aspect of the present invention integrated into its genome.

In another preferred example, the glycosyltransferase is the polypeptide of the first aspect of the present invention or a derivative polypeptide thereof.

In another preferred example, the nucleotide sequence encoding the glycosyltransferase is as described in the second aspect of the present invention.

In another preferred example, the cell is a prokaryotic cell or a eukaryotic cell.

In another preferred example, the host cell is a eukaryotic cell, such as a yeast cell or a plant cell.

In another preferred example, the host cell is a Saccharomyces cerevisiae cell.

In another preferred example, the host cell is a prokaryotic cell, such as Escherichia coli.

In another preferred example, the host cell is a ginseng cell.

In another preferred example, the host cell is not a cell that naturally produces the compound of formula (II).

In another preferred example, the host cell is not a cell that naturally produces ginsenoside Rh2 or ginsenoside F2.

In another preferred example, the host cell contains key genes of the protopanaxadiol biosynthetic pathway, including (but not limited to): the dammarenediol synthase gene, the cytochrome P450 gene CYP716A47 for protopanaxadiol synthesis and its reductase gene, or combinations thereof.

In an eighth aspect, the present invention provides a use of the host cell of the seventh aspect of the present invention for preparing an enzyme catalytic reagent, for producing a glycosyltransferase, as a catalytic cell, or for producing a glycosylated tetracyclic triterpenoid compound.

In another preferred example, the tetracyclic triterpenoid compound is the compound of formula (II).

In another preferred example, the host cell is used to produce the compound of formula (II) by glycosylation of the compound of formula (I).

In another preferred example, the host cell is used to produce ginsenoside Rh2 and/or ginsenoside F2 by glycosylation of protopanaxadiol (PPD) and/or Compound K.

In a ninth aspect, the present invention provides a method for producing a transgenic plant, comprising the step of: regenerating the genetically engineered host cell of the seventh aspect of the present invention into a plant, wherein the genetically engineered host cell is a plant cell.

In another preferred example, the genetically engineered host cell is selected from: ginseng cells, American ginseng cells, Panax notoginseng cells and tobacco cells.

It should be understood that, within the scope of the present invention, the technical features described above and those specifically described below (for example in the embodiments) can be combined with one another to form new or preferred technical solutions. For reasons of space, these combinations are not listed one by one here.

Description of drawings

Fig. 1 Agarose gel electrophoresis of the glycosyltransferase gene Pn50.

Fig. 2 TLC analysis of ginsenoside glycosylation catalyzed by the glycosyltransferase Pn50.

Fig. 3 HPLC analysis of the rare ginsenoside Rh2 produced by a recombinant Saccharomyces cerevisiae strain constructed with Pn50.

Detailed description

After extensive and in-depth research, the present inventors provide for the first time the use of the Panax notoginseng glycosyltransferase Pn50 (SEQ ID NO.: 4) and of 8E7 (SEQ ID NO.: 21), a mutant of the ginseng-derived glycosyltransferase UGTPg45, in catalyzing the glycosylation of terpenoid compounds and in synthesizing new saponins.

Specifically, the glycosyltransferases of the present invention can specifically and efficiently catalyze the glycosylation of tetracyclic triterpenoid substrates and/or transfer the glycosyl group of a glycosyl donor to the C-3 hydroxyl group of a tetracyclic triterpenoid compound. In particular, they efficiently convert protopanaxadiol (PPD) into the rare ginsenoside Rh2, which has antitumor activity, and convert the rare ginsenoside Compound K (PPD carrying one glycosyl modification at C20) into ginsenoside F2. Unexpectedly, the recombinant Saccharomyces cerevisiae strain ZWBY04RS-Pn50, constructed by introducing the Panax notoginseng glycosyltransferase gene Pn50 into a protopanaxadiol-producing S. cerevisiae strain, synthesizes the rare ginsenoside Rh2, and its Rh2 yield is 28% higher than that of strain ZWBY04RS-UGTPg45, which carries the wild-type ginseng glycosyltransferase gene UGTPg45 (45.55/35.66-1 = 28%). Strain ZWBY04RS-8E7, an engineered Rh2-producing strain constructed with the mutant gene 8E7 obtained by random mutagenesis of UGTPg45, gives an Rh2 yield 70% higher than that of strain ZWBY04RS-UGTPg45 (60.48/35.66-1 = 70%).
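The two percentages quoted above follow from a simple ratio against the wild-type reference strain. The short Python sketch below reproduces that arithmetic using only the three yield values given in the paragraph; the dictionary and function names are illustrative, and the paragraph itself does not state the units of the yields.

```python
# Rh2 yields as quoted in the paragraph above (units are not stated there).
RH2_YIELD = {
    "ZWBY04RS-UGTPg45": 35.66,  # strain carrying wild-type UGTPg45 (reference)
    "ZWBY04RS-Pn50": 45.55,     # strain carrying Pn50
    "ZWBY04RS-8E7": 60.48,      # strain carrying the mutant 8E7
}


def relative_increase(value: float, reference: float) -> float:
    """Return (value / reference - 1) expressed as a percentage."""
    return (value / reference - 1.0) * 100.0


reference = RH2_YIELD["ZWBY04RS-UGTPg45"]
for strain in ("ZWBY04RS-Pn50", "ZWBY04RS-8E7"):
    pct = relative_increase(RH2_YIELD[strain], reference)
    print(f"{strain}: {pct:.1f}% higher Rh2 yield than ZWBY04RS-UGTPg45")
```

Rounded to whole percentages, the two results agree with the 28% and 70% figures stated in the text.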

The present invention also provides conversion and catalytic methods. The glycosyltransferases of the present invention can also be co-expressed in a host cell with key enzymes of the dammarenediol and/or protopanaxadiol anabolic pathway (for example, those encoded by the dammarenediol synthase gene PgDDS, the cytochrome P450 gene CYP716A47 for protopanaxadiol synthesis and its reductase gene PgCPR1), or used in genetically engineered cells for preparing ginsenoside Rh2, in order to construct strains that synthesize the rare ginsenoside Rh2.

In addition, the glycosyltransferases of the present invention can be co-expressed in a host cell with key enzymes of the dammarenediol and/or protopanaxadiol anabolic pathway and used to construct strains that synthesize the rare ginsenoside Rh2. The present invention has been completed on this basis.

Definitions

As used herein, the terms "active polypeptide", "polypeptide of the present invention and derivative polypeptides thereof", "enzyme of the present invention", "glycosyltransferase" and "glycosyltransferase of the present invention" are used interchangeably and have the meaning commonly understood by those of ordinary skill in the art. The glycosyltransferase of the present invention has the activity of transferring the glycosyl group of a glycosyl donor to the C-3 hydroxyl group of a tetracyclic triterpenoid compound.

In a preferred embodiment of the present invention, in the amino acid sequence of the glycosyltransferase of the present invention, the amino acid residue at the position corresponding to position 222 of the amino acid sequence shown in SEQ ID NO: 19 is not Gln, and/or the amino acid residue at the position corresponding to position 322 of the amino acid sequence shown in SEQ ID NO: 19 is not Ala.

In a preferred embodiment of the present invention, the glycosyltransferase of the present invention:

In a preferred embodiment of the present invention, the glycosyltransferase of the present invention is an isolated polypeptide derived from i), having a sequence formed by adding one or several amino acid residues, preferably 1-20, more preferably 1-15, more preferably 1-10, more preferably 1-3, most preferably 1, to the sequence defined in i), and substantially retaining the function of the isolated polypeptide defined in i).

In a preferred embodiment of the present invention, in the amino acid sequence of the glycosyltransferase of the present invention, the amino acid residue at the position corresponding to position 222 of the amino acid sequence shown in SEQ ID NO: 19 is selected from at least one of the following amino acids: His, Asn, Gln, Lys and Arg.

In a preferred embodiment of the present invention, in the amino acid sequence of the glycosyltransferase of the present invention, the amino acid residue at the position corresponding to position 222 of the amino acid sequence shown in SEQ ID NO: 19 is His.

In a preferred embodiment of the present invention, in the amino acid sequence of the glycosyltransferase of the present invention, the amino acid residue at the position corresponding to position 322 of the amino acid sequence shown in SEQ ID NO: 19 is selected from at least one of the following amino acids: Val, Ile, Leu, Met and Phe.

In a preferred embodiment of the present invention, in the amino acid sequence of the glycosyltransferase of the present invention, the amino acid residue at the position corresponding to position 322 of the amino acid sequence shown in SEQ ID NO: 19 is Val.

In a preferred embodiment of the present invention, in the amino acid sequence of the glycosyltransferase of the present invention, the amino acid residue at the position corresponding to position 222 of the amino acid sequence shown in SEQ ID NO: 19 is His, and the amino acid residue at the position corresponding to position 322 of the amino acid sequence shown in SEQ ID NO: 19 is Val.

In a preferred embodiment of the present invention, the glycosyltransferase of the present invention is selected from the following group:

In a specific embodiment, the glycosyltransferase activity of the present invention refers to the activity of transferring the glycosyl group of a glycosyl donor to the C-3 hydroxyl group of a tetracyclic triterpenoid compound.

In a preferred embodiment of the present invention, a new glycosyltransferase gene, Pn50, was cloned from Panax notoginseng; this new glycosyltransferase catalyzes glycosylation at the C3 position of several dammarane-type ginsenosides.

By analyzing Panax notoginseng transcriptome data, a full-length glycosyltransferase gene sequence was assembled and named Pn50. It was cloned into the cloning vector PMDT-18T; expression primers were then designed and the gene was constructed into the Escherichia coli expression vector pET28a for induced expression in E. coli. The resulting protein catalyzes glycosylation of the C3 hydroxyl group of protopanaxadiol and Compound K.

Replacing the ginseng-derived UGTPg45 with the Panax notoginseng glycosyltransferase gene Pn50 substantially increases Rh2 yield (by 28%).

In another preferred embodiment of the present invention, a glycosyltransferase mutant protein, 8E7, is provided. This protein is a mutant of the protein encoded by the ginseng-derived wild-type gene UGTPg45; replacing the wild-type gene UGTPg45 with the mutant glycosyltransferase gene 8E7 substantially increases Rh2 yield (by 70%).

Those of ordinary skill in the art will readily appreciate that changing a small number of amino acid residues in certain regions of a polypeptide, for example non-essential regions, does not substantially change its biological activity; for example, a sequence obtained by appropriately substituting certain amino acids retains its activity (see Watson et al., Molecular Biology of the Gene, 4th edition, 1987, The Benjamin/Cummings Pub. Co., p. 224). One of ordinary skill in the art can therefore carry out such substitutions and ensure that the resulting molecule retains the desired biological activity.

Accordingly, the polypeptide of the present invention may be further mutated, on the basis that the residue at the position corresponding to position 222 of the amino acid sequence shown in SEQ ID NO: 19 is not Gln and/or the residue at the position corresponding to position 322 of that sequence is not Ala, while still retaining the function and activity of the glycosyltransferase of the present invention. For example, the glycosyltransferase of the present invention (a) has the amino acid sequence shown in SEQ ID NO: 21; or (b) is a polypeptide derived from (a), comprising a sequence formed by substitution, deletion or addition of one or more amino acid residues, preferably 1-20, more preferably 1-15, more preferably 1-10, more preferably 1-3, most preferably 1, in the sequence defined in (a), and substantially retaining the function of the polypeptide defined in (a).

In the present invention, the glycosyltransferases of the present invention include mutants in which, compared with the glycosyltransferase having the amino acid sequence shown in SEQ ID NO: 21, at most 20, preferably at most 10, more preferably at most 3, still more preferably at most 2, and most preferably at most 1 amino acid is replaced by an amino acid with similar properties. Such conservatively varied mutants can be produced, for example, by amino acid substitutions according to the table below.

Initial residue    Representative substitutions    Preferred substitution
Ala (A)            Val; Leu; Ile                   Val
Arg (R)            Lys; Gln; Asn                   Lys
Asn (N)            Gln; His; Lys; Arg              Gln
Asp (D)            Glu                             Glu
Cys (C)            Ser                             Ser
Gln (Q)            Asn                             Asn
Glu (E)            Asp                             Asp
Gly (G)            Pro; Ala                        Ala
His (H)            Asn; Gln; Lys; Arg              Arg
Ile (I)            Leu; Val; Met; Ala; Phe         Leu
Leu (L)            Ile; Val; Met; Ala; Phe         Ile
Lys (K)            Arg; Gln; Asn                   Arg
Met (M)            Leu; Phe; Ile                   Leu
Phe (F)            Leu; Val; Ile; Ala; Tyr         Leu
Pro (P)            Ala                             Ala
Ser (S)            Thr                             Thr
Thr (T)            Ser                             Ser
Trp (W)            Tyr; Phe                        Tyr
Tyr (Y)            Trp; Phe; Thr; Ser              Phe
Val (V)            Ile; Leu; Met; Phe; Ala         Leu
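As an illustration of how the substitution table above can be read, the following Python sketch encodes it as a lookup keyed by one-letter amino acid codes. The names `CONSERVATIVE`, `is_conservative` and `preferred_substitution` are hypothetical helpers introduced here; the sketch only restates the table and says nothing about whether a given substitution preserves enzyme activity.

```python
# original residue -> (representative substitutions, preferred substitution),
# transcribed from the table above using one-letter codes.
CONSERVATIVE = {
    "A": ("VLI", "V"),   "R": ("KQN", "K"),   "N": ("QHKR", "Q"),
    "D": ("E", "E"),     "C": ("S", "S"),     "Q": ("N", "N"),
    "E": ("D", "D"),     "G": ("PA", "A"),    "H": ("NQKR", "R"),
    "I": ("LVMAF", "L"), "L": ("IVMAF", "I"), "K": ("RQN", "R"),
    "M": ("LFI", "L"),   "F": ("LVIAY", "L"), "P": ("A", "A"),
    "S": ("T", "T"),     "T": ("S", "S"),     "W": ("YF", "Y"),
    "Y": ("WFTS", "F"),  "V": ("ILMFA", "L"),
}


def is_conservative(original: str, replacement: str) -> bool:
    """True if the table lists `replacement` as a substitution for `original`."""
    return replacement in CONSERVATIVE[original][0]


def preferred_substitution(original: str) -> str:
    """Return the preferred substitution listed for `original`."""
    return CONSERVATIVE[original][1]


print(is_conservative("A", "V"))    # Ala -> Val is listed in the table
print(is_conservative("Q", "H"))    # Gln -> His is not listed for Gln
print(preferred_substitution("H"))  # preferred substitution for His
```

Read this way, replacing Ala with Val is the table's preferred substitution for Ala, whereas the table lists only Asn as a substitution for Gln.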

The present invention also provides polynucleotides encoding the polypeptides of the present invention. The term "polynucleotide encoding a polypeptide" covers a polynucleotide that encodes the polypeptide as well as a polynucleotide that additionally includes coding and/or non-coding sequences.

Accordingly, as used herein, "containing", "having" or "including" encompasses "comprising", "consisting mainly of", "consisting essentially of" and "consisting of"; "consisting mainly of", "consisting essentially of" and "consisting of" are subordinate concepts of "containing", "having" or "including".

Amino acid residues corresponding to position 222/322 of the amino acid sequence shown in SEQ ID NO: 19

Those of ordinary skill in the art know that various mutations, such as substitutions, additions or deletions, can be made to some amino acid residues in the amino acid sequence of a protein while the resulting mutant retains the function or activity of the original protein. One of ordinary skill in the art can therefore make certain changes to the amino acid sequences specifically disclosed herein and obtain mutants that still have the desired activity. In such a mutant, the residues corresponding to positions 222/322 of the amino acid sequence shown in SEQ ID NO: 19 may no longer be located at positions 222/322, but the mutant should still fall within the protection scope of the present invention.

The term "corresponding to" as used herein has the meaning commonly understood by those of ordinary skill in the art. Specifically, "corresponding to" denotes the position in one sequence that aligns with a specified position in another sequence after the two sequences have been aligned by homology or sequence identity. Thus, with respect to "the amino acid residue corresponding to position 222/322 of the amino acid sequence shown in SEQ ID NO: 19", if a 6-His tag is added to one end of the amino acid sequence shown in SEQ ID NO: 19, the positions in the resulting mutant corresponding to positions 222/322 of SEQ ID NO: 19 may be positions 228/328; if a few amino acid residues are deleted from the amino acid sequence shown in SEQ ID NO: 19, the corresponding positions in the resulting mutant may be positions 220/320, and so on. As a further example, if a sequence of 400 amino acid residues has high homology or sequence identity with positions 20-420 of the amino acid sequence shown in SEQ ID NO: 19, the positions in that sequence corresponding to positions 222/322 of SEQ ID NO: 19 may be positions 202/302.
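The notion of a "corresponding" position can be made concrete with a small Python sketch that walks the two rows of a pairwise alignment. It assumes the alignment has already been computed by some other tool and is supplied as two equal-length strings with "-" marking gaps; the function name and the 10-residue toy sequence are hypothetical and are not taken from SEQ ID NO: 19.

```python
from typing import Optional


def corresponding_position(ref_aligned: str, query_aligned: str,
                           ref_pos: int) -> Optional[int]:
    """Map a 1-based reference position to the 1-based query position.

    Both arguments are rows of one pairwise alignment ('-' marks a gap).
    Returns None when the reference residue is aligned to a gap.
    """
    if len(ref_aligned) != len(query_aligned):
        raise ValueError("alignment rows must have the same length")
    ref_i = query_i = 0
    for r, q in zip(ref_aligned, query_aligned):
        if r != "-":
            ref_i += 1
        if q != "-":
            query_i += 1
        if r != "-" and ref_i == ref_pos:
            return query_i if q != "-" else None
    raise ValueError("ref_pos lies beyond the end of the reference")


# Toy reference; position 4 is Q.
# Case 1: six His residues added at the N-terminus shift the position by +6.
print(corresponding_position("------MSKQLTAGVE", "HHHHHHMSKQLTAGVE", 4))
# Case 2: two residues deleted upstream shift the position by -2.
print(corresponding_position("MSKQLTAGVE", "M--QLTAGVE", 4))
```

The two toy cases mirror the 222 to 228 (tag added) and 222 to 220 (residues deleted) examples in the text, only at a smaller scale.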

In a specific embodiment, the homology or sequence identity may be 80% or more, preferably 90% or more, more preferably 95%-98%, and most preferably 99% or more.

Methods for determining sequence homology or identity that are well known to those of ordinary skill in the art include, but are not limited to, those described in: Computational Molecular Biology, Lesk, A.M., ed., Oxford University Press, New York, 1988; Biocomputing: Informatics and Genome Projects, Smith, D.W., ed., Academic Press, New York, 1993; Computer Analysis of Sequence Data, Part I, Griffin, A.M. and Griffin, H.G., eds., Humana Press, New Jersey, 1994; Sequence Analysis in Molecular Biology, von Heijne, G., Academic Press, 1987; Sequence Analysis Primer, Gribskov, M. and Devereux, J., eds., M Stockton Press, New York, 1991; and Carillo, H. and Lipman, D., SIAM J. Applied Math., 48:1073 (1988). Preferred methods for determining identity are designed to give the largest match between the sequences tested. Methods for determining identity are codified in publicly available computer programs. Preferred computer program methods for determining the identity between two sequences include, but are not limited to, the GCG program package (Devereux, J. et al., 1984), BLASTP, BLASTN and FASTA (Altschul, S.F. et al., 1990). The BLASTX program is publicly available from NCBI and other sources (BLAST Manual, Altschul, S. et al., NCBI NLM NIH, Bethesda, Md. 20894; Altschul, S. et al., 1990). The well-known Smith-Waterman algorithm can also be used to determine identity.
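As a minimal illustration of what a percent-identity figure means, the Python sketch below counts identical columns in an alignment that is assumed to have been produced already (for example by one of the programs named above). It does not perform alignment itself and is not an implementation of BLAST or Smith-Waterman. The denominator used here, the full alignment length including gap columns, is an assumption made for this sketch; other conventions exist and give different values. The toy sequences are hypothetical.

```python
def percent_identity(a_aligned: str, b_aligned: str) -> float:
    """Percent of alignment columns in which both rows carry the same residue.

    Assumption for this sketch: the denominator is the alignment length,
    so gap columns count as mismatches.
    """
    if not a_aligned or len(a_aligned) != len(b_aligned):
        raise ValueError("alignment rows must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(a_aligned, b_aligned)
                  if a == b and a != "-")
    return 100.0 * matches / len(a_aligned)


# Two substitutions in ten columns.
identity = percent_identity("MSKQLTAGVE", "MSKHLTVGVE")
print(f"{identity:.1f}% identity; meets the 80% threshold: {identity >= 80.0}")

# One gap column in ten.
print(f"{percent_identity('MSKQLT-GVE', 'MSKQLTAGVE'):.1f}% identity")
```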

Unless otherwise specified, the ginsenosides and sapogenins referred to herein are ginsenosides and sapogenins having the S and/or R configuration at C20.

As used herein, "isolated polypeptide" means that the polypeptide is substantially free of other proteins, lipids, carbohydrates or other substances with which it is naturally associated. Those skilled in the art can purify the polypeptide using standard protein purification techniques. A substantially pure polypeptide yields a single major band on a non-reducing polyacrylamide gel. The purity of the polypeptide can be further analyzed by amino acid sequence analysis.

The active polypeptide of the present invention may be a recombinant polypeptide, a natural polypeptide or a synthetic polypeptide. The polypeptide of the present invention may be a naturally purified product, a chemically synthesized product, or a product produced from a prokaryotic or eukaryotic host (for example, bacteria, yeast or plants) using recombinant techniques. Depending on the host used in the recombinant production protocol, the polypeptide of the present invention may be glycosylated or non-glycosylated. The polypeptide of the present invention may or may not include an initial methionine residue.

The present invention also includes fragments, derivatives and analogs of the polypeptide. As used herein, the terms "fragment", "derivative" and "analog" refer to a polypeptide that substantially retains the same biological function or activity as the polypeptide.

A fragment, derivative or analog of the polypeptide of the present invention may be (i) a polypeptide in which one or more conservative or non-conservative amino acid residues (preferably conservative amino acid residues) are substituted, where the substituted amino acid residues may or may not be encoded by the genetic code; or (ii) a polypeptide having a substituent group in one or more amino acid residues; or (iii) a polypeptide formed by fusing the mature polypeptide to another compound (such as a compound that extends the half-life of the polypeptide, for example polyethylene glycol); or (iv) a polypeptide formed by fusing an additional amino acid sequence to the polypeptide sequence (such as a leader sequence or secretory sequence, a sequence used to purify the polypeptide, a proprotein sequence, or a fusion protein formed with an antigen IgG fragment). In light of the teachings herein, such fragments, derivatives and analogs are within the knowledge of those skilled in the art.

The active polypeptide of the present invention has glycosyltransferase activity and can catalyze one or more of the following reactions:

The polypeptide sequence is SEQ ID NO.: 4, SEQ ID NO.: 21 or a derivative polypeptide thereof; the term also includes variant forms of the SEQ ID NO.: 4 or SEQ ID NO.: 21 sequence that have the same function as the indicated polypeptide. These variant forms include (but are not limited to): deletion, insertion and/or substitution of one or more (usually 1-50, preferably 1-30, more preferably 1-20, most preferably 1-10) amino acids, and addition of one or several (usually up to 20, preferably up to 10, more preferably up to 5) amino acids at the C-terminus and/or N-terminus. For example, it is known in the art that substitution with amino acids of similar properties usually does not change the function of a protein. Likewise, adding one or several amino acids at the C-terminus and/or N-terminus usually does not change the function of a protein. The term also includes active fragments and active derivatives of the protein of the present invention. The present invention also provides analogs of the polypeptide. These analogs may differ from the natural polypeptide in amino acid sequence, in modifications that do not affect the sequence, or in both. These polypeptides include natural or induced genetic variants. Induced variants can be obtained by various techniques, such as random mutagenesis by radiation or exposure to mutagens, site-directed mutagenesis or other known molecular biology techniques. Analogs also include analogs having residues other than natural L-amino acids (for example D-amino acids) and analogs having non-naturally occurring or synthetic amino acids (for example β- and γ-amino acids). It should be understood that the polypeptides of the present invention are not limited to the representative polypeptides exemplified above.

Modified forms (which usually do not alter the primary structure) include chemically derivatized forms of the polypeptide, in vivo or in vitro, such as acetylated or carboxylated forms. Modifications also include glycosylation, such as polypeptides produced by glycosylation during synthesis and processing of the polypeptide or in further processing steps. Such modification can be accomplished by exposing the polypeptide to an enzyme that performs glycosylation (such as a mammalian glycosylase or deglycosylase). Modified forms also include sequences having phosphorylated amino acid residues (for example phosphotyrosine, phosphoserine or phosphothreonine). Also included are polypeptides that have been modified to improve their resistance to proteolysis or to optimize their solubility.

The amino terminus or carboxyl terminus of the Pn50 or 8E7 polypeptide of the present invention, or of a derivative polypeptide thereof, may also carry one or more polypeptide fragments as protein tags. Any suitable tag can be used in the present invention. For example, the tag may be FLAG, HA, HA1, c-Myc, Poly-His, Poly-Arg, Strep-TagII, AU1, EE, T7, 4A6, ε, B, gE or Ty1. These tags can be used to purify the protein. Table 1 lists some of these tags and their sequences.

Table 1

Tag            Number of residues    Sequence
Poly-Arg       5-6 (usually 5)       RRRRR
Poly-His       2-10 (usually 6)      HHHHHH
FLAG           8                     DYKDDDDK
Strep-TagII    8                     WSHPQFEK
C-myc          10                    WQKLISEEDL
GST            220                   last 6 residues: LVPRGS

To allow the translated protein to be secreted (for example, out of the cell), a signal peptide sequence, such as the pelB signal peptide, may also be added to the amino terminus of the Pn50 or 8E7 polypeptide or a derivative polypeptide thereof. The signal peptide can be cleaved off while the polypeptide is secreted from the cell.

The polynucleotide of the present invention may be in the form of DNA or RNA. DNA forms include cDNA, genomic DNA and synthetic DNA. The DNA may be single-stranded or double-stranded, and may be the coding strand or the non-coding strand. The coding region sequence encoding the mature polypeptide may be identical to the coding region sequence shown in SEQ ID NO.: 3 or SEQ ID NO.: 22, or may be a degenerate variant thereof. As used herein, "degenerate variant" refers to a nucleic acid sequence that encodes a polypeptide having the amino acid sequence shown in SEQ ID NO.: 4 or SEQ ID NO.: 21, or a derivative polypeptide thereof, but that differs from the coding sequence of SEQ ID NO.: 4 or SEQ ID NO.: 21 or the derivative polypeptide, preferably from the sequence shown in SEQ ID NO.: 3 or SEQ ID NO.: 22.
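The idea of a degenerate variant, a different nucleotide sequence that encodes the same amino acid sequence, can be shown with a short Python sketch based on the standard genetic code. The two coding sequences used below are short hypothetical examples, not SEQ ID NO.: 3 or SEQ ID NO.: 22, and the function names are introduced here for illustration only.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order at each position.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds: str) -> str:
    """Translate a coding sequence codon by codon ('*' marks a stop codon)."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def is_degenerate_variant(cds_a: str, cds_b: str) -> bool:
    """True if the sequences differ at the DNA level but encode the same protein."""
    return cds_a.upper() != cds_b.upper() and translate(cds_a) == translate(cds_b)


# Hypothetical toy sequences: synonymous changes in three of the four codons.
original = "ATGCATGTTTAA"
variant = "ATGCACGTGTGA"
print(translate(original), translate(variant))
print(is_degenerate_variant(original, variant))
```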

Polynucleotides encoding the mature polypeptide of the polypeptide shown in SEQ ID NO.: 4 or SEQ ID NO.: 21, or of a derivative polypeptide thereof, include: a coding sequence encoding only the mature polypeptide; the coding sequence of the mature polypeptide together with various additional coding sequences; and the coding sequence of the mature polypeptide (and optional additional coding sequences) together with non-coding sequences.

The term "polynucleotide encoding a polypeptide" covers a polynucleotide that encodes the polypeptide as well as a polynucleotide that additionally includes coding and/or non-coding sequences.

The present invention also relates to variants of the above polynucleotides that encode polypeptides having the same amino acid sequence as the polypeptides of the present invention, or fragments, analogs and derivatives of those polypeptides. Such a polynucleotide variant may be a naturally occurring allelic variant or a non-naturally occurring variant. These nucleotide variants include substitution variants, deletion variants and insertion variants. As is known in the art, an allelic variant is an alternative form of a polynucleotide; it may involve substitution, deletion or insertion of one or more nucleotides but does not substantially alter the function of the encoded polypeptide.

The present invention also relates to polynucleotides that hybridize to the above sequences and have at least 50%, preferably at least 70%, more preferably at least 80% identity with them. The present invention relates in particular to polynucleotides that hybridize to the polynucleotides of the present invention under stringent conditions. In the present invention, "stringent conditions" means: (1) hybridization and washing at low ionic strength and high temperature, such as 0.2×SSC, 0.1% SDS, 60°C; or (2) hybridization in the presence of a denaturing agent, such as 50% (v/v) formamide, 0.1% calf serum/0.1% Ficoll, 42°C; or (3) hybridization that occurs only when the identity between the two sequences is at least 90%, preferably at least 95%. Furthermore, the polypeptide encoded by the hybridizable polynucleotide has the same biological function and activity as the mature polypeptide shown in SEQ ID NO.: 4 or SEQ ID NO.: 21.

The present invention also relates to nucleic acid fragments that hybridize to the above sequences. As used herein, a "nucleic acid fragment" is at least 15 nucleotides, preferably at least 30 nucleotides, more preferably at least 50 nucleotides, and most preferably at least 100 nucleotides in length. Nucleic acid fragments can be used in nucleic acid amplification techniques (such as PCR) to identify and/or isolate polynucleotides encoding the Pn50 or 8E7 polypeptide or derivative polypeptides thereof.

The polypeptides and polynucleotides of the present invention are preferably provided in isolated form, and more preferably are purified to homogeneity.

The full-length nucleotide sequence encoding the Pn50 or 8E7 polypeptide or a derivative polypeptide thereof, or a fragment of that sequence, can usually be obtained by PCR amplification, recombinant methods or chemical synthesis. For PCR amplification, primers can be designed from the nucleotide sequences disclosed herein, especially the open reading frame sequence, and the relevant sequence amplified using as template either a commercially available cDNA library or a cDNA library prepared by conventional methods known to those skilled in the art. When the sequence is long, two or more rounds of PCR amplification are often required, after which the amplified fragments are spliced together in the correct order.

Once the relevant sequence has been obtained, it can be produced in large quantities by recombinant methods. This usually involves cloning it into a vector, transferring the vector into cells, and then isolating the sequence from the proliferated host cells by conventional methods.

The relevant sequence can also be produced by chemical synthesis, especially when the fragment is short. Long sequences are usually obtained by first synthesizing several small fragments and then ligating them.

At present, a DNA sequence encoding the protein of the invention (or a fragment or derivative thereof) can be obtained entirely by chemical synthesis. The DNA sequence can then be introduced into various existing DNA molecules (such as vectors) and cells known in the art. Mutations can also be introduced into the protein sequence of the invention by chemical synthesis.

Amplifying DNA/RNA by PCR is the preferred way of obtaining the gene of the invention. In particular, when a full-length cDNA is difficult to obtain from a library, the RACE method (rapid amplification of cDNA ends) is preferred. Primers for PCR can be selected appropriately from the sequence information disclosed herein and synthesized by conventional methods. The amplified DNA/RNA fragments can be separated and purified by conventional methods such as gel electrophoresis.

The present invention also relates to vectors comprising the polynucleotide of the invention, to host cells genetically engineered with the vector of the invention or with the coding sequence of the Pn50 or 8E7 polypeptide or a derivative polypeptide thereof, and to methods of producing the polypeptide of the invention by recombinant techniques.

The polynucleotide sequence of the invention can be used to express or produce the recombinant Pn50 or 8E7 polypeptide or a derivative polypeptide thereof by conventional recombinant DNA techniques. In general, the steps are as follows:

(1) transforming or transducing a suitable host cell with the polynucleotide (or variant) of the invention encoding the Pn50 or 8E7 polypeptide or a derivative polypeptide thereof, or with a recombinant expression vector containing that polynucleotide;

(2) culturing the host cell in a suitable medium;

(3) isolating and purifying the protein from the medium or the cells.

In the present invention, the polynucleotide sequence encoding the Pn50 or 8E7 polypeptide or a derivative polypeptide thereof can be inserted into a recombinant expression vector. The term "recombinant expression vector" refers to bacterial plasmids, bacteriophages, yeast plasmids, plant cell viruses, mammalian cell viruses such as adenoviruses and retroviruses, or other vectors well known in the art. Any plasmid or vector can be used as long as it replicates and is stable in the host. An important feature of an expression vector is that it usually contains an origin of replication, a promoter, a marker gene and translational control elements.

Methods well known to those skilled in the art can be used to construct expression vectors containing the DNA sequence encoding the Pn50 or 8E7 polypeptide or a derivative polypeptide thereof together with appropriate transcription/translation control signals. These methods include in vitro recombinant DNA techniques, DNA synthesis techniques and in vivo recombination techniques. The DNA sequence can be operably linked to an appropriate promoter in the expression vector to direct mRNA synthesis. Representative examples of such promoters are: the E. coli lac or trp promoter; the λ phage PL promoter; and eukaryotic promoters, including the CMV immediate-early promoter, the HSV thymidine kinase promoter, the early and late SV40 promoters, retroviral LTRs, and other promoters known to control gene expression in prokaryotic or eukaryotic cells or their viruses. The expression vector also includes a ribosome binding site for translation initiation and a transcription terminator.

In addition, the expression vector preferably contains one or more selectable marker genes that provide a phenotypic trait for selecting transformed host cells, such as dihydrofolate reductase, neomycin resistance or green fluorescent protein (GFP) for eukaryotic cell culture, or tetracycline or ampicillin resistance for E. coli.

A vector containing the appropriate DNA sequence described above together with an appropriate promoter or control sequence can be used to transform an appropriate host cell so that it expresses the protein.

The host cell may be a prokaryotic cell, such as a bacterial cell; a lower eukaryotic cell, such as a yeast cell; or a higher eukaryotic cell, such as a mammalian cell. Representative examples include: bacterial cells such as Escherichia coli, Streptomyces and Salmonella typhimurium; fungal cells such as yeast; plant cells; insect cells such as Drosophila S2 or Sf9; and animal cells such as CHO, COS, 293 or Bowes melanoma cells.

When the polynucleotide of the invention is expressed in higher eukaryotic cells, transcription is enhanced if an enhancer sequence is inserted into the vector. Enhancers are cis-acting DNA elements, usually about 10 to 300 base pairs long, that act on a promoter to increase transcription of the gene. Examples include the SV40 enhancer of 100 to 270 base pairs on the late side of the replication origin, the polyoma enhancer on the late side of the replication origin, and adenovirus enhancers.

Those of ordinary skill in the art know how to select appropriate vectors, promoters, enhancers and host cells.

Host cells can be transformed with recombinant DNA by conventional techniques well known to those skilled in the art. When the host is a prokaryote such as E. coli, competent cells capable of taking up DNA can be harvested after the exponential growth phase and treated by the CaCl₂ method, using procedures well known in the art. Another method uses MgCl₂. If desired, transformation can also be carried out by electroporation. When the host is a eukaryote, the following DNA transfection methods can be used: calcium phosphate co-precipitation, or conventional mechanical methods such as microinjection, electroporation and liposome packaging.

The resulting transformants can be cultured by conventional methods to express the polypeptide encoded by the gene of the invention. Depending on the host cell used, the culture medium can be selected from various conventional media. Culture is carried out under conditions suitable for growth of the host cells. Once the host cells have grown to an appropriate density, the selected promoter is induced by a suitable method (such as a temperature shift or chemical induction) and the cells are cultured for a further period.

In the above method, the recombinant polypeptide may be expressed inside the cell or on the cell membrane, or secreted outside the cell. If desired, the recombinant protein can be isolated and purified by various separation methods that exploit its physical, chemical and other properties. These methods are well known to those skilled in the art. Examples include, but are not limited to: conventional renaturation treatment, treatment with a protein precipitant (salting out), centrifugation, osmotic lysis, sonication, ultracentrifugation, molecular sieve chromatography (gel filtration), adsorption chromatography, ion exchange chromatography, high performance liquid chromatography (HPLC) and various other liquid chromatography techniques, and combinations of these methods.

Application

Uses of the active polypeptide or glycosyltransferase of the invention, i.e. the Pn50 or 8E7 polypeptide or a derivative polypeptide thereof, include (but are not limited to): specifically and efficiently transferring a glycosyl group from a glycosyl donor to the C-3 hydroxyl group of a tetracyclic triterpenoid. In particular, it can convert a compound of formula (I) into a compound of formula (II), for example converting protopanaxadiol (PPD) into the rare ginsenoside Rh2, which has better antitumor activity, or converting Compound K into ginsenoside F2.

The tetracyclic triterpenoids include (but are not limited to) tetracyclic triterpenoids of the S or R configuration of the dammarane, lanostane, tirucallane, cycloartane, apotirucallane, cucurbitane and meliacane types.

The present invention provides an industrial catalytic method, comprising: obtaining ginsenoside Rh2 and ginsenoside F2 using the Pn50 or 8E7 polypeptide of the invention, or a derivative polypeptide thereof, in the presence of a glycosyl donor. Specifically, the polypeptide used in reaction (A) is selected from the polypeptides shown in SEQ ID NO.:4 or SEQ ID NO.:21 or derivative polypeptides thereof; the polypeptide used in reaction (B) is the polypeptide shown in SEQ ID NO.:4 or a derivative polypeptide thereof.

In a preferred embodiment of the invention, a method is provided for synthesizing ginsenoside Rh2 in Saccharomyces cerevisiae using the aforementioned Panax notoginseng glycosyltransferase Pn50 and the glycosyltransferase mutant protein 8E7.

The glycosyl donor is a nucleoside diphosphate sugar selected from the group consisting of: UDP-glucose, ADP-glucose, TDP-glucose, CDP-glucose, GDP-glucose, UDP-acetylglucose, ADP-acetylglucose, TDP-acetylglucose, CDP-acetylglucose, GDP-acetylglucose, UDP-xylose, ADP-xylose, TDP-xylose, CDP-xylose, GDP-xylose, UDP-galacturonic acid, ADP-galacturonic acid, TDP-galacturonic acid, CDP-galacturonic acid, GDP-galacturonic acid, UDP-galactose, ADP-galactose, TDP-galactose, CDP-galactose, GDP-galactose, UDP-arabinose, ADP-arabinose, TDP-arabinose, CDP-arabinose, GDP-arabinose, UDP-rhamnose, ADP-rhamnose, TDP-rhamnose, CDP-rhamnose, GDP-rhamnose, or other nucleoside diphosphate hexoses or nucleoside diphosphate pentoses, or combinations thereof.
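The explicitly named donors above follow a regular pattern: each of five nucleoside diphosphates paired with each of seven sugars. A minimal sketch of that enumeration is given below; the constant and function names are hypothetical, and the open-ended "other hexoses or pentoses" clause is deliberately not modelled.

```python
from itertools import product

# Nucleoside diphosphate carriers and sugars named explicitly in the text.
NUCLEOSIDES = ["UDP", "ADP", "TDP", "CDP", "GDP"]
SUGARS = [
    "glucose",
    "acetylglucose",
    "xylose",
    "galacturonic acid",
    "galactose",
    "arabinose",
    "rhamnose",
]


def donor_names(nucleosides=NUCLEOSIDES, sugars=SUGARS):
    """Return every carrier-sugar pairing, grouped by sugar as in the text."""
    return [f"{n}-{s}" for s, n in product(sugars, nucleosides)]


if __name__ == "__main__":
    donors = donor_names()
    print(len(donors))   # 5 carriers x 7 sugars
    print(donors[:5])    # the five glucose donors
```

Restricting `nucleosides` to `["UDP"]` and dropping acetylglucose from `sugars` reproduces the explicitly named members of the preferred uridine diphosphate subset described in the following paragraph.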

The glycosyl donor is preferably a uridine diphosphate sugar selected from the group consisting of: UDP-glucose, UDP-xylose, UDP-rhamnose, UDP-galacturonic acid, UDP-galactose, UDP-arabinose, or other uridine diphosphate hexoses or uridine diphosphate pentoses, or combinations thereof.

In the method, an enzyme activity additive (an additive that increases or inhibits enzyme activity) may also be added. The additive may be selected from the group consisting of Ca²⁺, Co²⁺, Mn²⁺, Ba²⁺, Al³⁺, Ni²⁺, Zn²⁺ and Fe²⁺, or may be a substance that can generate Ca²⁺, Co²⁺, Mn²⁺, Ba²⁺, Al³⁺, Ni²⁺, Zn²⁺ or Fe²⁺.

The pH of the method is pH 4.0-10.0, preferably pH 6.0-8.5, more preferably pH 8.5.

The temperature of the method is 10°C-105°C, preferably 25°C-35°C, more preferably 35°C.
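The pH and temperature windows stated above can be captured in a small check. This is a sketch only: the `Range` class and `classify_conditions` function are hypothetical names, and the code encodes nothing beyond the broad and preferred ranges quoted in the two preceding paragraphs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    low: float
    high: float

    def contains(self, value: float) -> bool:
        return self.low <= value <= self.high


# Ranges as stated in the text (inclusive bounds are an assumption).
PH_BROAD = Range(4.0, 10.0)
PH_PREFERRED = Range(6.0, 8.5)
TEMP_BROAD_C = Range(10.0, 105.0)
TEMP_PREFERRED_C = Range(25.0, 35.0)


def classify_conditions(ph: float, temp_c: float) -> str:
    """Return 'preferred', 'allowed' or 'outside' for a pH/temperature pair."""
    if PH_PREFERRED.contains(ph) and TEMP_PREFERRED_C.contains(temp_c):
        return "preferred"
    if PH_BROAD.contains(ph) and TEMP_BROAD_C.contains(temp_c):
        return "allowed"
    return "outside"


if __name__ == "__main__":
    print(classify_conditions(8.5, 35.0))  # the most preferred point in the text
    print(classify_conditions(5.0, 50.0))
    print(classify_conditions(3.0, 35.0))
```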

The present invention also provides a composition containing an effective amount of the active polypeptide or glycosyltransferase of the invention, i.e. the Pn50 or 8E7 polypeptide or a derivative polypeptide thereof, together with a bromatologically or industrially acceptable carrier or excipient. Such carriers include (but are not limited to): water, buffer, glucose, glycerol, ethanol, and combinations thereof.

A substance that modulates the activity of the glycosyltransferase of the invention may also be added to the composition. Any substance that increases enzyme activity can be used. Preferably, the substance that increases the activity of the glycosyltransferase of the invention is mercaptoethanol. In addition, many substances can reduce enzyme activity, including but not limited to: Ca²⁺, Co²⁺, Mn²⁺, Ba²⁺, Al³⁺, Ni²⁺, Zn²⁺ and Fe²⁺, or substances that hydrolyze after addition to the substrate to form Ca²⁺, Co²⁺, Mn²⁺, Ba²⁺, Al³⁺, Ni²⁺, Zn²⁺ or Fe²⁺.

Once the Pn50 or 8E7 polypeptide of the invention, or a derivative polypeptide thereof, has been obtained, those skilled in the art can readily use the enzyme for transglycosylation, in particular of dammarenediol and protopanaxadiol. As preferred embodiments of the invention, two methods of forming rare ginsenosides are also provided. The first method comprises: treating a substrate to be glycosylated with the Pn50 or 8E7 polypeptide of the invention or a derivative polypeptide thereof, the substrate including tetracyclic triterpenoids such as dammarenediol, protopanaxadiol and their derivatives. Preferably, the substrate is treated with the Pn50 or 8E7 polypeptide or derivative polypeptide thereof at pH 3.5-10. Preferably, the substrate is treated with the Pn50 or 8E7 polypeptide or derivative polypeptide thereof at a temperature of 30-105°C.

The second method comprises: transferring the gene encoding the Pn50 or 8E7 polypeptide of the invention, or a derivative polypeptide thereof, into an engineered strain capable of synthesizing protopanaxadiol (PPD) (for example, engineered yeast or E. coli); or co-expressing the gene encoding the Pn50 or 8E7 polypeptide or derivative polypeptide thereof in a host cell (for example, a yeast cell or E. coli) with key genes of the dammarenediol and PPD biosynthetic pathways and, optionally, other glycosyltransferase genes, to obtain a recombinant strain that directly produces rare ginsenoside Rh2 and/or ginsenoside F2. Alternatively, the nucleotide sequence encoding the Pn50 or 8E7 polypeptide or derivative polypeptide thereof is co-expressed in a host cell with key enzymes of the dammarenediol and/or PPD biosynthetic pathways, optionally other glycosyltransferases, and key enzymes for UDP-rhamnose synthesis, and is used to construct recombinant strains for the synthesis of rare ginsenoside Rh2 and ginsenoside F2.

The key genes of the dammarenediol biosynthetic pathway include (but are not limited to): the dammarenediol synthase gene.

In another preferred embodiment, the key genes of the protopanaxadiol biosynthetic pathway include (but are not limited to): the dammarenediol synthase gene PgDDS, the cytochrome P450 gene CYP716A47 for protopanaxadiol synthesis and its reductase gene, or combinations thereof, or isoenzymes of the above enzymes and combinations thereof. Dammarenediol synthase converts epoxysqualene (synthesized by Saccharomyces cerevisiae itself) into dammarenediol, and cytochrome P450 CYP716A47 together with its reductase then converts dammarenediol into protopanaxadiol (PPD) (Han et al., Plant & Cell Physiology, 2011, 52: 2062-73).
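The chain of conversions described in this document (epoxysqualene to dammarenediol to PPD, and then PPD to Rh2 by Pn50 or 8E7) can be written as a simple ordered data structure. The sketch below is illustrative only; the `PATHWAY` list and `route` function are hypothetical names, and only the steps stated in the text are included.

```python
# Each step: (substrate, enzyme(s) named in the text, product).
PATHWAY = [
    ("epoxysqualene", "PgDDS (dammarenediol synthase)", "dammarenediol"),
    ("dammarenediol", "CYP716A47 + reductase", "protopanaxadiol (PPD)"),
    ("protopanaxadiol (PPD)", "Pn50 or 8E7", "ginsenoside Rh2"),
]


def route(start: str, end: str, steps=PATHWAY):
    """List the enzymes needed to go from `start` to `end` along a linear pathway."""
    by_substrate = {substrate: (enzyme, prod) for substrate, enzyme, prod in steps}
    enzymes, current = [], start
    while current != end:
        if current not in by_substrate:
            raise ValueError(f"no step leads from {current!r} towards {end!r}")
        enzyme, current = by_substrate[current]
        enzymes.append(enzyme)
    return enzymes


if __name__ == "__main__":
    for enzyme in route("epoxysqualene", "ginsenoside Rh2"):
        print(enzyme)
```

A host that already produces PPD only needs the last step, which is what `route("protopanaxadiol (PPD)", "ginsenoside Rh2")` returns.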

The main advantages of the present invention are:

(1) Compared with traditional methods that rely on extraction from Panax plants followed by glycoside hydrolysis, the method of the invention for producing ginsenoside Rh2 with Saccharomyces cerevisiae has the advantages of low cost, short production cycle and stable quality;

(2) The glycosyltransferase Pn50, obtained from Panax notoginseng for the first time in the present invention, can catalyze the conversion of PPD to Rh2 and of CK to F2, and when introduced into a PPD-producing strain it synthesizes the rare ginsenoside Rh2 more efficiently than the wild-type ginseng glycosyltransferase UGTPg45;

(3) The mutant gene 8E7 was obtained in the present invention by random mutagenesis of the wild-type ginseng glycosyltransferase UGTPg45; when introduced into a PPD-producing strain, it significantly improves the efficiency of rare ginsenoside Rh2 synthesis compared with wild-type UGTPg45.

The present invention is further illustrated below with reference to specific examples. It should be understood that these examples only illustrate the invention and do not limit its scope. Experimental methods in the following examples for which no specific conditions are given generally follow conventional conditions, such as those described in Sambrook et al., Molecular Cloning: A Laboratory Manual (New York: Cold Spring Harbor Laboratory Press, 1989), or the conditions recommended by the manufacturer. Unless otherwise stated, percentages and parts are by weight.

The implementation of the invention can be further understood from the following specific examples.

Example 1. Cloning of the Panax notoginseng glycosyltransferase gene Pn50

Two primers were synthesized: Pn50 cloning primer F, SEQ ID NO:1 (ATGGAGAGAGAAATGTTGAGCA), and Pn50 cloning primer R, SEQ ID NO:2 (TCAGGAGGAAACAAGCTTTGAA). PCR was performed with these primers using, as template, cDNA obtained by reverse transcription of RNA extracted from Panax notoginseng. The DNA polymerase was the high-fidelity KOD DNA polymerase from Takara Biotechnology Co., Ltd. The PCR program was: 94°C for 2 min; 35 cycles of 94°C for 15 s, 58°C for 30 s and 68°C for 2 min; 68°C for 10 min; then cooling to 10°C. The PCR product was analyzed by agarose gel electrophoresis; the result is shown in Figure 1.
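The two cloning primers correspond to the two ends of the Pn50 open reading frame: the forward primer matches its first 22 nucleotides and the reverse primer is the reverse complement of its last 22. The sketch below shows that relationship; `reverse_complement` and `end_primers` are hypothetical helper names, and a short stand-in built from the two ORF ends is used in place of the full sequence.

```python
# Complement table for DNA; case is preserved.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def end_primers(orf: str, length: int = 22):
    """Forward primer = first `length` nt; reverse primer = revcomp of last `length` nt."""
    if len(orf) < length:
        raise ValueError("ORF is shorter than the requested primer length")
    return orf[:length], reverse_complement(orf[-length:])


# First and last 22 nt of the Pn50 ORF as given in SEQ ID NO:3.
ORF_5_END = "ATGGAGAGAGAAATGTTGAGCA"
ORF_3_END = "TTCAAAGCTTGTTTCCTCCTGA"

if __name__ == "__main__":
    stand_in_orf = ORF_5_END + ORF_3_END  # stand-in; the interior is omitted here
    forward, reverse = end_primers(stand_in_orf)
    print(forward)  # ATGGAGAGAGAAATGTTGAGCA (SEQ ID NO:1)
    print(reverse)  # TCAGGAGGAAACAAGCTTTGAA (SEQ ID NO:2)
```

The fixed length of 22 simply mirrors the listed primers; the document does not state how the primer length or the 58°C annealing temperature were chosen.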

The target DNA band was excised under UV illumination. The DNA was then recovered from the agarose gel with the Axygen Gel Extraction Kit (AXYGEN) to give the amplified glycosyltransferase gene fragment. The recovered PCR product was cloned into the PMDT vector using the PMD18-T cloning kit from Takara Biotechnology (Dalian) Co., Ltd. (Takara), and the resulting construct was named PMDT-Pn50. The Pn50 gene sequence was obtained by sequencing.

The Pn50 gene has the nucleotide sequence of SEQ ID NO:3. Nucleotides 1-1368 from the 5' end of SEQ ID NO:3 are the open reading frame (ORF) of Pn50; nucleotides 1-3 from the 5' end are the start codon ATG, and nucleotides 1366-1368 from the 5' end are the stop codon TGA. The glycosyltransferase gene Pn50 encodes the 455-amino-acid protein Pn50, which has the amino acid sequence of SEQ ID NO:4. Software prediction gives the protein a theoretical molecular weight of 51.1 kDa and an isoelectric point (pI) of 5.10. Amino acids 332-375 from the amino terminus of SEQ ID NO:4 form the conserved PSPG functional domain of glycosyltransferases.
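The coordinates quoted above are mutually consistent, which a few lines of arithmetic make explicit. This is a sketch with hypothetical helper names; it uses 1-based inclusive coordinates, as the text does.

```python
def orf_to_protein_length(orf_nt: int, has_stop: bool = True) -> int:
    """Number of encoded residues for an ORF of `orf_nt` nucleotides."""
    if orf_nt % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_nt // 3
    return codons - 1 if has_stop else codons  # the stop codon encodes no residue


def residue_span_to_nt(first_res: int, last_res: int):
    """Map a 1-based residue range onto 1-based nucleotide coordinates in the ORF."""
    if first_res < 1 or last_res < first_res:
        raise ValueError("invalid residue range")
    return 3 * (first_res - 1) + 1, 3 * last_res


if __name__ == "__main__":
    print(orf_to_protein_length(1368))   # 455 residues, as stated for Pn50
    print(residue_span_to_nt(1, 1))      # (1, 3): the ATG start codon
    print(residue_span_to_nt(456, 456))  # (1366, 1368): the TGA stop codon
    print(residue_span_to_nt(332, 375))  # nucleotides covering the PSPG domain
    print(375 - 332 + 1)                 # 44 residues in the PSPG domain
```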

Pn50 nucleotide sequence, SEQ ID NO:3

ATGGAGAGAGAAATGTTGAGCAAAACTCACATTATGTTCATCCCATTCCCAGCTCAAGGCCACATGAGCCCAATGATGCAATTCGTCAAGCGTTTAGCCTGGAAAGGCGTGCGAATCACGATAGTTCTTCCGGCTGAGATTCGAGATTCTATGCAAATAAACAACTCATTGATCAACACTGAGTGCATCTCCTTTGATTTTGATAAAGATGATGAGATGCCATACAGCATGCGGGCTTATATGGGAGTTGTAAAGCTCAAGGTCACAAATAAACTGAGTGACCTACTCGAGAAGCAAAAAACAAATGGCTACCCTGTTAATTTGCTAGTGGTCGATTCATTATATCCATCTCGGGTAGAAATGTGCCACCAACTTGGGGTAAAAGGAGCTCCATTTTTCACTCACTCTTGTGCTGTTGGTGCCATTTATTATAATGCTCGCTTAGGGAAATTGAAGATACCTCCTGAGGAAGGGTTGACTTCTGTTTCATTGCCTTCAATTCCATTGTTGGGGAGAAATGATTTGCCAATTATTAGGACTGGCACCTTTCCTGATCTCTTTGAGCATTTGGGGAATCAGTTTTCAGATCTTGATAAAGCGGATTGGATCTTTTTCAATACTTTTGATAAGCTTGAAAATGAGGAAGCAAAATGGCTATCTAGCCAATGGCCAATTACATCCATCGGACCATTAATCCCTTCAATGTACTTAGACAAACAATTACCAAATGACAAAGACAATGACATTAATTTCTACAAGGCAGACGTCGGATCGTGCATCAAGTGGCTAGACGCCAAAGACCCTGGCTCGGTAGTCTACGCCTCATTCGGGAGCGTGAAGCACAACCTCGGCGATGACTACATGGACGAAGTAGCATGGGGCTTGTTACACAGCAAATATCACTTCATATGGGTTGTTATAGAATCCGAACGTACAAAGCTCTCTAGCGATTTCTTGGCAGAGGCAGAGGAAAAAGGCCTAATAGTGAGTTGGTGCCCTCAACTCGAAGTTTTGTCACATAAATCTATAGGTAGTTTTATGACTCATTGTGGTTGGAACTCGACGGTTGAGGCATTGAGTTTGGGCGTGCCAATGGTGGCAGTGCCACAACAGTTTGATCAGCCTGTTAATGCCAAGTATATCGTGGATGTATGGCGAATTGGGGTTCAGGTTCCGATTGGTGAAAATGGGGTTCTTTTGAGGGGAGAAGTTGCTAACTGTATAAAGGATGTTATGGAGGGGGAAATAGGGGATGAGCTTAGAGGGAATGCTTTGAAATGGAAGGGGTTGGCTGTGGAGGCAATGGAGAAAGGGGGTAGCTCTGATAAGAATATTGATGAGTTCATTTCAAAGCTTGTTTCCTCCTGA

Pn50 amino acid sequence, SEQ ID NO:4

MEREMLSKTHIMFIPFPAQGHMSPMMQFVKRLAWKGVRITIVLPAEIRDSMQINNSLINTECISFDFDKDDEMPYSMRAYMGVVKLKVTNKLSDLLEKQKTNGYPVNLLVVDSLYPSRVEMCHQLGVKGAPFFTHSCAVGAIYYNARLGKLKIPPEEGLTSVSLPSIPLLGRNDLPIIRTGTFPDLFEHLGNQFSDLDKADWIFFNTFDKLENEEAKWLSSQWPITSIGPLIPSMYLDKQLPNDKDNDINFYKADVGSCIKWLDAKDPGSVVYASFGSVKHNLGDDYMDEVAWGLLHSKYHFIWVVIESERTKLSSDFLAEAEEKGLIVSWCPQLEVLSHKSIGSFMTHCGWNSTVEALSLGVPMVAVPQQFDQPVNAKYIVDVWRIGVQVPIGENGVLLRGEVANCIKDVMEGEIGDELRGNALKWKGLAVEAMEKGGSSDKNIDEFISKLVSS
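The relationship between SEQ ID NO:3 and SEQ ID NO:4 is ordinary translation with the standard genetic code. A minimal translator is sketched below, assuming the standard code applies; `CODON_TABLE` and `translate` are hypothetical names, and only the two ends of the ORF are translated here rather than the full sequence.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO)
}


def translate(orf: str) -> str:
    """Translate from the first base, stopping at the first stop codon.

    A trailing partial codon (1 or 2 bases) is ignored.
    """
    seq = orf.upper()
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("ATGGAGAGAGAAATGTTGAGCA"))  # MEREMLS, the start of SEQ ID NO:4
    print(translate("TCCTCCTGA"))               # SS, the last two residues before TGA
```

Passing the full 1368-nucleotide ORF to `translate` and comparing the result with SEQ ID NO:4 would be a quick way to check a transcribed copy of either sequence.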

Example 2. Expression of the Panax notoginseng glycosyltransferase gene Pn50 in Escherichia coli

Two primers were synthesized: Pn50 expression primer F, SEQ ID NO:5 (GGATCCATGGAGAGAGAAATGTTGAGCA), and Pn50 expression primer R, SEQ ID NO:6 (CTCGAGTCAGGAGGAAACAAGCTTTGAA). The BamH I and Xho I restriction sites were added to primers F and R respectively, and PCR was performed using cDNA prepared from the plant as template. The DNA polymerase was the high-fidelity KOD DNA polymerase from Takara Biotechnology Co., Ltd. The PCR program was: 94°C for 2 min; 35 cycles of 94°C for 15 s, 58°C for 30 s and 68°C for 2 min; 68°C for 10 min; then cooling to 10°C. The PCR product was analyzed by agarose gel electrophoresis and the target DNA band was excised under UV illumination. The DNA was then recovered from the agarose gel with the Axygen Gel Extraction Kit (AXYGEN) to give the amplified glycosyltransferase gene fragment. The recovered PCR product was digested with BamH I and Xho I and ligated into pET28a digested with the same enzymes. The ligation product was transformed into E. coli EPI300 competent cells, the transformed culture was spread on LB plates containing 50 μg/mL kanamycin, and recombinant clones were verified by PCR and restriction digestion. One clone was selected, and its recombinant plasmid was extracted and verified by sequencing. The verified plasmid was transformed into E. coli BL21(DE3) for induced expression as follows: a single colony was picked from the plate and grown overnight in a tube of LB containing 50 μg/mL kanamycin; a 1% inoculum was transferred to a 50 ml Erlenmeyer flask and cultured with shaking at 37°C to an OD600 of 0.6-0.7; IPTG was added to a final concentration of 0.1 mM and expression was induced at 18°C for 16 h. Cells were collected by centrifugation at 12,000 g for 3 min, resuspended in PBS buffer at 10 ml per gram wet cell weight and lysed; the lysate was centrifuged and the supernatant was used as the crude enzyme solution.
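The expression primers are the cloning primers of Example 1 with a restriction site prepended to the 5' end. The sketch below reproduces SEQ ID NO:5 and SEQ ID NO:6 that way; the `SITES` table and `add_site` function are hypothetical names, and, as in the listed primers, no extra protective bases are added upstream of the site.

```python
# Recognition sequences of the two enzymes used in the example.
SITES = {"BamHI": "GGATCC", "XhoI": "CTCGAG"}


def add_site(primer: str, enzyme: str) -> str:
    """Prepend the recognition site of `enzyme` to the 5' end of `primer`."""
    try:
        return SITES[enzyme] + primer
    except KeyError:
        raise ValueError(f"unknown enzyme: {enzyme}") from None


CLONING_F = "ATGGAGAGAGAAATGTTGAGCA"  # SEQ ID NO:1
CLONING_R = "TCAGGAGGAAACAAGCTTTGAA"  # SEQ ID NO:2

if __name__ == "__main__":
    print(add_site(CLONING_F, "BamHI"))  # GGATCCATGGAGAGAGAAATGTTGAGCA (SEQ ID NO:5)
    print(add_site(CLONING_R, "XhoI"))   # CTCGAGTCAGGAGGAAACAAGCTTTGAA (SEQ ID NO:6)
```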

实施例3.三七糖基转移酶基因Pn50催化不同底物反应及其产物检测Example 3. Notoginseng glycosyltransferase gene Pn50 catalyzes different substrate reactions and detection of its products

The glycosyltransferase Pn50 reaction system (100 μL) was prepared as follows:

The reaction was carried out in a 37°C water bath for 2 h. After the reaction, an equal volume of n-butanol was added for extraction; the n-butanol phase was collected and concentrated under vacuum, and the reaction product was dissolved in 10 μL of methanol and analyzed by TLC or HPLC. As shown in Figure 2, the glycosyltransferase Pn50 used in the present invention catalyzes the conversion of protopanaxadiol (PPD) into a new product (see formula A) whose migration position on the TLC plate matches that of Rh2, showing that the new product is ginsenoside Rh2. Pn50 also acts on compound K; based on the migration position on the TLC plate and the regiospecificity of Pn50, the product is inferred to be ginsenoside F2 (Figure 2).

Example 4. Synthesis of the rare ginsenoside Rh2 in Saccharomyces cerevisiae using the Panax notoginseng glycosyltransferase gene Pn50

(1) The Panax notoginseng glycosyltransferase gene Pn50 catalyzes glycosylation of the C3 hydroxyl group of protopanaxadiol to give the rare ginsenoside Rh2. To synthesize Rh2 in Saccharomyces cerevisiae, a S. cerevisiae chassis strain producing protopanaxadiol was first constructed. The dammarenediol synthase gene PgDDS, the cytochrome P450 gene CYP716A47 responsible for protopanaxadiol synthesis, and its reductase gene PgCPR1 were introduced into wild-type S. cerevisiae, so that 2,3-oxidosqualene produced by the yeast's own mevalonate pathway could be converted into protopanaxadiol. The artificially constructed protopanaxadiol pathway was then optimized, including optimization of the rate-limiting steps and of precursor supply, to obtain the high-yield protopanaxadiol-producing S. cerevisiae strain ZWBY04RS.

(2) Twelve primers, SEQ ID NO:7-18, were synthesized, and the promoter, terminator, ORF, selection marker, and upstream and downstream homology arm fragments for glycosyltransferase expression were obtained by PCR, using the PCR method of Example 1. 100 ng each of these PCR fragments and of the UGTPg45 ORF were mixed and transformed into the recombinant S. cerevisiae strain ZWBY04RS by the conventional LiAc/ssDNA transformation method, giving the recombinant strain ZWBY04RS-UGTPg45. Similarly, 100 ng each of these PCR fragments and of the Pn50 ORF were mixed and transformed into strain ZWBY04RS by the same method, giving the recombinant strain ZWBY04RS-Pn50.

Promoter primer F SEQ_ID_NO.7

TAGCTCTGATAAGAATATTGATGAGTTCATTTCAAAGCTTGTTTCCTCCTGAGATT

Promoter primer R SEQ_ID_NO.8

ACTGTCAAGGAGGGTATTCTGGGCCTCCATGTCGCTGCTATATAACAGTTGAAATTTGGATAAGAACAT

ORF primer F SEQ_ID_NO.9

GCTATACTGCTGTCGATTCGATACTAACGCCGCCATCCAGTGTCGAGAATTACAATAGT

ORF primer R SEQ_ID_NO.10

TCTGGTGAGGATTTACGGTATGATCATGCTGGTAAGCTTCGTTA

Terminator primer F SEQ_ID_NO.11

GAAAAGAAGATAATATTTTTATATAATTATATTAATCTCAGGAGGAAACAAGCTTTGAA

Terminator primer R SEQ_ID_NO.12

GCATAGCAATCTAATCTAAGTTTTAATTACAAAATGGAGAGAGAAATGTTGAGCAAAAC

Selection marker primer F SEQ_ID_NO.13

CGCGTTGAGAAGATGTTCTTATCCAAATTTCAACTGTTATATAGCAGCGA

Selection marker primer R SEQ_ID_NO.14

TTACTTCTTGCAGACATCAGACATACTATTGTAATTCTCGACACTGGATGGCGGCGTTA

Upstream homology arm primer F SEQ_ID_NO.15

GGCTGTCGCCATTCAAGAGCAGATAGCTTCAAAATGTTTCTACTCCTTTTTTACTCTTC

Upstream homology arm primer R SEQ_ID_NO.16

CATAATGTGAGTTTTGCTCAACATTTCTCTCTCCATTTTGTAATTAAAACTTAGATTAG

Downstream homology arm primer F SEQ_ID_NO.17

CCCAAAGCTAAGAGTCCCATTTTATTC

Downstream homology arm primer R SEQ_ID_NO.18

GAGTAGAAACATTTTGAAGCTATCTGCTCTTGAATGGCGACAGCCTATTGCCCCAGTGT
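The long tails on the primers listed above suggest that neighbouring PCR fragments share homologous ends, which is what yeast homologous recombination needs in order to join co-transformed fragments. As an illustration only, the Python sketch below takes two of the listed primers (SEQ ID NO:12 and SEQ ID NO:16), reverse-complements one of them and reports the longest suffix/prefix overlap. The helper names are hypothetical, and pairing these two particular primers is an assumption made for the example, not something the text states.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def suffix_prefix_overlap(a, b):
    """Length of the longest suffix of a that equals a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


# Primer sequences copied from the list above.
SEQ_ID_12 = "GCATAGCAATCTAATCTAAGTTTTAATTACAAAATGGAGAGAGAAATGTTGAGCAAAAC"
SEQ_ID_16 = "CATAATGTGAGTTTTGCTCAACATTTCTCTCTCCATTTTGTAATTAAAACTTAGATTAG"

overlap = suffix_prefix_overlap(SEQ_ID_12, reverse_complement(SEQ_ID_16))
print(f"shared homology: {overlap} nt")
```

For these two primers the shared stretch is 49 nt, which spans the ATG start of the ORF. The same function could be applied to other primer pairs to see which fragment ends are designed to overlap.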

(3) Solid medium: 1% yeast extract, 2% peptone, 2% dextrose (glucose), 2% agar powder.

Liquid medium: 1% yeast extract, 2% peptone, 2% dextrose (glucose).

The recombinant S. cerevisiae strains ZWBY04RS-UGTPg45 and ZWBY04RS-Pn50, streaked on solid medium plates, were picked and each grown overnight with shaking in a test tube containing 5 mL of liquid medium (30°C, 250 rpm, 16 h). The cells were collected by centrifugation and transferred to a 50 mL Erlenmeyer flask containing 10 mL of liquid medium, the OD600 was adjusted to 0.05, and the culture was shaken at 30°C and 250 rpm for 4 days to obtain the fermentation product. One parallel (replicate) experiment was set up for each recombinant yeast strain.
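Adjusting a culture to a starting OD600 of 0.05 is a simple dilution calculation. The Python sketch below computes how much overnight culture is needed to reach a target OD600 in a given final volume. The overnight OD600 of 6.0 used in the example is a made-up value for illustration, since the text does not report it, and the sketch ignores the centrifugation step described in the text.

```python
def inoculum_volume_ml(od_overnight, od_target, final_volume_ml):
    """Volume of overnight culture giving od_target in final_volume_ml.

    Uses C1 * V1 = C2 * V2 and assumes OD600 scales linearly with cell density.
    """
    if od_overnight <= od_target:
        raise ValueError("overnight culture must be denser than the target")
    return od_target * final_volume_ml / od_overnight


# Target from the text: OD600 0.05 in 10 mL.
# The overnight OD600 of 6.0 is hypothetical.
volume = inoculum_volume_ml(od_overnight=6.0, od_target=0.05, final_volume_ml=10.0)
print(f"{volume * 1000:.0f} uL of overnight culture")
```

In practice the overnight OD600 would be measured first (on a suitably diluted sample) and substituted for the placeholder value.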

Extraction and detection of protopanaxadiol and the rare ginsenoside Rh2: 100 μL of fermentation broth was taken from the 10 mL culture, the yeast cells were lysed by shaking in a FastPrep instrument, an equal volume of n-butanol was added for extraction, and the n-butanol was then evaporated to dryness under vacuum. The residue was dissolved in 100 μL of methanol and the yield of the target product was determined by HPLC. The HPLC results are shown in Figure 3.

The recombinant S. cerevisiae strain ZWBY04RS-UGTPg45, constructed by introducing the ginseng glycosyltransferase gene UGTPg45 into the protopanaxadiol-producing S. cerevisiae strain, synthesized the rare ginsenoside Rh2 at a yield of 35.66 mg/L. The recombinant S. cerevisiae strain ZWBY04RS-Pn50, constructed by introducing the Panax notoginseng glycosyltransferase gene Pn50 into the protopanaxadiol-producing S. cerevisiae strain, synthesized Rh2 at a yield of 45.55 mg/L.

Example 5. Construction of a random mutation library of the ginseng glycosyltransferase UGTPg45

Error-prone PCR was performed with plasmid UGTPg45-pMD18T as the template, using the GeneMorph II Random Mutagenesis Kit (Agilent Technologies) and, as primers, UGTPg45 random mutagenesis primer F, SEQ ID NO:23 (Gcatagcaatctaatctaagttttaattacaaaatggagagagaaatgttgagcaaaac), and UGTPg45 random mutagenesis primer R, SEQ ID NO:24 (Gaaaagaagataatatttttatataattatattaatctcaggaggaaacaagctttgaa). The program was: initial denaturation at 95°C for 2 min; 30 cycles of denaturation at 95°C for 30 s, annealing at 58°C for 30 s and extension at 72°C for 3 min 15 s; final extension at 72°C for 10 min. Following the kit instructions, 1 μg, 1.5 μg and 2 μg of template were tested to explore the mutation rate. The PCR products were recovered by gel extraction, A-tailed with Taq polymerase, ligated into vector pMD18T and transformed into E. coli TOP10. Ten positive clones were picked at random and sequenced, which showed that a template amount of 1.5 μg gave a mutation rate of 1-2 bases per gene. Subsequent experiments used this condition for error-prone PCR to construct the random mutation library of UGTPg45.
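The reported mutation rate can be restated per kilobase, the unit commonly used when describing random mutagenesis. A minimal Python sketch follows, assuming that the 1374-nucleotide UGTPg45 open reading frame given elsewhere in the text is the mutated target; the actual amplicon is slightly longer because of the primer tails.

```python
def mutations_per_kb(mutations_per_gene, gene_length_nt):
    """Convert a per-gene mutation count into mutations per 1000 nucleotides."""
    return mutations_per_gene * 1000.0 / gene_length_nt


# UGTPg45 ORF length from the text; treating it as the target is an assumption.
ORF_LENGTH_NT = 1374

low = mutations_per_kb(1, ORF_LENGTH_NT)
high = mutations_per_kb(2, ORF_LENGTH_NT)
print(f"{low:.2f}-{high:.2f} mutations per kb")
```

Under that assumption, 1-2 bases per gene corresponds to roughly 0.7-1.5 mutations per kb.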

Fragment 7 was obtained by PCR from S. cerevisiae genomic DNA using primers SEQ ID NO:25 (fragment 7 primer F) (Acactggggcaataggctgtcgccattcaagagcagatagcttcaaaatgtttctactc) and SEQ ID NO:26 (fragment 7 primer R) (Cataatgtgagttttgctcaacatttctctctccattttgtaattaaaacttagattag). Fragment 8 was obtained by PCR from S. cerevisiae genomic DNA using primers SEQ ID NO:27 (fragment 8 primer F) (Ttgatgagttcatttcaaagcttgtttcctcctgagattaatataattatataaaaata) and SEQ ID NO:28 (fragment 8 primer R) (Actgtcaaggagggtattctgggcctccatgtcgctgctatataacagttgaaatttgg). Fragments 9 and 10 were obtained by PCR from S. cerevisiae genomic DNA using primers SEQ ID NO:29 (fragment 9 primer F) (Cccaaagctaagagtcccattttattc) and SEQ ID NO:30 (fragment 9 primer R) (Gaagagtaaaaaaggagtagaaacattttgaagctatctgctcttgaatggcgacagcc), and primers SEQ ID NO:31 (fragment 10 primer F) (Tgtcgattcgatactaacgccgccatccagtgtcgagaattacaatagtatgtctgatg) and SEQ ID NO:32 (fragment 10 primer R) (Tctggtgaggatttacggtatg), respectively. Fragment 11 was obtained by PCR from plasmid PLKAN using primers SEQ ID NO:33 (fragment 11 primer F) (Aagatgttcttatccaaatttcaactgttatatagcagcgacatggaggcccagaatac) and SEQ ID NO:34 (fragment 11 primer R) (tacttcttgcagacatcagacatactattgtaattctcgacactggatggcggcgttag). Fragments 7-11 were mixed in equimolar ratio and transformed into the high-PPD-producing strain ZWBY04RS; the cells were spread on YPD plates (containing 100 μg/mL G418) and incubated at 30°C for 2 days until yeast transformants expressing UGTPg45 mutant genes grew. Similarly, wild-type UGTPg45 was transformed to construct yeast transformants as a control.

600 μL of YPD medium (containing 100 μg/mL G418) was added to each well of a 96-well plate, single yeast colonies were picked into the medium, and the plate was shaken at 30°C and 280 rpm for 1 day. 6 μL of each culture was transferred to a new 96-well plate containing 600 μL of YPD medium per well and shaken at 30°C and 280 rpm for 3 days. 600 μL of n-butanol was added to each well, the plate was covered with a rubber lid and sealed with tape, and extraction was carried out by rotation for 3 h. After centrifugation at 4000 rpm for 10 min, 150 μL of the n-butanol phase was transferred to a new 96-well plate and the products were determined by HPLC.

The Rh2-producing strain ZWBY04RS-8E7, constructed with the mutant gene 8E7, gave a rare ginsenoside Rh2 yield 70% higher than that of strain ZWBY04RS-UGTPg45 constructed with UGTPg45, reaching 60.48 mg/L.
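The relative improvements can be recomputed from the reported titres. The Python sketch below uses the three Rh2 yields given in the text (35.66, 45.55 and 60.48 mg/L); the function and variable names are illustrative.

```python
def percent_increase(new, baseline):
    """Relative increase of new over baseline, in percent."""
    return (new - baseline) / baseline * 100.0


# Rh2 yields (mg/L) reported in the text for the three strains.
RH2_MG_PER_L = {"UGTPg45": 35.66, "Pn50": 45.55, "8E7": 60.48}

baseline = RH2_MG_PER_L["UGTPg45"]
gains = {
    name: percent_increase(RH2_MG_PER_L[name], baseline)
    for name in ("Pn50", "8E7")
}
for name, gain in gains.items():
    print(f"{name}: {gain:.1f}% higher than UGTPg45")
```

On these figures 8E7 comes out at roughly 70% above UGTPg45, in line with the text, and Pn50 at roughly 28%.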

The wild-type gene UGTPg45 has the nucleotide sequence of SEQ ID NO:20. Nucleotides 1-1374 from the 5' end of SEQ ID NO:20 are the open reading frame of UGTPg45; nucleotides 1-3 are the start codon ATG and nucleotides 1372-1374 are the stop codon TGA of the UGTPg45 gene. The glycosyltransferase gene UGTPg45 encodes the protein UGTPg45 of 457 amino acids, which has the amino acid sequence of SEQ ID NO:19; the software-predicted theoretical molecular weight of the protein is 51.1 kDa and its isoelectric point (pI) is 5.10. Amino acids 332-375 from the amino terminus of SEQ ID NO:19 form the conserved PSPG domain of glycosyltransferases.
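The relationship between protein length and reading-frame coordinates can be written out explicitly. A small Python sketch follows, assuming the usual convention that the ORF runs from the start codon through a three-nucleotide stop codon, with 1-based inclusive coordinates; the function name is illustrative.

```python
def orf_layout(protein_length_aa):
    """Return ORF length and 1-based inclusive codon coordinates for a protein."""
    coding_nt = 3 * protein_length_aa  # sense codons, including the start codon
    orf_nt = coding_nt + 3             # plus one three-nucleotide stop codon
    return {
        "orf_length": orf_nt,
        "start_codon": (1, 3),
        "stop_codon": (coding_nt + 1, orf_nt),
    }


layout = orf_layout(457)  # UGTPg45 encodes 457 amino acids
print(layout)
```

For a 457-residue protein this gives an ORF of 1374 nucleotides, with the stop codon occupying the last three positions.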

UGTPg45 amino acid sequence SEQ_ID_NO.19

MEREMLSKTHIMFIPFPAQGHMSPMMQFAKRLAWKGLRITIVLPAQIRDFMQITNPLINTECISFDFDKDDGMPYSMQAYMGVVKLKVTNKLSDLLEKQRTNGYPVNLLVVDSLYPSRVEMCHQLGVKGAPFFTHSCAVGAIYYNARLGKLKIPPEEGLTSVSLPSIPLLGRDDLPIIRTGTFPDLFEHLGNQFSDLDKADWIFFNTFDKLENEEAKWLSSQWPITSIGPLIPSMYLDKQLPNDKDNGINFYKADVGSCIKWLDAKDPGSVVYASFGSVKHNLGDDYMDEVAWGLLHSKYHFIWVVIESERTKLSSDFLAEAEAEEKGLIVSWCPQLQVLSHKSIGSFMTHCGWNSTVEALSLGVPMVALPQQFDQPANAKYIVDVWQIGVRVPIGEEGVVLRGEVANCIKDVMEGEIGDELRGNALKWKGLAVEAMEKGGSSDKNIDEFISKLVSS

UGTPg45 nucleotide sequence SEQ_ID_NO.20

ATGGAGAGAGAAATGTTGAGCAAAACTCACATTATGTTCATCCCATTCCCAGCTCAAGGCCACATGAGCCCAATGATGCAATTCGCCAAGCGTTTAGCCTGGAAAGGCCTGCGAATCACGATAGTTCTTCCGGCTCAAATTCGAGATTTCATGCAAATAACCAACCCATTGATCAACACTGAGTGCATCTCCTTTGATTTTGATAAAGACGATGGGATGCCATACAGCATGCAGGCTTATATGGGAGTTGTAAAACTCAAGGTCACAAATAAACTGAGTGACCTACTCGAGAAGCAAAGAACAAATGGCTACCCTGTTAATTTGCTAGTGGTTGATTCATTATATCCATCTCGGGTAGAAATGTGCCACCAACTTGGGGTAAAAGGAGCTCCATTTTTCACTCACTCTTGTGCTGTTGGTGCCATTTATTATAATGCTCGCTTAGGGAAATTGAAGATACCTCCTGAGGAAGGGTTGACTTCTGTTTCATTGCCTTCAATTCCATTGTTGGGGAGAGATGATTTGCCAATTATTAGGACTGGCACCTTTCCTGATCTCTTTGAGCATTTGGGGAATCAGTTTTCAGATCTTGATAAAGCGGATTGGATCTTTTTCAATACTTTTGATAAGCTTGAAAATGAGGAAGCAAAATGGCTATCTAGCCAATGGCCAATTACATCCATCGGACCATTAATCCCTTCAATGTACTTAGACAAACAATTACCAAATGACAAAGACAATGGCATTAATTTCTACAAGGCAGACGTCGGATCGTGCATCAAGTGGCTAGACGCCAAAGACCCTGGCTCGGTAGTCTACGCCTCATTCGGGAGCGTGAAGCACAACCTCGGCGATGACTACATGGACGAAGTAGCATGGGGCTTGTTACATAGCAAATATCACTTCATATGGGTTGTTATAGAATCCGAACGTACAAAGCTCTCTAGCGATTTCTTGGCAGAGGCAGAGGCAGAGGAAAAAGGCCTAATAGTGAGTTGGTGCCCTCAACTCCAAGTTTTGTCACATAAATCTATAGGGAGTTTTATGACTCATTGTGGTTGGAACTCGACGGTTGAGGCATTGAGTTTGGGCGTGCCAATGGTGGCACTGCCACAACAGTTTGATCAGCCTGCTAATGCCAAGTATATCGTGGATGTATGGCAAATTGGGGTTCGGGTTCCGATTGGTGAAGAGGGGGTTGTTTTGAGGGGAGAAGTTGCTAACTGTATAAAGGATGTTATGGAGGGGGAAATAGGGGATGAGCTTAGAGGGAATGCTTTGAAATGGAAGGGGTTGGCTGTGGAGGCAATGGAGAAAGGGGGTAGCTCTGATAAGAATATTGATGAGTTCATTTCAAAGCTTGTTTCCTCCTGA

The mutant gene 8E7 has the nucleotide sequence of SEQ ID NO:22. Nucleotides 1-1374 from the 5' end of SEQ ID NO:22 are the open reading frame of 8E7; nucleotides 1-3 are the start codon ATG and nucleotides 1372-1374 are the stop codon TGA of the 8E7 gene. The glycosyltransferase gene 8E7 encodes the protein 8E7 of 457 amino acids, which has the amino acid sequence of SEQ ID NO:21.

8E7 amino acid sequence SEQ_ID_NO.21

MEREMLSKTHIMFIPFPAQGHMSPMMQFAKRLAWKGLRITIVLPAQIRDFMQITNPLINTECISFDFDKDDGMPYSMQAYMGVVKLKVTNKLSDLLEKQRTNGYPVNLLVVDSLYPSRVEMCHQLGVKGAPFFTHSCAVGAIYYNARLGKLKIPPEEGLTSVSLPSIPLLGRDDLPIIRTGTFPDLFEHLGNQFSDLDKADWIFFNTFDKLENEEAKWLSSHWPITSIGPLIPSMYLDKQLPNDKDNGINFYKADVGSCIKWLDAKDPGSVVYASFGSVKHNLGDDYMDEVAWGLLHSKYHFIWVVIESERTKLSSDFLAEVEAEEKGLIVSWCPQLQVLSHKSIGSFMTHCGWNSTVEALSLGVPMVALPQQFDQPANAKYIVDVWQIGVRVPIGEEGVVLRGEVANCIKDVMEGEIGDELRGNALKWKGLAVEAMEKGGSSDKNIDEFISKLVSS

8E7 nucleotide sequence SEQ_ID_NO.22

ATGGAGAGAGAAATGTTGAGCAAAACTCACATTATGTTCATCCCATTCCCAGCTCAAGGCCACATGAGCCCAATGATGCAATTCGCCAAGCGTTTAGCCTGGAAAGGCCTGCGAATCACGATAGTTCTTCCGGCTCAAATTCGAGATTTCATGCAAATAACCAACCCATTGATCAACACTGAGTGCATCTCCTTTGATTTTGATAAAGACGATGGGATGCCATACAGCATGCAGGCTTATATGGGAGTTGTAAAACTCAAGGTCACAAATAAACTGAGTGACCTACTCGAGAAGCAAAGAACAAATGGCTACCCTGTTAATTTGCTAGTGGTTGATTCATTATATCCATCTCGGGTAGAAATGTGCCACCAACTTGGGGTAAAAGGAGCTCCATTTTTCACTCACTCTTGTGCTGTTGGTGCCATTTATTATAATGCTCGCTTAGGGAAATTGAAGATACCTCCTGAGGAAGGGTTGACTTCTGTTTCATTGCCTTCAATTCCATTGTTGGGGAGAGATGATTTGCCAATTATTAGGACTGGCACCTTTCCTGATCTCTTTGAGCATTTGGGGAATCAGTTTTCAGATCTTGATAAAGCGGATTGGATCTTTTTCAATACTTTTGATAAGCTTGAAAATGAGGAAGCAAAATGGCTATCTAGCCATTGGCCAATTACATCCATCGGACCATTAATCCCTTCAATGTACTTAGACAAACAATTACCAAATGACAAAGACAATGGCATTAATTTCTACAAGGCAGACGTCGGATCGTGCATCAAGTGGCTAGACGCCAAAGACCCTGGCTCGGTAGTCTACGCCTCATTCGGGAGCGTGAAGCACAACCTCGGCGATGACTACATGGACGAAGTAGCATGGGGCTTGTTACATAGCAAATATCACTTCATATGGGTTGTTATAGAATCCGAACGTACAAAGCTCTCTAGCGATTTCTTGGCAGAGGTAGAGGCAGAGGAAAAAGGCCTAATAGTGAGTTGGTGCCCTCAACTCCAAGTTTTGTCACATAAATCTATAGGGAGTTTTATGACTCATTGTGGTTGGAACTCGACGGTTGAGGCATTGAGTTTGGGCGTGCCAATGGTGGCACTGCCACAACAGTTTGATCAGCCTGCTAATGCCAAGTATATCGTGGATGTATGGCAAATTGGGGTTCGGGTTCCGATTGGTGAAGAGGGGGTTGTTTTGAGGGGAGAAGTTGCTAACTGTATAAAGGATGTTATGGAGGGGGAAATAGGGGATGAGCTTAGAGGGAATGCTTTGAAATGGAAGGGGTTGGCTGTGGAGGCAATGGAGAAAGGGGGTAGCTCTGATAAGAATATTGATGAGTTCATTTCAAAGCTTGTTTCCTCCTGA
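The amino acid differences between 8E7 and wild-type UGTPg45 can be read directly off the two protein sequences. The Python sketch below compares SEQ ID NO:19 and SEQ ID NO:21 position by position. The sequences are copied from the text; the function name and the output notation (wild-type residue, 1-based position, mutant residue) are choices made for this example.

```python
# SEQ ID NO:19 (wild-type UGTPg45), split into 60-residue chunks for readability.
UGTPG45 = (
    "MEREMLSKTHIMFIPFPAQGHMSPMMQFAKRLAWKGLRITIVLPAQIRDFMQITNPLINT"
    "ECISFDFDKDDGMPYSMQAYMGVVKLKVTNKLSDLLEKQRTNGYPVNLLVVDSLYPSRVE"
    "MCHQLGVKGAPFFTHSCAVGAIYYNARLGKLKIPPEEGLTSVSLPSIPLLGRDDLPIIRT"
    "GTFPDLFEHLGNQFSDLDKADWIFFNTFDKLENEEAKWLSSQWPITSIGPLIPSMYLDKQ"
    "LPNDKDNGINFYKADVGSCIKWLDAKDPGSVVYASFGSVKHNLGDDYMDEVAWGLLHSKY"
    "HFIWVVIESERTKLSSDFLAEAEAEEKGLIVSWCPQLQVLSHKSIGSFMTHCGWNSTVEA"
    "LSLGVPMVALPQQFDQPANAKYIVDVWQIGVRVPIGEEGVVLRGEVANCIKDVMEGEIGD"
    "ELRGNALKWKGLAVEAMEKGGSSDKNIDEFISKLVSS"
)

# SEQ ID NO:21 (mutant 8E7), chunked the same way.
MUT_8E7 = (
    "MEREMLSKTHIMFIPFPAQGHMSPMMQFAKRLAWKGLRITIVLPAQIRDFMQITNPLINT"
    "ECISFDFDKDDGMPYSMQAYMGVVKLKVTNKLSDLLEKQRTNGYPVNLLVVDSLYPSRVE"
    "MCHQLGVKGAPFFTHSCAVGAIYYNARLGKLKIPPEEGLTSVSLPSIPLLGRDDLPIIRT"
    "GTFPDLFEHLGNQFSDLDKADWIFFNTFDKLENEEAKWLSSHWPITSIGPLIPSMYLDKQ"
    "LPNDKDNGINFYKADVGSCIKWLDAKDPGSVVYASFGSVKHNLGDDYMDEVAWGLLHSKY"
    "HFIWVVIESERTKLSSDFLAEVEAEEKGLIVSWCPQLQVLSHKSIGSFMTHCGWNSTVEA"
    "LSLGVPMVALPQQFDQPANAKYIVDVWQIGVRVPIGEEGVVLRGEVANCIKDVMEGEIGD"
    "ELRGNALKWKGLAVEAMEKGGSSDKNIDEFISKLVSS"
)


def substitutions(wild_type, mutant):
    """List (wt_residue, 1-based position, mutant_residue) for every mismatch."""
    if len(wild_type) != len(mutant):
        raise ValueError("sequences must have the same length for this comparison")
    pairs = enumerate(zip(wild_type, mutant), start=1)
    return [(w, i, m) for i, (w, m) in pairs if w != m]


changes = substitutions(UGTPG45, MUT_8E7)
print(", ".join(f"{w}{pos}{m}" for w, pos, m in changes))
```

With the sequences as listed, both proteins are 457 residues long and the comparison reports two substitutions, Q222H and A322V.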

These results show that replacing the ginseng glycosyltransferase gene UGTPg45 with either the Panax notoginseng glycosyltransferase gene Pn50 of the present invention or the mutant gene 8E7, obtained by engineering the wild-type glycosyltransferase gene, substantially increases the efficiency and yield of rare ginsenoside Rh2 synthesis, which is a significant beneficial effect.

Discussion

Transcriptome analyses of ginseng, American ginseng and Panax notoginseng have uncovered a large number of candidate glycosyltransferase genes, but only very few glycosyltransferases have been verified to participate in ginsenoside synthesis. No glycosyltransferase involved in ginsenoside synthesis in Panax notoginseng has been reported so far. Because Panax notoginseng synthesizes the same ginsenosides, identifying glycosyltransferases from Panax notoginseng can, on the one hand, improve understanding of the ginsenoside biosynthetic pathways in these two kinds of plants and, on the other hand, provide a richer set of parts for synthetic biology research on ginsenosides, which is of considerable significance.

All documents mentioned in the present invention are incorporated by reference in this application, as if each document were individually incorporated by reference. In addition, it should be understood that, after reading the above teachings of the present invention, those skilled in the art may make various changes or modifications to the present invention, and such equivalent forms likewise fall within the scope defined by the claims appended to this application.

Sequence Listing

<110> Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences

<120> Glycosyltransferases, mutants and applications thereof

<130> P2017-0476

<160> 34

<170> PatentIn version 3.5

<210> 1

<211> 22

<212> DNA

<213> Artificial sequence

<400> 1

atggagagag aaatgttgag ca 22

<210> 2

<211> 22

<212> DNA

<213> Artificial sequence

<400> 2

tcaggaggaa acaagctttg aa 22

<210> 3

<211> 1368

<212> DNA

<213> Artificial sequence

<400> 3

atggagagag aaatgttgag caaaactcac attatgttca tcccattccc agctcaaggc 60

cacatgagcc caatgatgca attcgtcaag cgtttagcct ggaaaggcgt gcgaatcacg 120

atagttcttc cggctgagat tcgagattct atgcaaataa acaactcatt gatcaacact 180

gagtgcatct cctttgattt tgataaagat gatgagatgc catacagcat gcgggcttat 240

atgggagttg taaagctcaa ggtcacaaat aaactgagtg acctactcga gaagcaaaaa 300

acaaatggct accctgttaa tttgctagtg gtcgattcat tatatccatc tcgggtagaa 360

atgtgccacc aacttggggt aaaaggagct ccatttttca ctcactcttg tgctgttggt 420

gccatttatt ataatgctcg cttagggaaa ttgaagatac ctcctgagga agggttgact 480

tctgtttcat tgccttcaat tccattgttg gggagaaatg atttgccaat tattaggact 540

ggcacctttc ctgatctctt tgagcatttg gggaatcagt tttcagatct tgataaagcg 600

gattggatct ttttcaatac ttttgataag cttgaaaatg aggaagcaaa atggctatct 660

agccaatggc caattacatc catcggacca ttaatccctt caatgtactt agacaaacaa 720

ttaccaaatg acaaagacaa tgacattaat ttctacaagg cagacgtcgg atcgtgcatc 780

aagtggctag acgccaaaga ccctggctcg gtagtctacg cctcattcgg gagcgtgaag 840

cacaacctcg gcgatgacta catggacgaa gtagcatggg gcttgttaca cagcaaatat 900

cacttcatat gggttgttat agaatccgaa cgtacaaagc tctctagcga tttcttggca 960

gaggcagagg aaaaaggcct aatagtgagt tggtgccctc aactcgaagt tttgtcacat 1020

aaatctatag gtagttttat gactcattgt ggttggaact cgacggttga ggcattgagt 1080

ttgggcgtgc caatggtggc agtgccacaa cagtttgatc agcctgttaa tgccaagtat 1140

atcgtggatg tatggcgaat tggggttcag gttccgattg gtgaaaatgg ggttcttttg 1200

aggggagaag ttgctaactg tataaaggat gttatggagg gggaaatagg ggatgagctt 1260

agagggaatg ctttgaaatg gaaggggttg gctgtggagg caatggagaa agggggtagc 1320

tctgataaga atattgatga gttcatttca aagcttgttt cctcctga 1368tctgataaga atattgatga gttcatttca aagcttgttt cctcctga 1368

<210> 4<210> 4

<211> 455<211> 455

<212> PRT<212> PRT

<213> 人工序列<213> Artificial sequence

<400> 4<400> 4

Met Glu Arg Glu Met Leu Ser Lys Thr His Ile Met Phe Ile Pro PheMet Glu Arg Glu Met Leu Ser Lys Thr His Ile Met Phe Ile Pro Phe

1 5 10 151 5 10 15

Pro Ala Gln Gly His Met Ser Pro Met Met Gln Phe Val Lys Arg LeuPro Ala Gln Gly His Met Ser Pro Met Met Gln Phe Val Lys Arg Leu

20 25 30 20 25 30

Ala Trp Lys Gly Val Arg Ile Thr Ile Val Leu Pro Ala Glu Ile ArgAla Trp Lys Gly Val Arg Ile Thr Ile Val Leu Pro Ala Glu Ile Arg

35 40 45 35 40 45

Asp Ser Met Gln Ile Asn Asn Ser Leu Ile Asn Thr Glu Cys Ile SerAsp Ser Met Gln Ile Asn Asn Ser Leu Ile Asn Thr Glu Cys Ile Ser

50 55 60 50 55 60

Phe Asp Phe Asp Lys Asp Asp Glu Met Pro Tyr Ser Met Arg Ala TyrPhe Asp Phe Asp Lys Asp Asp Glu Met Pro Tyr Ser Met Arg Ala Tyr

65 70 75 8065 70 75 80

Met Gly Val Val Lys Leu Lys Val Thr Asn Lys Leu Ser Asp Leu LeuMet Gly Val Val Lys Leu Lys Val Thr Asn Lys Leu Ser Asp Leu Leu

85 90 95 85 90 95

Glu Lys Gln Lys Thr Asn Gly Tyr Pro Val Asn Leu Leu Val Val AspGlu Lys Gln Lys Thr Asn Gly Tyr Pro Val Asn Leu Leu Val Val Asp

100 105 110 100 105 110

Ser Leu Tyr Pro Ser Arg Val Glu Met Cys His Gln Leu Gly Val LysSer Leu Tyr Pro Ser Arg Val Glu Met Cys His Gln Leu Gly Val Lys

115 120 125

Gly Ala Pro Phe Phe Thr His Ser Cys Ala Val Gly Ala Ile Tyr Tyr

130 135 140

Asn Ala Arg Leu Gly Lys Leu Lys Ile Pro Pro Glu Glu Gly Leu Thr

145 150 155 160

Ser Val Ser Leu Pro Ser Ile Pro Leu Leu Gly Arg Asn Asp Leu Pro

165 170 175

Ile Ile Arg Thr Gly Thr Phe Pro Asp Leu Phe Glu His Leu Gly Asn

180 185 190

Gln Phe Ser Asp Leu Asp Lys Ala Asp Trp Ile Phe Phe Asn Thr Phe

195 200 205

Asp Lys Leu Glu Asn Glu Glu Ala Lys Trp Leu Ser Ser Gln Trp Pro

210 215 220

Ile Thr Ser Ile Gly Pro Leu Ile Pro Ser Met Tyr Leu Asp Lys Gln

225 230 235 240

Leu Pro Asn Asp Lys Asp Asn Asp Ile Asn Phe Tyr Lys Ala Asp Val

245 250 255

Gly Ser Cys Ile Lys Trp Leu Asp Ala Lys Asp Pro Gly Ser Val Val

260 265 270

Tyr Ala Ser Phe Gly Ser Val Lys His Asn Leu Gly Asp Asp Tyr Met

275 280 285

Asp Glu Val Ala Trp Gly Leu Leu His Ser Lys Tyr His Phe Ile Trp

290 295 300

Val Val Ile Glu Ser Glu Arg Thr Lys Leu Ser Ser Asp Phe Leu Ala

305 310 315 320

Glu Ala Glu Glu Lys Gly Leu Ile Val Ser Trp Cys Pro Gln Leu Glu

325 330 335

Val Leu Ser His Lys Ser Ile Gly Ser Phe Met Thr His Cys Gly Trp

340 345 350

Asn Ser Thr Val Glu Ala Leu Ser Leu Gly Val Pro Met Val Ala Val

355 360 365

Pro Gln Gln Phe Asp Gln Pro Val Asn Ala Lys Tyr Ile Val Asp Val

370 375 380

Trp Arg Ile Gly Val Gln Val Pro Ile Gly Glu Asn Gly Val Leu Leu

385 390 395 400

Arg Gly Glu Val Ala Asn Cys Ile Lys Asp Val Met Glu Gly Glu Ile

405 410 415

Gly Asp Glu Leu Arg Gly Asn Ala Leu Lys Trp Lys Gly Leu Ala Val

420 425 430

Glu Ala Met Glu Lys Gly Gly Ser Ser Asp Lys Asn Ile Asp Glu Phe

435 440 445

Ile Ser Lys Leu Val Ser Ser

450 455

<210> 5

<211> 28

<212> DNA

<213> Artificial sequence

<400> 5

ggatccatgg agagagaaat gttgagca 28

<210> 6

<211> 28

<212> DNA

<213> Artificial sequence

<400> 6

ctcgagtcag gaggaaacaa gctttgaa 28

<210> 7

<211> 56

<212> DNA

<213> Artificial sequence

<400> 7

tagctctgat aagaatattg atgagttcat ttcaaagctt gtttcctcct gagatt 56

<210> 8

<211> 69

<212> DNA

<213> Artificial sequence

<400> 8

actgtcaagg agggtattct gggcctccat gtcgctgcta tataacagtt gaaatttgga 60

taagaacat 69

<210> 9

<211> 59

<212> DNA

<213> Artificial sequence

<400> 9

gctatactgc tgtcgattcg atactaacgc cgccatccag tgtcgagaat tacaatagt 59

<210> 10

<211> 44

<212> DNA

<213> Artificial sequence

<400> 10

tctggtgagg atttacggta tgatcatgct ggtaagcttc gtta 44

<210> 11

<211> 59

<212> DNA

<213> Artificial sequence

<400> 11

gaaaagaaga taatattttt atataattat attaatctca ggaggaaaca agctttgaa 59

<210> 12

<211> 59

<212> DNA

<213> Artificial sequence

<400> 12

gcatagcaat ctaatctaag ttttaattac aaaatggaga gagaaatgtt gagcaaaac 59

<210> 13

<211> 50

<212> DNA

<213> Artificial sequence

<400> 13

cgcgttgaga agatgttctt atccaaattt caactgttat atagcagcga 50

<210> 14

<211> 59

<212> DNA

<213> Artificial sequence

<400> 14

ttacttcttg cagacatcag acatactatt gtaattctcg acactggatg gcggcgtta 59

<210> 15

<211> 59

<212> DNA

<213> Artificial sequence

<400> 15

ggctgtcgcc attcaagagc agatagcttc aaaatgtttc tactcctttt ttactcttc 59

<210> 16

<211> 59

<212> DNA

<213> Artificial sequence

<400> 16

cataatgtga gttttgctca acatttctct ctccattttg taattaaaac ttagattag 59

<210> 17

<211> 27

<212> DNA

<213> Artificial sequence

<400> 17

cccaaagcta agagtcccat tttattc 27

<210> 18

<211> 59

<212> DNA

<213> Artificial sequence

<400> 18

gagtagaaac attttgaagc tatctgctct tgaatggcga cagcctattg ccccagtgt 59

<210> 19

<211> 457

<212> PRT

<213> Artificial sequence

<400> 19

1 5 10 15

Pro Ala Gln Gly His Met Ser Pro Met Met Gln Phe Ala Lys Arg Leu

20 25 30

Ala Trp Lys Gly Leu Arg Ile Thr Ile Val Leu Pro Ala Gln Ile Arg

35 40 45

Asp Phe Met Gln Ile Thr Asn Pro Leu Ile Asn Thr Glu Cys Ile Ser

50 55 60

Phe Asp Phe Asp Lys Asp Asp Gly Met Pro Tyr Ser Met Gln Ala Tyr

65 70 75 80

85 90 95

Glu Lys Gln Arg Thr Asn Gly Tyr Pro Val Asn Leu Leu Val Val Asp

100 105 110

115 120 125

130 135 140

145 150 155 160

Ser Val Ser Leu Pro Ser Ile Pro Leu Leu Gly Arg Asp Asp Leu Pro

165 170 175

180 185 190

195 200 205

210 215 220

225 230 235 240

Leu Pro Asn Asp Lys Asp Asn Gly Ile Asn Phe Tyr Lys Ala Asp Val

245 250 255

260 265 270

275 280 285

290 295 300

305 310 315 320

Glu Ala Glu Ala Glu Glu Lys Gly Leu Ile Val Ser Trp Cys Pro Gln

325 330 335

Leu Gln Val Leu Ser His Lys Ser Ile Gly Ser Phe Met Thr His Cys

340 345 350

Gly Trp Asn Ser Thr Val Glu Ala Leu Ser Leu Gly Val Pro Met Val

355 360 365

Ala Leu Pro Gln Gln Phe Asp Gln Pro Ala Asn Ala Lys Tyr Ile Val

370 375 380

Asp Val Trp Gln Ile Gly Val Arg Val Pro Ile Gly Glu Glu Gly Val

385 390 395 400

Val Leu Arg Gly Glu Val Ala Asn Cys Ile Lys Asp Val Met Glu Gly

405 410 415

Glu Ile Gly Asp Glu Leu Arg Gly Asn Ala Leu Lys Trp Lys Gly Leu

420 425 430

Ala Val Glu Ala Met Glu Lys Gly Gly Ser Ser Asp Lys Asn Ile Asp

435 440 445

Glu Phe Ile Ser Lys Leu Val Ser Ser

450 455

<210> 20

<211> 1374

<212> DNA

<213> Artificial sequence

<400> 20

cacatgagcc caatgatgca attcgccaag cgtttagcct ggaaaggcct gcgaatcacg 120

atagttcttc cggctcaaat tcgagatttc atgcaaataa ccaacccatt gatcaacact 180

gagtgcatct cctttgattt tgataaagac gatgggatgc catacagcat gcaggcttat 240

atgggagttg taaaactcaa ggtcacaaat aaactgagtg acctactcga gaagcaaaga 300

acaaatggct accctgttaa tttgctagtg gttgattcat tatatccatc tcgggtagaa 360

tctgtttcat tgccttcaat tccattgttg gggagagatg atttgccaat tattaggact 540

ttaccaaatg acaaagacaa tggcattaat ttctacaagg cagacgtcgg atcgtgcatc 780

cacaacctcg gcgatgacta catggacgaa gtagcatggg gcttgttaca tagcaaatat 900

gaggcagagg cagaggaaaa aggcctaata gtgagttggt gccctcaact ccaagttttg 1020

tcacataaat ctatagggag ttttatgact cattgtggtt ggaactcgac ggttgaggca 1080

ttgagtttgg gcgtgccaat ggtggcactg ccacaacagt ttgatcagcc tgctaatgcc 1140

aagtatatcg tggatgtatg gcaaattggg gttcgggttc cgattggtga agagggggtt 1200

gttttgaggg gagaagttgc taactgtata aaggatgtta tggaggggga aataggggat 1260

gagcttagag ggaatgcttt gaaatggaag gggttggctg tggaggcaat ggagaaaggg 1320

ggtagctctg ataagaatat tgatgagttc atttcaaagc ttgtttcctc ctga 1374

<210> 21

<211> 457

<212> PRT

<213> Artificial sequence

<400> 21

1 5 10 15

20 25 30

35 40 45

50 55 60

65 70 75 80

85 90 95

100 105 110

115 120 125

130 135 140

145 150 155 160

165 170 175

180 185 190

195 200 205

Asp Lys Leu Glu Asn Glu Glu Ala Lys Trp Leu Ser Ser His Trp Pro

210 215 220

225 230 235 240

245 250 255

260 265 270

275 280 285

290 295 300

305 310 315 320

Glu Val Glu Ala Glu Glu Lys Gly Leu Ile Val Ser Trp Cys Pro Gln

325 330 335

340 345 350

355 360 365

370 375 380

385 390 395 400

405 410 415

420 425 430

435 440 445

Glu Phe Ile Ser Lys Leu Val Ser Ser

450 455

<210> 22

<211> 1374

<212> DNA

<213> Artificial sequence

<400> 22

agccattggc caattacatc catcggacca ttaatccctt caatgtactt agacaaacaa 720

gaggtagagg cagaggaaaa aggcctaata gtgagttggt gccctcaact ccaagttttg 1020

<210> 23

<211> 59

<212> DNA

<213> Artificial sequence

<400> 23

<210> 24

<211> 59

<212> DNA

<213> Artificial sequence

<400> 24

<210> 25

<211> 59

<212> DNA

<213> Artificial sequence

<400> 25

acactggggc aataggctgt cgccattcaa gagcagatag cttcaaaatg tttctactc 59

<210> 26

<211> 59

<212> DNA

<213> Artificial sequence

<400> 26

<210> 27

<211> 59

<212> DNA

<213> Artificial sequence

<400> 27

ttgatgagtt catttcaaag cttgtttcct cctgagatta atataattat ataaaaata 59

<210> 28

<211> 59

<212> DNA

<213> Artificial sequence

<400> 28

actgtcaagg agggtattct gggcctccat gtcgctgcta tataacagtt gaaatttgg 59

<210> 29

<211> 27

<212> DNA

<213> Artificial sequence

<400> 29

cccaaagcta agagtcccat tttattc 27

<210> 30

<211> 59

<212> DNA

<213> Artificial sequence

<400> 30

gaagagtaaa aaaggagtag aaacattttg aagctatctg ctcttgaatg gcgacagcc 59

<210> 31

<211> 59

<212> DNA

<213> Artificial sequence

<400> 31

tgtcgattcg atactaacgc cgccatccag tgtcgagaat tacaatagta tgtctgatg 59

<210> 32

<211> 22

<212> DNA

<213> Artificial sequence

<400> 32

tctggtgagg atttacggta tg 22

<210> 33

<211> 59

<212> DNA

<213> Artificial sequence

<400> 33

aagatgttct tatccaaatt tcaactgtta tatagcagcg acatggaggc ccagaatac 59

<210> 34

<211> 59

<212> DNA

<213> Artificial sequence

<400> 34

tacttcttgc agacatcaga catactattg taattctcga cactggatgg cggcgttag 59

Claims

1. An isolated polypeptide, characterized in that, in the amino acid sequence of the isolated polypeptide, the amino acid residue corresponding to position 222 of the amino acid sequence shown in SEQ ID NO.: 19 is not Gln, and/or the amino acid residue corresponding to position 322 of the amino acid sequence shown in SEQ ID NO.: 19 is not Ala.

2. An isolated polypeptide, characterized in that the polypeptide is selected from the group consisting of:

(a) a polypeptide having the amino acid sequence shown in SEQ ID NO.: 4 or SEQ ID NO.: 21;

(b) a derivative polypeptide that has glycosyltransferase activity and is formed from the polypeptide having the amino acid sequence shown in SEQ ID NO.: 4 or SEQ ID NO.: 21 by substitution, deletion or addition of one or several amino acid residues (preferably 1-20, more preferably 1-15, more preferably 1-10, more preferably 1-3, most preferably 1), or by addition of a signal peptide sequence;

(c) a derivative polypeptide whose sequence contains the polypeptide sequence of (a) or (b);

(d) a derivative polypeptide that has glycosyltransferase activity and whose amino acid sequence has ≥85% homology (preferably ≥90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99%) with the amino acid sequence shown in SEQ ID NO.: 4 or SEQ ID NO.: 21.

3. An isolated polynucleotide, characterized in that the polynucleotide has a sequence selected from the group consisting of:

(A) a nucleotide sequence encoding the polypeptide of claim 1 or 2;

(B) a nucleotide sequence encoding the polypeptide shown in SEQ ID NO.: 4 or SEQ ID NO.: 21, or a derivative polypeptide thereof;

(C) the nucleotide sequence shown in SEQ ID NO.: 3 or SEQ ID NO.: 22;

(D) a nucleotide sequence having ≥90% homology (preferably ≥91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99%) with the sequence shown in SEQ ID NO.: 3 or SEQ ID NO.: 22;

(E) a nucleotide sequence formed by truncating or adding 1-60 (preferably 1-30, more preferably 1-10) nucleotides at the 5' end and/or 3' end of the nucleotide sequence shown in SEQ ID NO.: 3 or SEQ ID NO.: 22;

(F) a nucleotide sequence complementary to any one of the nucleotide sequences of (A)-(E).

4. A vector, characterized in that the vector contains the polynucleotide of claim 3.

5. Use of the isolated polypeptide of claim 1 or 2, characterized in that the polypeptide is used to catalyze the following reaction, or to prepare a catalytic preparation that catalyzes the following reaction:

(i) transferring a glycosyl group from a glycosyl donor to the hydroxyl group at position C-3 of a tetracyclic triterpenoid compound.

6. The use of claim 5, characterized in that the isolated polypeptide is used to catalyze the following reaction, or to prepare a catalytic preparation that catalyzes the following reaction:

wherein R1 is H or OH; R2 is H or OH; R3 is H or a glycosyl group; and R4 is a glycosyl group.

7. An in vitro glycosylation method, characterized by comprising the step of:

in the presence of a glycosyltransferase, transferring the glycosyl group of a glycosyl donor to the hydroxyl group at position C-3 of a tetracyclic triterpenoid compound, thereby forming a glycosylated tetracyclic triterpenoid compound;

wherein the glycosyltransferase is the polypeptide of claim 1 or 2 or a derivative polypeptide thereof.

8. The method of claim 7, characterized in that the derivative polypeptide is selected from:

a derivative polypeptide that has glycosyltransferase activity and is formed from the polypeptide having the amino acid sequence shown in SEQ ID NO.: 4 or SEQ ID NO.: 21 by substitution, deletion or addition of one or several amino acid residues, or by addition of a signal peptide sequence; or

a derivative polypeptide that has glycosyltransferase activity and whose amino acid sequence has ≥85% homology (preferably ≥90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99%) with the amino acid sequence shown in SEQ ID NO.: 4 or SEQ ID NO.: 21;

wherein the glycosyltransferase activity refers to the activity of transferring the glycosyl group of a glycosyl donor to the hydroxyl group at position C-3 of a tetracyclic triterpenoid compound.

9. A method for carrying out a glycosyl catalytic reaction, characterized by comprising the step of: carrying out the glycosyl catalytic reaction in the presence of the polypeptide of claim 1 or 2 or a derivative polypeptide thereof.

10. The method of claim 9, characterized in that the substrate of the glycosyl catalytic reaction is a compound of formula (I), and the product is a compound of formula (II).

11. A genetically engineered host cell, characterized in that the host cell contains the vector of claim 4, or has the polynucleotide of claim 3 integrated into its genome.

12. Use of the host cell of claim 11, characterized in that the host cell is used to prepare an enzyme catalytic reagent, to produce a glycosyltransferase, as a catalytic cell, or to produce a glycosylated tetracyclic triterpenoid compound.

13. A method for producing a transgenic plant, characterized by comprising the step of: regenerating the genetically engineered host cell of claim 11 into a plant, wherein the genetically engineered host cell is a plant cell.