CN110117582B - Fusion protein, encoding gene thereof and application thereof in biosynthesis

Info

Publication number: CN110117582B
Application number: CN201910392068.1A
Authority: CN
Inventors: 刘春生; 李妍芃; 尹艳; 高伟; 姜丹
Original assignee: Individual
Current assignee: Individual
Priority date: 2019-05-13
Filing date: 2019-05-13
Publication date: 2020-12-29
Anticipated expiration: 2039-05-13
Also published as: CN110117582A

Abstract

The invention relates to a fusion protein comprising chalcone synthase and chalcone reductase connected through the linker GGGS, a gene encoding the fusion protein, and recombinant engineered bacteria containing the gene.

Description

Fusion protein, encoding gene thereof and application thereof in biosynthesis

Technical Field

The invention relates to a fusion protein, a gene encoding the fusion protein, an expression vector and a recombinant strain containing the gene, and the use of the fusion protein or the recombinant strain in the synthesis of isoliquiritigenin. It belongs to the field of synthetic biology of medicinal components.

Background

Licorice (Glycyrrhizae Radix et Rhizoma), a botanical drug with a history of thousands of years, is one of the most commonly used bulk medicinal materials and is now rare or endangered. It shows good and safe preventive and therapeutic effects on various diseases and appears in almost all Chinese medicine prescriptions and patent medicines, and in a growing number of health care products and even foods. Isoliquiritigenin is a chalcone present at high content in licorice and a common natural pigment. It has a simple structure and has been found to have many significant activities, such as anti-inflammatory, anticancer, antihistamine, antioxidant, anti-platelet-aggregation, anti-allergic, antiviral and estrogen-like activities. Its anticancer activity is prominent: isoliquiritigenin can inhibit the proliferation of many kinds of cancer cells and induce their apoptosis. Its glycoside, isoliquiritin, can inhibit tumor angiogenesis and has antidepressant and antioxidant effects. These findings indicate that isoliquiritigenin has good prospects for development and application in cancer treatment and other areas. However, the isoliquiritigenin content of licorice plants is limited, and because of huge market demand the reserves of wild licorice in China were already less than 500,000 as early as 2009. Predatory digging of wild licorice not only greatly reduces licorice yield but also damages the environment of licorice-growing areas and causes desertification, and licorice populations are difficult to restore once damaged. To relieve the shortage of the medicinal source, many chemical routes for synthesizing isoliquiritigenin have been developed, but chemical synthesis requires large amounts of organic solvents, is unfavorable to environmental protection and carries a certain explosion risk.
Tissue and cell culture and other methods have therefore been developed to obtain isoliquiritigenin, but the yield is too low and the time required is too long. Heterologous biosynthesis of isoliquiritigenin has become an effective strategy for the sustainable development of licorice resources. Isoliquiritigenin is a 5-deoxy chalcone whose biosynthesis starts with the phenylpropanoid pathway: phenylalanine is converted into coumaroyl-CoA by the stepwise catalysis of phenylalanine ammonia-lyase (PAL), cinnamic acid 4-hydroxylase (C4H) and coumaroyl-CoA ligase (4-coumarate:CoA ligase, 4CL), and then 3 molecules of malonyl-CoA and 1 molecule of coumaroyl-CoA are condensed into isoliquiritigenin under the action of chalcone synthase (CHS) and chalcone reductase (CHR). Cloning of the isoliquiritigenin biosynthetic pathway genes (PAL, C4H, 4CL, CHS and CHR) from licorice provides an important basis for mass production of the active ingredient isoliquiritigenin by fermentation engineering.

Disclosure of Invention

Specifically, the first aspect of the present invention relates to a fusion protein comprising:

(1) chalcone synthase (CHS); and

(2) chalcone reductase (CHR);

the chalcone synthetase and the chalcone reductase are connected through a linker, and the linker is any one of GGGS, GSG, GSGGS, GSGEAAAK, GSGEAAAKEAAAK or GSGMGSSSN.

In a particular embodiment, the invention relates to a fusion protein comprising a CHS having the amino acid sequence of SEQ ID NO: 8 and a CHR having the amino acid sequence of SEQ ID NO: 10, wherein the CHS and the CHR are connected through a linker that is any one of GGGS, GSG, GSGGGGS, GSGEAAAK, GSGEAAAKEAAAK or GSGMGSSSN, for example GGGS.

In a second aspect, the invention relates to a polynucleotide sequence encoding the fusion protein of the invention, said polynucleotide sequence comprising the chalcone synthase-encoding gene (CHS) shown in SEQ ID NO: 7 and the chalcone reductase-encoding gene (CHR) shown in SEQ ID NO: 9. Preferably, the CHS and the CHR are linked by a linker sequence, such as GGTGGTGGTTCT. More particularly, the fusion gene CHS::CHR is formed by linking the 3' end of the CHS (after removal of the stop codon TGA from the sequence shown in SEQ ID NO: 7) to the 5' end of the CHR through the linker sequence GGTGGTGGTTCT.
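The in-frame join described above can be illustrated with a minimal Python sketch. The linker sequence GGTGGTGGTTCT is taken from the text; the function names and the two toy ORFs are hypothetical and serve only to show the operation (they are not SEQ ID NO: 7 or 9). The sketch also accepts TAA and TAG as stop codons, which is an assumption beyond the TGA named in the text.

```python
# Only the two codons that occur in the linker are needed here.
CODON_TABLE = {"GGT": "G", "TCT": "S"}
STOP_CODONS = {"TGA", "TAA", "TAG"}
LINKER_DNA = "GGTGGTGGTTCT"  # linker sequence given in the text


def translate_linker(dna: str) -> str:
    """Translate the linker codon by codon."""
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    return "".join(CODON_TABLE[c] for c in codons)


def build_fusion(chs_orf: str, chr_orf: str, linker: str = LINKER_DNA) -> str:
    """Join CHS (stop codon removed) to CHR through the linker, in frame."""
    chs_orf, chr_orf = chs_orf.upper(), chr_orf.upper()
    if len(chs_orf) % 3 or len(chr_orf) % 3 or len(linker) % 3:
        raise ValueError("all parts must consist of whole codons")
    if chs_orf[-3:] in STOP_CODONS:
        chs_orf = chs_orf[:-3]  # drop the CHS stop codon so translation reads through
    return chs_orf + linker + chr_orf


# Toy ORFs (hypothetical, NOT the patent sequences), used only to show the join.
toy_chs = "ATGGCTTGA"
toy_chr = "ATGGAATAA"
fusion = build_fusion(toy_chs, toy_chr)

print(translate_linker(LINKER_DNA))
print(fusion)
```

Translating the linker codon by codon gives the peptide linker named in the first aspect, and removing the CHS stop codon keeps CHR in the same reading frame.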

The third aspect of the present invention relates to a recombinant expression vector comprising a promoter, a polynucleotide encoding the fusion protein of the present invention, and a transcription terminator. The expression vector is preferably a yeast expression vector in which the promoter, the polynucleotide encoding the fusion protein, the terminator and an episomal vector are spliced together by homologous recombination in yeast, wherein the episomal vector is selected from the group consisting of pESC, pYX212, pYES2.0, pRS425, pRS426 and p424, and is preferably a pESC expression vector selected from pESC-Leu, pESC-His or pESC-Trp.

In a preferred embodiment, the vector of the present invention may be any one selected from the group consisting of:

a recombinant expression vector pYM3 comprising a CHS and a CHR, the 3' end of the CHS (after removal of the stop codon TGA from the nucleotide sequence shown in SEQ ID NO: 7) being linked to the 5' end of the CHR through the linker sequence GGTGGTGGTTCT to form CHS::CHR, the CHS::CHR being inserted downstream of the promoter GAL10 of the expression vector pESC-Leu;

a recombinant expression vector pYM2 comprising a coumaroyl-CoA ligase-encoding gene (4CL) that encodes the amino acid sequence of SEQ ID NO: 6, a CHS and a CHR, wherein the 3' end of the CHS (after removal of the stop codon TGA from the nucleotide sequence shown in SEQ ID NO: 7) is linked to the 5' end of the CHR through the linker sequence GGTGGTGGTTCT to form CHS::CHR, the 4CL is inserted downstream of the promoter GAL1 of the yeast expression vector pESC-Leu, and the CHS::CHR is inserted downstream of the promoter GAL10 of the expression vector pESC-Leu. The gene encoding SEQ ID NO: 6 can be the coumaroyl-CoA ligase-encoding gene having the nucleotide sequence shown in SEQ ID NO: 5.

In a fourth aspect, the present invention relates to a recombinant engineered yeast strain comprising a polynucleotide sequence encoding the fusion protein of the present invention or the recombinant expression vector of the third aspect of the present invention.

In a fifth aspect, the present invention provides a preferred recombinant engineered yeast strain, which is the engineered yeast strain of the fourth aspect further comprising an expression vector pYM1, wherein the expression vector pYM1 comprises a phenylalanine ammonia-lyase-encoding gene (PAL) encoding the amino acid sequence shown in SEQ ID NO: 2 and a cinnamic acid 4-hydroxylase-encoding gene (C4H) encoding the amino acid sequence shown in SEQ ID NO: 4, the PAL is inserted downstream of the promoter GAL1 of the expression vector pESC-His, and the C4H is inserted downstream of the promoter GAL10 of the expression vector pESC-His.

Specifically, the gene encoding SEQ ID NO: 2 can be the phenylalanine ammonia-lyase-encoding gene (PAL) having the nucleotide sequence shown in SEQ ID NO: 1, and the gene encoding SEQ ID NO: 4 can be the cinnamic acid 4-hydroxylase-encoding gene (C4H) having the nucleotide sequence shown in SEQ ID NO: 3.

The sixth aspect of the invention relates to a method for constructing the recombinant engineered yeast strain of the invention, which comprises the following steps:

transferring the recombinant expression vector pYM3 into the engineered yeast strain WAT11 to obtain strain WM4; or

transferring the recombinant expression vector pYM2 into the engineered yeast strain WAT11 to obtain strain WM3; or

transferring the recombinant expression vectors pYM1 and pYM2 into the engineered yeast strain WAT11 to obtain strain WM2-1; or

transferring the recombinant expression vectors pYM1, pYM2 and pYM3 into the engineered yeast strain WAT11 to obtain strain WM2-2.

The seventh aspect of the present invention relates to the use of the fusion protein of the present invention, the polynucleotide encoding it, the recombinant expression vector comprising the polynucleotide, or the recombinant engineered yeast strain of the present invention for the production of chalcone or isoliquiritigenin, wherein the strain is fermented under galactose induction, the culture is extracted with ethyl acetate, and isoliquiritigenin can be detected by LC-MS.

The invention also relates to the application of the PAL, C4H,4CL, CHS or CHR gene in saccharomyces cerevisiae fermentation, in particular to the preparation of isoliquiritigenin and other flavonoid biosynthetic intermediates by fermentation engineering. Further, the other flavonoid biosynthesis intermediates are cinnamic acid, p-coumaric acid, p-coumaroyl coenzyme A and naringenin chalcone.

The invention can generate isoliquiritigenin and flavonoid biosynthetic intermediates, namely cinnamic acid, p-coumaric acid, p-coumaroyl coenzyme A and naringenin chalcone, by a biosynthesis technology, and has good application prospect.

The invention also provides the use of phenylalanine ammonia-lyase, cinnamic acid 4-hydroxylase, coumaroyl-CoA ligase, chalcone synthase and chalcone reductase, or of the genes encoding them, in the breeding of plants containing flavonoid chemical components. Applying these enzymes or their encoding genes to plant cells can increase the flavonoid content of the plant.

Drawings

FIG. 1 shows the analysis of the catalytic products of the proteins encoded by the different genes. EIC denotes an extracted ion chromatogram; m/z 164, 147, 163, 271 and 255 are the mass-to-charge ratios of the [M-H]^- ions of phenylalanine, cinnamic acid, p-coumaric acid, naringenin chalcone and isoliquiritigenin, respectively.
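The m/z values listed for FIG. 1 can be checked against the molecular formulas of the five compounds. The sketch below is an illustration rather than part of the patent: the formulas and monoisotopic element masses are standard values, the function names are hypothetical, and the [M-H]^- ion is modeled as the neutral monoisotopic mass minus one proton mass.

```python
import re

# Monoisotopic masses (u) of the elements involved.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}
PROTON = 1.00727647  # proton mass (u)

# Molecular formulas of the compounds named in the figure legend.
FORMULAS = {
    "phenylalanine": "C9H11NO2",
    "cinnamic acid": "C9H8O2",
    "p-coumaric acid": "C9H8O3",
    "naringenin chalcone": "C15H12O5",
    "isoliquiritigenin": "C15H12O4",
}


def monoisotopic_mass(formula: str) -> float:
    """Sum element masses for a simple formula such as 'C9H8O3'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


def mz_deprotonated(formula: str) -> float:
    """m/z of the singly charged [M-H]- ion."""
    return monoisotopic_mass(formula) - PROTON


for name, formula in FORMULAS.items():
    print(f"{name}: [M-H]- m/z = {mz_deprotonated(formula):.4f}")
```

Rounded to the nearest integer, the computed values agree with the nominal m/z values 164, 147, 163, 271 and 255 given in the legend.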

FIG. 2 shows the analysis of the fermentation products of recombinant yeasts WM1, WM2-1 and WM2-2. In FIG. 2A, m/z 163 and 255 are the mass-to-charge ratios of the [M-H]^- ions of p-coumaric acid and isoliquiritigenin, respectively.

FIG. 3 shows the analysis of the fermentation products of recombinant yeasts WM3, WM4 and WM5. In the figure, m/z

Detailed Description

Various aspects and features of the present invention are described in detail below through preferred embodiments and with reference to the figures. Those skilled in the art should understand that these embodiments are merely illustrative and do not restrict the scope of the invention. Those skilled in the art may make various modifications and improvements to the aspects of the invention without departing from the scope of the claims, and such modifications and improvements fall within the scope of the invention. For example, replacing the expression vectors and host strains used in the examples with other expression vectors and host strains commonly used in the art can be understood and accomplished by those of ordinary skill in the art.

The experimental procedures used in the following examples are all conventional procedures unless otherwise specified.

Materials, reagents and the like used in the following examples are commercially available unless otherwise specified.

Example 1 Cloning of isoliquiritigenin biosynthetic pathway genes from Glycyrrhiza uralensis

1. Primer design

Full-length gene sequences were obtained by annotation screening of Glycyrrhiza uralensis transcriptome data, and upstream and downstream cloning primers were designed. The primer sequences are as follows:

2. PCR amplification

Glycyrrhiza uralensis Fisch RNA was reverse-transcribed into cDNA using QuantScript RT Kit (Tiangen Biochemical technology Co., Ltd., Beijing, China).

PCR amplification was performed using cDNA as a template.

The amplification system was as follows: 25 μL of 2× KAPA HiFi HotStart ReadyMix (Kapa Biosystems, Wilmington, USA), 1.5 μL each of primers P1 and P2, 2 μL of template, and double-distilled water to a final volume of 50 μL. Reaction conditions: pre-denaturation at 98 °C for 3 min; 35 cycles of denaturation at 98 °C for 20 s, annealing at 62 °C for 15 s and extension at 72 °C for 1.5 min; a final extension at 72 °C for 5 min; and storage at 4 °C.
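The reaction set-up can be checked with a few lines of arithmetic. The sketch below assumes that the 50 μL figure is the final reaction volume (so that the 2× master mix ends up at 1×) and that the 20 s step at 98 °C is the per-cycle denaturation; both are readings of the text rather than statements from it, and the variable names are hypothetical.

```python
FINAL_VOLUME_UL = 50.0  # assumed to be the final reaction volume

# Component volumes (uL) as listed in the text.
components_ul = {
    "2x KAPA HiFi HotStart ReadyMix": 25.0,
    "primer P1": 1.5,
    "primer P2": 1.5,
    "cDNA template": 2.0,
}

# Water makes up the remainder of the reaction.
water_ul = FINAL_VOLUME_UL - sum(components_ul.values())

# Programmed hold times in seconds (ramp times are not included).
PRE_DENATURATION_S = 3 * 60
CYCLE_STEPS_S = {"denaturation": 20, "annealing": 15, "extension": 90}
CYCLES = 35
FINAL_EXTENSION_S = 5 * 60

hold_seconds = (
    PRE_DENATURATION_S + CYCLES * sum(CYCLE_STEPS_S.values()) + FINAL_EXTENSION_S
)

print(f"double-distilled water: {water_ul} uL")
print(f"programmed hold time: {hold_seconds} s ({hold_seconds / 60:.1f} min)")
```

Under these assumptions the master mix makes up half of the reaction, as expected for a 2× reagent.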

Sequencing showed that the sequences of the PCR amplification products are as shown in SEQ ID No. 1, 3, 5, 7 and 9, respectively. The genes shown in sequences 1, 3, 5, 7 and 9 were named PAL, C4H, 4CL, CHS and CHR, respectively; the encoded proteins were likewise named PAL, C4H, 4CL, CHS and CHR, and their amino acid sequences are shown in SEQ ID No. 2, 4, 6, 8 and 10, respectively.

In the examples below, the genes PAL, C4H,4CL, CHS and CHR and the proteins PAL, C4H,4CL, CHS and CHR, are all identical to the corresponding nucleic acid or amino acid sequences above.
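The pairing of nucleotide and amino acid sequences can be cross-checked against the lengths declared in the sequence listing. The following sketch is illustrative only: the mapping restates the SEQ ID pairs from the text, the two length pairs are the <211> values given in the listing for SEQ ID NO: 1 to 4, and the function name is hypothetical.

```python
# Gene name -> (nucleotide SEQ ID NO, protein SEQ ID NO), as stated in the text.
SEQ_IDS = {
    "PAL": (1, 2),
    "C4H": (3, 4),
    "4CL": (5, 6),
    "CHS": (7, 8),
    "CHR": (9, 10),
}

# <211> lengths from the sequence listing: (nucleotides, amino acids).
DECLARED_LENGTHS = {
    "PAL": (2352, 783),
    "C4H": (1518, 505),
}


def expected_protein_length(orf_nt: int) -> int:
    """Residues encoded by an ORF that ends with one stop codon."""
    if orf_nt % 3:
        raise ValueError("ORF length is not a multiple of three")
    return orf_nt // 3 - 1  # the stop codon encodes no residue


for gene, (nt_len, aa_len) in DECLARED_LENGTHS.items():
    status = "consistent" if expected_protein_length(nt_len) == aa_len else "mismatch"
    print(gene, SEQ_IDS[gene], status)
```

Both declared pairs are consistent with a complete open reading frame that includes its stop codon.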

Example 2 prokaryotic expression of PAL, CHS and CHR genes and in vitro enzymatic reactions

1. Prokaryotic expression

PAL, CHS and CHR were each inserted between the KpnI and XhoI sites of pET-32a(+) using the EasyGeno Assembly Cloning kit (Tiangen Biochemical Technology Co., Ltd., Beijing, China) and transferred into E. coli BL21(DE3). Transformants were screened on LB plates containing 100 mg/mL ampicillin, and single colonies were picked for sequencing verification. Recombinant expression cells were shake-cultured at 37 °C in 200 mL of LB medium containing 100 mg/mL ampicillin until the OD600 reached 0.6-1.0 and were then induced with 0.2 mM IPTG at 16 °C for 10 h. The cells were collected by centrifugation at 5000 rpm and 4 °C, resuspended in 3 mL of PBS buffer (pH 8.0) and disrupted by sonication in an ice bath, and the supernatant was collected by centrifugation. The recombinant protein was purified, and its concentration was measured by the Bradford method (Kangji Biotech Co., Ltd., Beijing, China).

2. In vitro enzymatic reactions

Each 1 mL in vitro enzymatic reaction system included one of the following:

recombinant protein PAL at about 50 ng/μL (10 mM PBS, pH 8.0), 1 mM dithiothreitol (DTT) and 1 mM substrate phenylalanine;

recombinant protein CHS at about 50 ng/μL (10 mM PBS, pH 8.0), 1 mM DTT and 1 mM substrate (coumaroyl-CoA and malonyl-CoA in a molar ratio of 1:3);

recombinant proteins CHS and CHR at about 50 ng/μL (10 mM PBS, pH 8.0), 1 mM DTT and 1 mM substrate (coumaroyl-CoA and malonyl-CoA in a molar ratio of 1:3);

recombinant proteins CHS and CHR at about 50 ng/μL (10 mM PBS, pH 8.0), 1 mM DTT, 1 mM substrate (coumaroyl-CoA and malonyl-CoA in a molar ratio of 1:3) and 1 mM NADPH;

Each system was incubated at 30 °C for 12 h, and the reaction was stopped by adding 200 μL of methanol. The reaction mixture was centrifuged at 12,000 g, and the supernatant was filtered through a 0.22 μm PTFE filter and analyzed by HPLC-MS. The results are shown in FIG. 1A and FIG. 1D. Compared with the empty-vector control, the recombinant PAL1 protein produced cinnamic acid in the in vitro enzymatic system with phenylalanine as the substrate (FIG. 1A), so PAL1 is considered to catalyze the conversion of phenylalanine to cinnamic acid.

In the in vitro enzymatic system with coumaroyl-CoA and malonyl-CoA in a molar ratio of 1:3 as substrates, naringenin chalcone was produced when only CHS was added. When equal amounts of the CHS and CHR recombinant proteins were added together with NADPH, isoliquiritigenin was detected in addition to naringenin chalcone (FIG. 1D). This confirmed the chalcone synthase activity of CHS and the chalcone reductase activity of CHR.

However, when equal amounts of the CHS and CHR recombinant proteins were added without NADPH, no isoliquiritigenin could be detected.
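The quantities in a single 1 mL reaction can be worked out as follows. This is a rough sketch under a stated assumption: the text gives "1 mM substrate" with coumaroyl-CoA and malonyl-CoA at 1:3, and the code treats 1 mM as the combined concentration of the two substrates, which the text does not confirm. The function and variable names are hypothetical.

```python
REACTION_VOLUME_ML = 1.0
PROTEIN_NG_PER_UL = 50.0   # "about 50 ng/uL" in the text
TOTAL_SUBSTRATE_MM = 1.0   # ASSUMED to be coumaroyl-CoA plus malonyl-CoA combined


def split_by_molar_ratio(total_mm: float, ratio: tuple) -> list:
    """Split a total concentration according to a molar ratio."""
    parts = sum(ratio)
    return [total_mm * r / parts for r in ratio]


# 1 mL = 1000 uL; 1000 ng = 1 ug.
protein_ug = PROTEIN_NG_PER_UL * REACTION_VOLUME_ML * 1000 / 1000

coumaroyl_mm, malonyl_mm = split_by_molar_ratio(TOTAL_SUBSTRATE_MM, (1, 3))

# 1 mM in 1 mL corresponds to 1 umol, i.e. 1000 nmol.
coumaroyl_nmol = coumaroyl_mm * REACTION_VOLUME_ML * 1000
malonyl_nmol = malonyl_mm * REACTION_VOLUME_ML * 1000

print(f"protein per reaction: {protein_ug} ug")
print(f"coumaroyl-CoA: {coumaroyl_mm} mM ({coumaroyl_nmol} nmol)")
print(f"malonyl-CoA: {malonyl_mm} mM ({malonyl_nmol} nmol)")
```

The 1:3 ratio matches the stoichiometry stated in the Background, where one molecule of coumaroyl-CoA is combined with three molecules of malonyl-CoA.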

Example 3 Yeast expression and characterization of the C4H and 4CL genes

For yeast expression vector construction, the EasyGeno Assembly Cloning kit (Tiangen Biochemical Technology Co., Ltd., Beijing, China) was used to insert C4H between the SpeI and NotI sites of pESC-His (Agilent Technologies, Inc., Santa Clara, USA) and CHS between the SpeI and NotI sites of pESC-Leu (Agilent Technologies, Inc., Santa Clara, USA), and 4CL was then inserted between the NheI and BamHI sites of the recombinant vector already carrying CHS. The recombinant plasmids were transformed into the host strain WAT11 using a yeast transformation kit (Zymo Research Corporation, Irvine, USA), and transformants were screened by culturing at 30 °C for 4 days on the corresponding dropout medium (SC-His or SC-Leu, 2% glucose and 2% agar). Positive clones were inoculated into the corresponding liquid dropout medium (2% glucose) and shaken at 30 °C to an OD600 of about 0.8. The glucose medium was then replaced with induction medium containing 2% galactose; after induction at 30 °C and 220 rpm for 6 h, 20 μM cinnamic acid or p-coumaric acid was added and culture was continued for 12 h. The culture was extracted three times with an equal volume of ethyl acetate; the solvent was evaporated from the extract, the residue was redissolved in methanol and filtered through a 0.22 μm PTFE membrane, and the products were detected by HPLC-MS in negative-ion mode. The results are shown in FIG. 1B and FIG. 1C. When recombinant yeast containing C4H was fed cinnamic acid and induced with galactose, p-coumaric acid was detected in the culture (FIG. 1B), indicating that C4H catalyzes the formation of p-coumaric acid from cinnamic acid.
Because the catalytic product of 4CL is unstable, 4CL and CHS were co-expressed in WAT11 from the binary expression vector pESC-Leu. After feeding with p-coumaric acid, naringenin chalcone was detected in the culture extract (FIG. 1C), which shows that 4CL can use endogenous yeast coenzyme A to convert p-coumaric acid into coumaroyl-CoA, providing the substrate for chalcone synthesis by CHS.

Example 4 Construction of engineered yeast strains producing isoliquiritigenin

Using the EasyGeno Assembly Cloning kit (Tiangen Biochemical technology Co., Ltd., Beijing, China):

(1) The gene PAL was inserted downstream of the GAL1 promoter of the binary yeast expression vector pESC-His, and the gene C4H was inserted downstream of the GAL10 promoter of the same vector, to obtain plasmid pYM1 (pHIS-^{GAL1}PAL-^{GAL10}C4H);

(2) The gene 4CL was inserted downstream of the GAL1 promoter of the binary yeast expression vector pESC-Leu, and the fusion gene CHS::CHR was inserted downstream of the GAL10 promoter of the same vector, to obtain plasmid pYM2 (pLEU-^{GAL1}4CL-^{GAL10}CHS::CHR). The gene CHS::CHR is a fusion gene (SEQ ID NO: 21) formed by connecting the 3' end of CHS, after removal of the stop codon TGA, to the 5' end of CHR through the linker sequence GGTGGTGGTTCT;

(3) The fusion gene CHS::CHR was inserted downstream of the GAL10 promoter of the binary yeast expression vector pESC-Trp to obtain plasmid pYM3 (pTRP-^{GAL10}CHS::CHR);

(4) The gene CHS was inserted downstream of the GAL1 promoter of the binary yeast expression vector pESC-His, and the gene CHR was inserted downstream of the GAL10 promoter of the same vector, to obtain plasmid pYM4 (pHIS-^{GAL1}CHS-^{GAL10}CHR).

Using a yeast transformation kit (Zymo Research Corporation, Irvine, USA), the following transformations were each carried out:

(1) transferring the recombinant expression vector pYM1 into a yeast engineering bacterium WAT11 to obtain a transformant 1;

(2) transferring the recombinant expression vectors pYM1 and pYM2 into the engineered yeast WAT11 to obtain a transformant 2;

(3) transferring the recombinant expression vectors pYM1, pYM2 and pYM3 into the yeast engineering bacteria WAT11 to obtain a transformant 3;

(4) transferring the recombinant expression vector pYM2 into a yeast engineering bacterium WAT11 to obtain a transformant 4;

(5) transferring the recombinant expression vector pYM3 into a yeast engineering bacterium WAT11 to obtain a transformant 5;

(6) transferring the recombinant expression vector pYM4 into a yeast engineering bacterium WAT11 to obtain a transformant 6;

Each transformant was selected by culturing at 30 °C for 4 days on the dropout medium corresponding to the selection markers of its vectors (SC-His for transformants 1 and 6, SC-His-Leu for transformant 2, SC-His-Leu-Trp for transformant 3, SC-Leu for transformant 4 or SC-Trp for transformant 5; each with 2% glucose and 2% agar). The recombinant yeast strains obtained from positive clones were designated WM1 (containing pYM1), WM2-1 (containing pYM1 and pYM2), WM2-2 (containing pYM1, pYM2 and pYM3), WM3 (containing pYM2), WM4 (containing pYM3) and WM5 (containing pYM4), respectively.
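The relationship between plasmids, selection markers and dropout media in this example can be summarized in a small lookup. The sketch below only restates what Example 4 says: each plasmid's marker follows from its pESC backbone, and a strain is selected on synthetic complete (SC) medium lacking every amino acid whose marker it carries. The dictionary and function names are hypothetical.

```python
# Auxotrophic marker of each plasmid, taken from its pESC backbone in Example 4.
PLASMID_MARKER = {"pYM1": "His", "pYM2": "Leu", "pYM3": "Trp", "pYM4": "His"}

# Plasmid content of each recombinant strain, as listed in the text.
STRAIN_PLASMIDS = {
    "WM1": ["pYM1"],
    "WM2-1": ["pYM1", "pYM2"],
    "WM2-2": ["pYM1", "pYM2", "pYM3"],
    "WM3": ["pYM2"],
    "WM4": ["pYM3"],
    "WM5": ["pYM4"],
}

MARKER_ORDER = ["His", "Leu", "Trp"]  # fixed order for naming the medium


def dropout_medium(strain: str) -> str:
    """Name the SC dropout medium that selects for all plasmids in a strain."""
    markers = {PLASMID_MARKER[p] for p in STRAIN_PLASMIDS[strain]}
    return "SC" + "".join(f"-{m}" for m in MARKER_ORDER if m in markers)


for strain in STRAIN_PLASMIDS:
    print(strain, dropout_medium(strain))
```

A strain carrying several plasmids needs a medium lacking all of the corresponding amino acids, which is why WM2-2 is grown on a triple-dropout medium.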

The recombinant yeasts were shaken to an OD600 of about 0.8 in the corresponding liquid dropout medium (SC-His for WM1, SC-His-Leu for WM2-1 or SC-His-Leu-Trp for WM2-2; each with 2% glucose). The glucose medium was then replaced with induction medium containing 2% galactose, and expression was induced at 30 °C and 220 rpm for 12-48 h. The culture was extracted three times with an equal volume of ethyl acetate; the solvent was evaporated from the extract, the residue was redissolved in methanol and filtered through a 0.22 μm PTFE membrane, and the products were detected by HPLC-MS in negative-ion mode. The results are shown in FIG. 2. The recombinant yeast WM1 produced p-coumaric acid under galactose-induced fermentation (FIG. 2A), with a p-coumaric acid yield of 7.59 μmol/L after about 36 h of culture (FIG. 2B). When pYM2 was additionally transferred into WM1, a small amount of isoliquiritigenin was detected in the fermentation broth (FIG. 2, WM2-1). With additional overexpression of CHS::CHR, the isoliquiritigenin yield of WM2-2 increased 18.2-fold compared with WM2-1 (FIG. 2B).
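For comparison with titers reported in mass units, the molar p-coumaric acid yield can be converted to mg/L. The sketch below is illustrative: the molar mass of p-coumaric acid (C9H8O3, about 164.16 g/mol) is a standard value rather than one given in the text, and the function name is hypothetical.

```python
P_COUMARIC_ACID_G_PER_MOL = 164.16  # average molar mass of C9H8O3


def umol_per_l_to_mg_per_l(conc_umol_l: float, molar_mass_g_mol: float) -> float:
    """Convert a concentration from umol/L to mg/L."""
    # umol/L * g/mol = ug/L; divide by 1000 to get mg/L.
    return conc_umol_l * molar_mass_g_mol / 1000.0


yield_mg_l = umol_per_l_to_mg_per_l(7.59, P_COUMARIC_ACID_G_PER_MOL)
print(f"p-coumaric acid: {yield_mg_l:.3f} mg/L")
```

The text reports the isoliquiritigenin result only as a fold change between WM2-2 and WM2-1, so no absolute isoliquiritigenin titer can be derived in the same way.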

The recombinant yeasts WM3, WM4 and WM5 were shaken at 30 °C to an OD600 of about 0.8 in the corresponding liquid dropout medium (SC-Leu, SC-Trp or SC-His, respectively; each with 2% glucose). The glucose medium was then replaced with induction medium containing 2% galactose. After induction at 30 °C and 220 rpm for 6 h, 20 μM p-coumaric acid was added to WM3 and 20 μM coumaroyl-CoA was added to WM4 and WM5, and culture was continued. After 12 h, the culture was extracted three times with an equal volume of ethyl acetate; the solvent was evaporated from the extract, the residue was redissolved in methanol and filtered through a 0.22 μm PTFE membrane, and the products were detected by HPLC-MS in negative-ion mode. The results showed that isoliquiritigenin was detected in the culture extract when WM3 was fed p-coumaric acid or WM4 was fed coumaroyl-CoA, whereas no isoliquiritigenin was obtained when WM5 was fed coumaroyl-CoA (FIG. 3).

The above description is not intended to limit the present invention, and the present invention is not limited to the above examples. Those skilled in the art will appreciate that various modifications, additions and substitutions are possible, without departing from the scope and spirit of the invention as disclosed in the accompanying claims.

SEQUENCE LISTING

<110> Liu, Chunsheng

<120> fusion protein, encoding gene thereof and application thereof in biosynthesis

<130> background Art

<160> 21

<170> PatentIn version 3.5

<210> 1

<211> 2352

<212> DNA

<213> Artificial Sequence

<220>

<223> artificial sequence

<400> 1

atgccccatt ctctctccct cttcccttca ggaaatcctt tgattccccc cacaaccaaa 60

gatttcctct cctctcatca actactacaa ctattattat tattactact actactctct 120

cctctccctt ctctatctct tcagcattcc ttaaatacat ttttactctc ttttcgtcgt 180

gacaattcat caaacatgga cgctactaca gccaatggcc atgtcgtcga cggtgtcaat 240

agtttttgct tgaagagcgg tagtggtggt ggtgatccat tgaactgggg tgcggcggcg 300

gagtcgatga aggggagtca cttggacgag gtgaaacgga tggtggcgga gtaccggaag 360

ccggtggtgc ggctcggcgg cgagagcctc acgattgctc aggtggccgg catcgcctca 420

cacgacaccg gcgtacgcgt ggagctgtcg gagtcggcga gggcaggggt taaggcaagc 480

agtgactggg tgatggacag catgaataat ggcaccgaca gctacggtgt caccaccggt 540

ttcggtgcta cctcccaccg tagaaccaaa cagggcggtg ccttgcagaa ggagctaatt 600

aggtttttga atgctggaat atttggcaat ggtacggagt caaattgcac cctaccacac 660

acagcaacaa gggcagcaat gctagtgaga atcaacaccc ttcttcaagg ctactctggc 720

attagatttg aaatcttaga agccatgaca aagttcctaa acagcaacat caccccatgc 780

ctaccactaa ggggaacaat tacagcatct ggtgaccttg tccctctttc ttacattgcc 840

ggtttgttaa cgggcagacc caattccaaa gctgtgggac ccactggaga gattctcaat 900

gccaaggaag catttcaatt ggccaaaatt ggttcagagt tctttgaatt gcaacccaaa 960

gaaggccttg cacttgttaa tggcactgcc gttggttctg gtttggcttc aatcgttctg 1020

tttgaagcaa acattctagc tgttttgtct gaagttatat cagcaatttt cgctgaagtt 1080

atgcaaggga aacctgaatt cactgactat ttgacacata aactgaaaca ccatcctggg 1140

caaatcgaag ctgcagctat tatggagcat gttttggatg gaagctctta tgttaaagca 1200

gctaagaagt tgcatgaggt tgacccttta caaaagccta aacaggatcg ctatgcactt 1260

aggacttcac cacaatggct tggtccttta attgaagtga taaggttctc aactaagtca 1320

attgagagag agattaactc ggtcaatgac aaccctttga ttgatgtgtc aaggaacaag 1380

gctttacatg gtggtaactt tcagggaaca cctattgggg tctcaatgga taacacacgt 1440

ttggcacttg cttcaattgg taaactcatg tttgctcaat tctctgagct tgttaatgat 1500

ttttacaaca atgggttgcc ttcgaatctc tctggtggta gaaacccaag cttggattat 1560

ggtttcaagg gagctgaaat tgctatggct tcttattgct ctgagctaca ataccttgca 1620

aacccggtta caagccatgt acaaagtgct gaacaacaca accaggatgt gaactcgttg 1680

ggtttgattt cttctaggaa aacaaacgag gccattgaga tccttaagct catgtcttcc 1740

acgttcttga ttgcactctg ccaagctatt gacttgaggc acttggagga gaacctgagg 1800

aacaccgtca agaacaccgt gagccaagtt gccaagagga cactcaccac aggtgtcaat 1860

ggagaactcc acccttctag attctgtgag aaagacttgc tcaaggttgt tgatagggag 1920

tatgtttttg cctacattga cgacccttgc agtgccacgt acccattgat gcaaaagctg 1980

aggcaagtgc ttgtggatca tgcacttgta aatggagaga gcgagaagag cttgaacaca 2040

tcgatcttcc aaaagattgc aacttttgag gatgagttga aggccctttt gccaaaagag 2100

gtggaaggtg cgagggttgc atatgagaat gggcaatgtg caatcccgaa caagatcaag 2160

gaatgtaggt catacccgtt gtacaagttt gtgagggaag agttggggac agggttgtta 2220

acaggggaga aggtgatttc accgggtgag gagtgtgaca aactgttcat agcaatgtgc 2280

cagggtaaga ttattgatcc ccttttggaa tgccttgggg agtggaatgg tgcgcctctt 2340

ccaatttgtt aa 2352

<210> 2

<211> 783

<212> PRT

<213> Artificial Sequence

<220>

<223> artificial sequence

<400> 2

Met Pro His Ser Leu Ser Leu Phe Pro Ser Gly Asn Pro Leu Ile Pro

1 5 10 15

Pro Thr Thr Lys Asp Phe Leu Ser Ser His Gln Leu Leu Gln Leu Leu

20 25 30

Leu Leu Leu Leu Leu Leu Leu Ser Pro Leu Pro Ser Leu Ser Leu Gln

35 40 45

His Ser Leu Asn Thr Phe Leu Leu Ser Phe Arg Arg Asp Asn Ser Ser

50 55 60

Asn Met Asp Ala Thr Thr Ala Asn Gly His Val Val Asp Gly Val Asn

65 70 75 80

Ser Phe Cys Leu Lys Ser Gly Ser Gly Gly Gly Asp Pro Leu Asn Trp

85 90 95

Gly Ala Ala Ala Glu Ser Met Lys Gly Ser His Leu Asp Glu Val Lys

100 105 110

Arg Met Val Ala Glu Tyr Arg Lys Pro Val Val Arg Leu Gly Gly Glu

115 120 125

Ser Leu Thr Ile Ala Gln Val Ala Gly Ile Ala Ser His Asp Thr Gly

130 135 140

Val Arg Val Glu Leu Ser Glu Ser Ala Arg Ala Gly Val Lys Ala Ser

145 150 155 160

Ser Asp Trp Val Met Asp Ser Met Asn Asn Gly Thr Asp Ser Tyr Gly

165 170 175

Val Thr Thr Gly Phe Gly Ala Thr Ser His Arg Arg Thr Lys Gln Gly

180 185 190

Gly Ala Leu Gln Lys Glu Leu Ile Arg Phe Leu Asn Ala Gly Ile Phe

195 200 205

Gly Asn Gly Thr Glu Ser Asn Cys Thr Leu Pro His Thr Ala Thr Arg

210 215 220

Ala Ala Met Leu Val Arg Ile Asn Thr Leu Leu Gln Gly Tyr Ser Gly

225 230 235 240

Ile Arg Phe Glu Ile Leu Glu Ala Met Thr Lys Phe Leu Asn Ser Asn

245 250 255

Ile Thr Pro Cys Leu Pro Leu Arg Gly Thr Ile Thr Ala Ser Gly Asp

260 265 270

Leu Val Pro Leu Ser Tyr Ile Ala Gly Leu Leu Thr Gly Arg Pro Asn

275 280 285

Ser Lys Ala Val Gly Pro Thr Gly Glu Ile Leu Asn Ala Lys Glu Ala

290 295 300

Phe Gln Leu Ala Lys Ile Gly Ser Glu Phe Phe Glu Leu Gln Pro Lys

305 310 315 320

Glu Gly Leu Ala Leu Val Asn Gly Thr Ala Val Gly Ser Gly Leu Ala

325 330 335

Ser Ile Val Leu Phe Glu Ala Asn Ile Leu Ala Val Leu Ser Glu Val

340 345 350

Ile Ser Ala Ile Phe Ala Glu Val Met Gln Gly Lys Pro Glu Phe Thr

355 360 365

Asp Tyr Leu Thr His Lys Leu Lys His His Pro Gly Gln Ile Glu Ala

370 375 380

Ala Ala Ile Met Glu His Val Leu Asp Gly Ser Ser Tyr Val Lys Ala

385 390 395 400

Ala Lys Lys Leu His Glu Val Asp Pro Leu Gln Lys Pro Lys Gln Asp

405 410 415

Arg Tyr Ala Leu Arg Thr Ser Pro Gln Trp Leu Gly Pro Leu Ile Glu

420 425 430

Val Ile Arg Phe Ser Thr Lys Ser Ile Glu Arg Glu Ile Asn Ser Val

435 440 445

Asn Asp Asn Pro Leu Ile Asp Val Ser Arg Asn Lys Ala Leu His Gly

450 455 460

Gly Asn Phe Gln Gly Thr Pro Ile Gly Val Ser Met Asp Asn Thr Arg

465 470 475 480

Leu Ala Leu Ala Ser Ile Gly Lys Leu Met Phe Ala Gln Phe Ser Glu

485 490 495

Leu Val Asn Asp Phe Tyr Asn Asn Gly Leu Pro Ser Asn Leu Ser Gly

500 505 510

Gly Arg Asn Pro Ser Leu Asp Tyr Gly Phe Lys Gly Ala Glu Ile Ala

515 520 525

Met Ala Ser Tyr Cys Ser Glu Leu Gln Tyr Leu Ala Asn Pro Val Thr

530 535 540

Ser His Val Gln Ser Ala Glu Gln His Asn Gln Asp Val Asn Ser Leu

545 550 555 560

Gly Leu Ile Ser Ser Arg Lys Thr Asn Glu Ala Ile Glu Ile Leu Lys

565 570 575

Leu Met Ser Ser Thr Phe Leu Ile Ala Leu Cys Gln Ala Ile Asp Leu

580 585 590

Arg His Leu Glu Glu Asn Leu Arg Asn Thr Val Lys Asn Thr Val Ser

595 600 605

Gln Val Ala Lys Arg Thr Leu Thr Thr Gly Val Asn Gly Glu Leu His

610 615 620

Pro Ser Arg Phe Cys Glu Lys Asp Leu Leu Lys Val Val Asp Arg Glu

625 630 635 640

Tyr Val Phe Ala Tyr Ile Asp Asp Pro Cys Ser Ala Thr Tyr Pro Leu

645 650 655

Met Gln Lys Leu Arg Gln Val Leu Val Asp His Ala Leu Val Asn Gly

660 665 670

Glu Ser Glu Lys Ser Leu Asn Thr Ser Ile Phe Gln Lys Ile Ala Thr

675 680 685

Phe Glu Asp Glu Leu Lys Ala Leu Leu Pro Lys Glu Val Glu Gly Ala

690 695 700

Arg Val Ala Tyr Glu Asn Gly Gln Cys Ala Ile Pro Asn Lys Ile Lys

705 710 715 720

Glu Cys Arg Ser Tyr Pro Leu Tyr Lys Phe Val Arg Glu Glu Leu Gly

725 730 735

Thr Gly Leu Leu Thr Gly Glu Lys Val Ile Ser Pro Gly Glu Glu Cys

740 745 750

Asp Lys Leu Phe Ile Ala Met Cys Gln Gly Lys Ile Ile Asp Pro Leu

755 760 765

Leu Glu Cys Leu Gly Glu Trp Asn Gly Ala Pro Leu Pro Ile Cys

770 775 780

<210> 3

<211> 1518

<212> DNA

<213> Artificial Sequence

<220>

<223> artificial sequence

<400> 3

atggatctcc tccttctgga gaagacccta ttgggtctct tcatcgccgc cataaccgcc 60

attgcaatct caaagctccg aggccggcga ttcaagctcc caccgggacc aatcccggta 120

ccaatcttcg gtaactggct ccaagtcggc gacgacctca accaccgcaa cctcaccgac 180

ctagcgaaac gcttcggcga catcttcctc ctccgaatgg gacagcgaaa cctcgtcgtc 240

gtttcatcgc cggagctagc caaggaggtc ctccacacac agggcgtgga attcggatcc 300

cgaacacgaa acgtcgtatt cgacatcttc accggaaagg gacaagacat ggtgttcacc 360

gtctacggcg aacactggcg gaagatgagg aggatcatga cggtgccctt tttcaccaac 420

aaggttgttc agcagtaccg gttcgggtgg gaatctgagg ctgctagtgt cgtcgatgat 480

gttcggcgta accccgatgc agccgccggc gggattgtac tccgccggag acttcagctc 540

atgatgtata acaatatgta tcggattatg tttgatagga ggtttgagag tgaggaggat 600

cctctgttta tgaagctgaa ggctctgaat ggggagagga gtcgtttggc acagagtttt 660

gagtataact atggggattt cattcctatt ttgagaccct tcttgaaagg ttacttgacg 720

atttgtaagg aggttaagga gaggaggttg aagctcttca aggactattt cgttgatgag 780

aggatgaagc ttgaaagcac aaagagcacc agcaacgaag gacttaaatg cgctattgat 840

cacattttgg acgctcagaa gaagggtgag atcaacgaag acaacgtcct ttacattgtt 900

gagaacatca acgttgctgc aattgaaaca actctatggt caattgaatg gggaattgct 960

gagcttgtga accacccaga gatccaaaag aaagtgaggg atgagattga cagagttctt 1020

ggaccaggac accaagtgac tgagccagat atgcagaagc taccttacct tcaggcagtg 1080

atcaaggaga cactccggct ccgaatggcg atcccgctcc tcgtcccaca catgaacctc 1140

cacgacgcaa agctcggtgg gtacgacatt ccggcggaga gcaagatatt ggtgaatgca 1200

tggtggcttg caaacaaccc tgctaattgg aaaaggccag aggagtttag gccagagagg 1260

ttcttagagg aagagtcaca tgttgaggct aatgggaatg actttaggta ccttccattt 1320

ggtgttggta gaaggagttg ccctggaatc attcttgctt tgcctatcct tggtattact 1380

ttgggacgtt tggttcaaaa ttttgagcta ttgcctcctc ctggacagtc caaacttgac 1440

actgctgaga aaggagggca attcagtttg cacatactca aacactcaac cattgttgcc 1500

aagccaagat cattttag 1518

<210> 4

<211> 505

<212> PRT

<213> Artificial Sequence

<220>

<223> artificial sequence

<400> 4

Met Asp Leu Leu Leu Leu Glu Lys Thr Leu Leu Gly Leu Phe Ile Ala

1 5 10 15

Ala Ile Thr Ala Ile Ala Ile Ser Lys Leu Arg Gly Arg Arg Phe Lys

20 25 30

Leu Pro Pro Gly Pro Ile Pro Val Pro Ile Phe Gly Asn Trp Leu Gln

35 40 45

Val Gly Asp Asp Leu Asn His Arg Asn Leu Thr Asp Leu Ala Lys Arg

50 55 60

Phe Gly Asp Ile Phe Leu Leu Arg Met Gly Gln Arg Asn Leu Val Val

65 70 75 80

Val Ser Ser Pro Glu Leu Ala Lys Glu Val Leu His Thr Gln Gly Val

85 90 95

Glu Phe Gly Ser Arg Thr Arg Asn Val Val Phe Asp Ile Phe Thr Gly

100 105 110

Lys Gly Gln Asp Met Val Phe Thr Val Tyr Gly Glu His Trp Arg Lys

115 120 125

Met Arg Arg Ile Met Thr Val Pro Phe Phe Thr Asn Lys Val Val Gln

130 135 140

Gln Tyr Arg Phe Gly Trp Glu Ser Glu Ala Ala Ser Val Val Asp Asp

145 150 155 160

Val Arg Arg Asn Pro Asp Ala Ala Ala Gly Gly Ile Val Leu Arg Arg

165 170 175

Arg Leu Gln Leu Met Met Tyr Asn Asn Met Tyr Arg Ile Met Phe Asp

180 185 190

Arg Arg Phe Glu Ser Glu Glu Asp Pro Leu Phe Met Lys Leu Lys Ala

195 200 205

Leu Asn Gly Glu Arg Ser Arg Leu Ala Gln Ser Phe Glu Tyr Asn Tyr

210 215 220

Gly Asp Phe Ile Pro Ile Leu Arg Pro Phe Leu Lys Gly Tyr Leu Thr

225 230 235 240

Ile Cys Lys Glu Val Lys Glu Arg Arg Leu Lys Leu Phe Lys Asp Tyr

245 250 255

Phe Val Asp Glu Arg Met Lys Leu Glu Ser Thr Lys Ser Thr Ser Asn

260 265 270

Glu Gly Leu Lys Cys Ala Ile Asp His Ile Leu Asp Ala Gln Lys Lys

275 280 285

Gly Glu Ile Asn Glu Asp Asn Val Leu Tyr Ile Val Glu Asn Ile Asn

290 295 300

Val Ala Ala Ile Glu Thr Thr Leu Trp Ser Ile Glu Trp Gly Ile Ala

305 310 315 320

Glu Leu Val Asn His Pro Glu Ile Gln Lys Lys Val Arg Asp Glu Ile

325 330 335

Asp Arg Val Leu Gly Pro Gly His Gln Val Thr Glu Pro Asp Met Gln

340 345 350

Lys Leu Pro Tyr Leu Gln Ala Val Ile Lys Glu Thr Leu Arg Leu Arg

355 360 365

Met Ala Ile Pro Leu Leu Val Pro His Met Asn Leu His Asp Ala Lys

370 375 380

Leu Gly Gly Tyr Asp Ile Pro Ala Glu Ser Lys Ile Leu Val Asn Ala

385 390 395 400

Trp Trp Leu Ala Asn Asn Pro Ala Asn Trp Lys Arg Pro Glu Glu Phe

405 410 415

Arg Pro Glu Arg Phe Leu Glu Glu Glu Ser His Val Glu Ala Asn Gly

420 425 430

Asn Asp Phe Arg Tyr Leu Pro Phe Gly Val Gly Arg Arg Ser Cys Pro

435 440 445

Gly Ile Ile Leu Ala Leu Pro Ile Leu Gly Ile Thr Leu Gly Arg Leu

450 455 460

Val Gln Asn Phe Glu Leu Leu Pro Pro Pro Gly Gln Ser Lys Leu Asp

465 470 475 480

Thr Ala Glu Lys Gly Gly Gln Phe Ser Leu His Ile Leu Lys His Ser

485 490 495

Thr Ile Val Ala Lys Pro Arg Ser Phe

500 505

<210> 5

<211> 1653

<212> DNA

<213> Artificial Sequence

<220>

<223> artificial sequence

<400> 5

atggccattc agaacgagat gaagcagcag caacaacaga tcaaagaaga gttcatattc 60

aagtcgaagc ttcccgatat ccacattccc aaacacctcc ctctgcattc ctactgcttc 120

cagaatcttc cagagttcgg ttcacgtccc tgtctcatca acgccccaac gggtgaaata 180

ttcacctact ccgacgtgga actcgccgca cggagagtcg catcggggct aaaaaaacta 240

ggcatccaac acggcgatgt aatcatggtc ctcctcccaa attgccctga attcgttttc 300

tccttcctcg gcgcttcctt ttgcggcgca atcaccaccg ccgcgaaccc gttcttcacc 360

gccgcggaaa ttgccaaaca ggccaaagcc tcccatggga aggtgatcgt aacacaggct 420

tgttactacg agaaggtgaa ggacttgggt gtgaccaatc tcgtgttcgt ggattctccc 480

cctgaggggc acatgcattt cagcgagttg atggctgatg atgatgacgt catcaccggt 540

gatgaaatta agatccaccc tgatgacgtg gtggctttgc cttattcttc cgggacgacg 600

ggtctcccca aaggggtgat gctgacacac aaggggttgg taacgagcat agcacagcag 660

gtggatgggg agaacccaaa cctttactac cacagcgagg atgtgatcct ctgtgtgctt 720

cccctgtttc acatatactc cctcaactct gttctcctct gtgggttaag ggccaaagcc 780

tccatcttgt tgatgcccaa gttcgacatt catgctttct tgggtctggt tcacaggcac 840

agggtcacca ttgcaccact tgtgcccccc attgttctcg ccattgccaa gtcacctgat 900

cttgataaat atgacctctc atccattagg gtcctcaaat ctggaggggc tccccttggt 960

aaagaacttg aagacactgt cagggccaaa ttcccccaag ccaaacttgg acagggatat 1020

gggatgacgg aggcaggtcc agtgttgaca atgtgcttat catttgcaaa agtgccaata 1080

gatgtaaaac caggtgcatg tggaaccgtc gtcaggaatg cggagatgaa gattgtggat 1140

cctgaaaccg atacttcttt gcctcgaaat caacccggtg aaatctgtat tagaggcgac 1200

caaatcatga aaggttatct gaacgacccg gaagctacag agagaacaat agacaaagaa 1260

ggttggttgc atacgggtga cattgggtac attgacaatg atgatgagtt gttcatcgtt 1320

gataggctga aggaattgat taaatacaaa gggtttcaag tggctccagc tgaactcgaa 1380

gcccttattc tctcacaccc taagatctcc gatgttgctg tggtcccaat gaaggatgaa 1440

gcagctggtg aggtcccagt tgcatttgtg gtgagagcaa atggtcatat cgacacaact 1500

gaggatgaaa ttaagcaatt cgtctccaaa caggtggtgt tttacaaaag aataaacaga 1560

gtattcttca ttgatgccat tcccaagtca ccctcaggca aaatcttacg aaaggaccta 1620

agggctaagc ttgcagcggg tcttccaaat tga 1653

<210> 6

<211> 550

<212> PRT

<213> Artificial Sequence

<220>

<223> artificial sequence

<400> 6

Met Ala Ile Gln Asn Glu Met Lys Gln Gln Gln Gln Gln Ile Lys Glu

1 5 10 15

Glu Phe Ile Phe Lys Ser Lys Leu Pro Asp Ile His Ile Pro Lys His

20 25 30

Leu Pro Leu His Ser Tyr Cys Phe Gln Asn Leu Pro Glu Phe Gly Ser

35 40 45

Arg Pro Cys Leu Ile Asn Ala Pro Thr Gly Glu Ile Phe Thr Tyr Ser

50 55 60

Asp Val Glu Leu Ala Ala Arg Arg Val Ala Ser Gly Leu Lys Lys Leu

65 70 75 80

Gly Ile Gln His Gly Asp Val Ile Met Val Leu Leu Pro Asn Cys Pro

85 90 95

Glu Phe Val Phe Ser Phe Leu Gly Ala Ser Phe Cys Gly Ala Ile Thr

100 105 110

Thr Ala Ala Asn Pro Phe Phe Thr Ala Ala Glu Ile Ala Lys Gln Ala

115 120 125

Lys Ala Ser His Gly Lys Val Ile Val Thr Gln Ala Cys Tyr Tyr Glu

130 135 140

Lys Val Lys Asp Leu Gly Val Thr Asn Leu Val Phe Val Asp Ser Pro

145 150 155 160

Pro Glu Gly His Met His Phe Ser Glu Leu Met Ala Asp Asp Asp Asp

165 170 175

Val Ile Thr Gly Asp Glu Ile Lys Ile His Pro Asp Asp Val Val Ala

180 185 190

Leu Pro Tyr Ser Ser Gly Thr Thr Gly Leu Pro Lys Gly Val Met Leu

195 200 205

Thr His Lys Gly Leu Val Thr Ser Ile Ala Gln Gln Val Asp Gly Glu

210 215 220

Asn Pro Asn Leu Tyr Tyr His Ser Glu Asp Val Ile Leu Cys Val Leu

225 230 235 240

Pro Leu Phe His Ile Tyr Ser Leu Asn Ser Val Leu Leu Cys Gly Leu

245 250 255

Arg Ala Lys Ala Ser Ile Leu Leu Met Pro Lys Phe Asp Ile His Ala

260 265 270

Phe Leu Gly Leu Val His Arg His Arg Val Thr Ile Ala Pro Leu Val

275 280 285

Pro Pro Ile Val Leu Ala Ile Ala Lys Ser Pro Asp Leu Asp Lys Tyr

290 295 300

Asp Leu Ser Ser Ile Arg Val Leu Lys Ser Gly Gly Ala Pro Leu Gly

305 310 315 320

Lys Glu Leu Glu Asp Thr Val Arg Ala Lys Phe Pro Gln Ala Lys Leu

325 330 335

Gly Gln Gly Tyr Gly Met Thr Glu Ala Gly Pro Val Leu Thr Met Cys

340 345 350

Leu Ser Phe Ala Lys Val Pro Ile Asp Val Lys Pro Gly Ala Cys Gly

355 360 365

Thr Val Val Arg Asn Ala Glu Met Lys Ile Val Asp Pro Glu Thr Asp

370 375 380

Thr Ser Leu Pro Arg Asn Gln Pro Gly Glu Ile Cys Ile Arg Gly Asp

385 390 395 400

Gln Ile Met Lys Gly Tyr Leu Asn Asp Pro Glu Ala Thr Glu Arg Thr

405 410 415

Ile Asp Lys Glu Gly Trp Leu His Thr Gly Asp Ile Gly Tyr Ile Asp

420 425 430

Asn Asp Asp Glu Leu Phe Ile Val Asp Arg Leu Lys Glu Leu Ile Lys

435 440 445

Tyr Lys Gly Phe Gln Val Ala Pro Ala Glu Leu Glu Ala Leu Ile Leu

450 455 460

Ser His Pro Lys Ile Ser Asp Val Ala Val Val Pro Met Lys Asp Glu

465 470 475 480

Ala Ala Gly Glu Val Pro Val Ala Phe Val Val Arg Ala Asn Gly His

485 490 495

Ile Asp Thr Thr Glu Asp Glu Ile Lys Gln Phe Val Ser Lys Gln Val

500 505 510

Val Phe Tyr Lys Arg Ile Asn Arg Val Phe Phe Ile Asp Ala Ile Pro

515 520 525

Lys Ser Pro Ser Gly Lys Ile Leu Arg Lys Asp Leu Arg Ala Lys Leu

530 535 540

Ala Ala Gly Leu Pro Asn

545 550

<210> 7

<211> 1170

<212> DNA

<213> Artificial Sequence

<220>

<223> artificial sequence

<400> 7

atggtgagtg tagctgaaat tcgcaaagct caaagggcag aaggccctgc aaacatcttg 60

gccattggca ctgcaaaccc accaaactgt gttgatcaga gtacttatcc tgatttttac 120

tttaagatca caaatagtga gcacaagacc gagcttaagg aaaaatttca gcgcatgtgt 180

gataaatcta tgatcaagaa gcgatatatg tacctaacgg aagagatttt gaaagagaat 240

cctaacattt gcgcttatat ggcaccttct ttggacgcta ggcaagacat ggtggtcgta 300

gaggtgccta gactagggaa ggaagctgcg gtcaaagcta taaaagaatg gggccaacca 360

aagtcgaaga ttacccactt aatcttttgc actactagtg gtgtggacat gcctggcgct 420

gattaccagc ttactaaact cttgggtctt cgcccatatg tgaaaaggta tatgatgtac 480

caacaagggt gctttgcagg tggcacggtg cttcgcttgg ccaaagactt ggcggagaac 540

aacaaaggtg ctcgtgtgct agttgtttgt tctgaagtta ctgcagtcac attccgtggc 600

cctactgata ctcacctaga tagccttgtg ggacaagcat tatttggaga tggagcagct 660

gcagtcattg ttggttctga cccaataccc gaaattgaga agcctatatt tgagttggtt 720

tggactgcac aaacaatagc tccagatagt gaaggagcca ttgatggtca ccttcgtgaa 780

gttgggctca catttcatct tcttaaagat gttcccggga ttgtctcaaa gaacattgat 840

aaagcactga ctgaggcatt ccaaccatta ggcatctctg attacaactc aatcttttgg 900

attgcacacc caggtggacc ggcaattctt gaccaagttg agcaaaagtt agctttgaaa 960

cctgaaaaga tgaaggccac tagggatgtg cttagtgatt atggtaacat gtcaagtgca 1020

tgtgtcctat tcatcttgga tgagatgaga aagaaatccg ctcaaaatgg acttaagacc 1080

actggcgaag ggctcgaatg gggtgtgtta ttcggctttg gacctggact taccatcgaa 1140

actgttgttt tgcacagtgt ggctacatga 1170

<210> 8

<211> 389

<212> PRT

<213> Artificial Sequence

<220>

<223> artificial sequence

<400> 8

Met Val Ser Val Ala Glu Ile Arg Lys Ala Gln Arg Ala Glu Gly Pro

1 5 10 15

Ala Asn Ile Leu Ala Ile Gly Thr Ala Asn Pro Pro Asn Cys Val Asp

20 25 30

Gln Ser Thr Tyr Pro Asp Phe Tyr Phe Lys Ile Thr Asn Ser Glu His

35 40 45

Lys Thr Glu Leu Lys Glu Lys Phe Gln Arg Met Cys Asp Lys Ser Met

50 55 60

Ile Lys Lys Arg Tyr Met Tyr Leu Thr Glu Glu Ile Leu Lys Glu Asn

65 70 75 80

Pro Asn Ile Cys Ala Tyr Met Ala Pro Ser Leu Asp Ala Arg Gln Asp

85 90 95

Met Val Val Val Glu Val Pro Arg Leu Gly Lys Glu Ala Ala Val Lys

100 105 110

Ala Ile Lys Glu Trp Gly Gln Pro Lys Ser Lys Ile Thr His Leu Ile

115 120 125

Phe Cys Thr Thr Ser Gly Val Asp Met Pro Gly Ala Asp Tyr Gln Leu

130 135 140

Thr Lys Leu Leu Gly Leu Arg Pro Tyr Val Lys Arg Tyr Met Met Tyr

145 150 155 160

Gln Gln Gly Cys Phe Ala Gly Gly Thr Val Leu Arg Leu Ala Lys Asp

165 170 175

Leu Ala Glu Asn Asn Lys Gly Ala Arg Val Leu Val Val Cys Ser Glu

180 185 190

Val Thr Ala Val Thr Phe Arg Gly Pro Thr Asp Thr His Leu Asp Ser

195 200 205

Leu Val Gly Gln Ala Leu Phe Gly Asp Gly Ala Ala Ala Val Ile Val

210 215 220

Gly Ser Asp Pro Ile Pro Glu Ile Glu Lys Pro Ile Phe Glu Leu Val

225 230 235 240

Trp Thr Ala Gln Thr Ile Ala Pro Asp Ser Glu Gly Ala Ile Asp Gly

245 250 255

His Leu Arg Glu Val Gly Leu Thr Phe His Leu Leu Lys Asp Val Pro

260 265 270

Gly Ile Val Ser Lys Asn Ile Asp Lys Ala Leu Thr Glu Ala Phe Gln

275 280 285

Pro Leu Gly Ile Ser Asp Tyr Asn Ser Ile Phe Trp Ile Ala His Pro

290 295 300

Gly Gly Pro Ala Ile Leu Asp Gln Val Glu Gln Lys Leu Ala Leu Lys

305 310 315 320

Pro Glu Lys Met Lys Ala Thr Arg Asp Val Leu Ser Asp Tyr Gly Asn

325 330 335

Met Ser Ser Ala Cys Val Leu Phe Ile Leu Asp Glu Met Arg Lys Lys

340 345 350

Ser Ala Gln Asn Gly Leu Lys Thr Thr Gly Glu Gly Leu Glu Trp Gly

355 360 365

Val Leu Phe Gly Phe Gly Pro Gly Leu Thr Ile Glu Thr Val Val Leu

370 375 380

His Ser Val Ala Thr

385

<210> 9

<211> 948

<212> DNA

<213> Artificial Sequence

<220>

<223> artificial sequence

<400> 9

atggctgctg cccctacagt ccctgtaata gttctccctt cctcctctgg acagcggaag 60

atgccggtga tgggactcgg cacggcgccg gaagcaacca gtaaggttac cacaaaggat 120

gctgtccttg aggccatcaa gcagggttac aggcactttg atgctgctgc tgcatatggg 180

gttgagaaat cagtaggaga agccatagca gaagcactta aacttggact acttgcatcc 240

agagatgagg tcttcattac ttccaaactt tgggtcactg acaaccaccc tgaaaccatt 300

gttcctgctc tgaagaaatc tctcaggact cttcaactag aatacttaga cctcattttg 360

atccactggc ccattgctac aaaaccagga gaagttaaat accctattga tgtatcagat 420

attgtggagt ttgacatgaa gggtgtgtgg ggatcattgg aggaatgtca aagacttggt 480

ctcaccaaag ccattggagt cagcaacttc tctatcaaga agcttgaaaa attgctctcc 540

tttgccacca tccctcctgc agtaaatcaa gtggaagtca accttggttg gcaacaagag 600

aaacttagag ctttctgcaa ggaaaagggt attgtcataa ctgctttctc acccctgagg 660

aaaggtgcca gtaggggttc taatttggtg atggacaatg atgtgctgaa agaaattgca 720

gatgctcatg gcaagactat agctcagatt tgtcttcgat ggttatatga acaaggcttg 780

acatttgtgg tgaagagcta tgacaaggag aggatgaatc aaaacttgca gatctttgat 840

tggtcattga ctgaggatga ctacaagaaa ataagtgaaa tctatcaaga gaggctcatc 900

aaaggtccaa ccaagcctct tcttgatgac ctgtgggatg aagaatga 948

<210> 10

<211> 315

<212> PRT

<213> Artificial Sequence

<220>

<223> artificial sequence

<400> 10

Met Ala Ala Ala Pro Thr Val Pro Val Ile Val Leu Pro Ser Ser Ser

1 5 10 15

Gly Gln Arg Lys Met Pro Val Met Gly Leu Gly Thr Ala Pro Glu Ala

20 25 30

Thr Ser Lys Val Thr Thr Lys Asp Ala Val Leu Glu Ala Ile Lys Gln

35 40 45

Gly Tyr Arg His Phe Asp Ala Ala Ala Ala Tyr Gly Val Glu Lys Ser

50 55 60

Val Gly Glu Ala Ile Ala Glu Ala Leu Lys Leu Gly Leu Leu Ala Ser

65 70 75 80

Arg Asp Glu Val Phe Ile Thr Ser Lys Leu Trp Val Thr Asp Asn His

85 90 95

Pro Glu Thr Ile Val Pro Ala Leu Lys Lys Ser Leu Arg Thr Leu Gln

100 105 110

Leu Glu Tyr Leu Asp Leu Ile Leu Ile His Trp Pro Ile Ala Thr Lys

115 120 125

Pro Gly Glu Val Lys Tyr Pro Ile Asp Val Ser Asp Ile Val Glu Phe

130 135 140

Asp Met Lys Gly Val Trp Gly Ser Leu Glu Glu Cys Gln Arg Leu Gly

145 150 155 160

Leu Thr Lys Ala Ile Gly Val Ser Asn Phe Ser Ile Lys Lys Leu Glu

165 170 175

Lys Leu Leu Ser Phe Ala Thr Ile Pro Pro Ala Val Asn Gln Val Glu

180 185 190

Val Asn Leu Gly Trp Gln Gln Glu Lys Leu Arg Ala Phe Cys Lys Glu

195 200 205

Lys Gly Ile Val Ile Thr Ala Phe Ser Pro Leu Arg Lys Gly Ala Ser

210 215 220

Arg Gly Ser Asn Leu Val Met Asp Asn Asp Val Leu Lys Glu Ile Ala

225 230 235 240

Asp Ala His Gly Lys Thr Ile Ala Gln Ile Cys Leu Arg Trp Leu Tyr

245 250 255

Glu Gln Gly Leu Thr Phe Val Val Lys Ser Tyr Asp Lys Glu Arg Met

260 265 270

Asn Gln Asn Leu Gln Ile Phe Asp Trp Ser Leu Thr Glu Asp Asp Tyr

275 280 285

Lys Lys Ile Ser Glu Ile Tyr Gln Glu Arg Leu Ile Lys Gly Pro Thr

290 295 300

Lys Pro Leu Leu Asp Asp Leu Trp Asp Glu Glu

305 310 315

<210> 11

<211> 25

<212> DNA

<213> Artificial Sequence

<220>

<223> artificial sequence

<400> 11

atgccccatt ctctctccct cttcc 25

<210> 12

<211> 28

<212> DNA

<213> Artificial Sequence

<220>

<223> artificial sequence

<400> 12

ttaacaaatt ggaagaggtg caccattc 28

<210> 13

<211> 26

<212> DNA

<213> Artificial Sequence

<220>

<223> artificial sequence

<400> 13

taacatttaa ctccctaccc atttgc 26

<210> 14

<211> 23

<212> DNA

<213> Artificial Sequence

<220>

<223> artificial sequence

<400> 14

ccacccctcc attttcccac tac 23

<210> 15

<211> 22

<212> DNA

<213> Artificial Sequence

<220>

<223> artificial sequence

<400> 15

cttccaccac ccttacccac tc 22

<210> 16

<211> 25

<212> DNA

<213> Artificial Sequence

<220>

<223> artificial sequence

<400> 16

gaagcataag cgggcatcat aaaat 25

<210> 17

<211> 24

<212> DNA

<213> Artificial Sequence

<220>

<223> artificial sequence

<400> 17

atggtgacag ttgaagagat ccgc 24

<210> 18

<211> 25

<212> DNA

<213> Artificial Sequence

<220>

<223> artificial sequence

<400> 18

tcaattagcc tgcaagggaa cactg 25

<210> 19

<211> 26

<212> DNA

<213> Artificial Sequence

<220>

<223> artificial sequence

<400> 19

ttacaatacc aagagcagct accact 26

<210> 20

<211> 25

<212> DNA

<213> Artificial Sequence

<220>

<223> artificial sequence

<400> 20

gttcattctt catcccacag gtcat 25

<210> 21

<211> 2127

<212> DNA

<213> Artificial Sequence

<220>

<223> artificial sequence

<400> 21

atggtgagtg tagctgaaat tcgcaaagct caaagggcag aaggccctgc aaacatcttg 60

gccattggca ctgcaaaccc accaaactgt gttgatcaga gtacttatcc tgatttttac 120

tttaagatca caaatagtga gcacaagacc gagcttaagg aaaaatttca gcgcatgtgt 180

gataaatcta tgatcaagaa gcgatatatg tacctaacgg aagagatttt gaaagagaat 240

cctaacattt gcgcttatat ggcaccttct ttggacgcta ggcaagacat ggtggtcgta 300

gaggtgccta gactagggaa ggaagctgcg gtcaaagcta taaaagaatg gggccaacca 360

aagtcgaaga ttacccactt aatcttttgc actactagtg gtgtggacat gcctggcgct 420

gattaccagc ttactaaact cttgggtctt cgcccatatg tgaaaaggta tatgatgtac 480

caacaagggt gctttgcagg tggcacggtg cttcgcttgg ccaaagactt ggcggagaac 540

aacaaaggtg ctcgtgtgct agttgtttgt tctgaagtta ctgcagtcac attccgtggc 600

cctactgata ctcacctaga tagccttgtg ggacaagcat tatttggaga tggagcagct 660

gcagtcattg ttggttctga cccaataccc gaaattgaga agcctatatt tgagttggtt 720

tggactgcac aaacaatagc tccagatagt gaaggagcca ttgatggtca ccttcgtgaa 780

gttgggctca catttcatct tcttaaagat gttcccggga ttgtctcaaa gaacattgat 840

aaagcactga ctgaggcatt ccaaccatta ggcatctctg attacaactc aatcttttgg 900

attgcacacc caggtggacc ggcaattctt gaccaagttg agcaaaagtt agctttgaaa 960

cctgaaaaga tgaaggccac tagggatgtg cttagtgatt atggtaacat gtcaagtgca 1020

tgtgtcctat tcatcttgga tgagatgaga aagaaatccg ctcaaaatgg acttaagacc 1080

actggcgaag ggctcgaatg gggtgtgtta ttcggctttg gacctggact taccatcgaa 1140

actgttgttt tgcacagtgt ggctacaggt ggtggttcta tggctgctgc ccctacagtc 1200

cctgtaatag ttctcccttc ctcctctgga cagcggaaga tgccggtgat gggactcggc 1260

acggcgccgg aagcaaccag taaggttacc acaaaggatg ctgtccttga ggccatcaag 1320

cagggttaca ggcactttga tgctgctgct gcatatgggg ttgagaaatc agtaggagaa 1380

gccatagcag aagcacttaa acttggacta cttgcatcca gagatgaggt cttcattact 1440

tccaaacttt gggtcactga caaccaccct gaaaccattg ttcctgctct gaagaaatct 1500

ctcaggactc ttcaactaga atacttagac ctcattttga tccactggcc cattgctaca 1560

aaaccaggag aagttaaata ccctattgat gtatcagata ttgtggagtt tgacatgaag 1620

ggtgtgtggg gatcattgga ggaatgtcaa agacttggtc tcaccaaagc cattggagtc 1680

agcaacttct ctatcaagaa gcttgaaaaa ttgctctcct ttgccaccat ccctcctgca 1740

gtaaatcaag tggaagtcaa ccttggttgg caacaagaga aacttagagc tttctgcaag 1800

gaaaagggta ttgtcataac tgctttctca cccctgagga aaggtgccag taggggttct 1860

aatttggtga tggacaatga tgtgctgaaa gaaattgcag atgctcatgg caagactata 1920

gctcagattt gtcttcgatg gttatatgaa caaggcttga catttgtggt gaagagctat 1980

gacaaggaga ggatgaatca aaacttgcag atctttgatt ggtcattgac tgaggatgac 2040

tacaagaaaa taagtgaaat ctatcaagag aggctcatca aaggtccaac caagcctctt 2100

cttgatgacc tgtgggatga agaatga 2127

Claims

1. A recombinant engineered yeast strain WM2-2 for producing isoliquiritigenin, characterized in that recombinant expression vectors pYM1, pYM2 and pYM3 are transformed into the engineered yeast strain WAT11,

the recombinant expression vector pYM1 comprises a phenylalanine ammonia lyase encoding gene PAL and a cinnamic acid 4-hydroxylase encoding gene C4H, wherein the PAL is inserted downstream of the promoter GAL1 of a binary yeast expression vector pESC-His, and the C4H is inserted downstream of the promoter GAL10 of the binary yeast expression vector pESC-His;

the recombinant expression vector pYM2 comprises a coumaroyl-CoA ligase encoding gene 4CL and a chalcone synthetase-chalcone reductase fusion protein coding gene CHS, wherein the 4CL is inserted downstream of the promoter GAL1 of a yeast expression vector pESC-Leu, and the CHS is inserted downstream of the promoter GAL10 of the expression vector pESC-Leu;

the recombinant expression vector pYM3 comprises a chalcone synthetase-chalcone reductase fusion protein encoding gene CHS::CHR, wherein the CHS::CHR is inserted downstream of the promoter GAL10 of an expression vector pESC-Trp,

the CHS::CHR is a fusion gene formed by connecting the 3' end of CHS to the 5' end of CHR through a gene encoding the linker GGGS; wherein,

the PAL encodes a phenylalanine ammonia lyase having the amino acid sequence shown in SEQ ID NO: 2, and the C4H encodes a cinnamic acid 4-hydroxylase having the amino acid sequence shown in SEQ ID NO: 4;

the 4CL encodes a coumaroyl-CoA ligase having the amino acid sequence shown in SEQ ID NO: 6;

the CHS::CHR encodes a fusion protein in which a chalcone synthase having the amino acid sequence shown in SEQ ID NO: 8 and a chalcone reductase having the amino acid sequence shown in SEQ ID NO: 10 are linked by a linker GGGS.
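The fusion-gene construction recited in claim 1 can be illustrated with a minimal Python sketch. It assumes, as read from the listing, that the stop codon of the CHS open reading frame (SEQ ID NO: 7) is removed, that the linker GGGS is encoded by the codons ggt ggt ggt tct found in SEQ ID NO: 21, and that the CHR open reading frame (SEQ ID NO: 9) follows in frame. The function names `fuse_orfs` and `fused_length` are hypothetical and serve only for illustration; only the terminal fragments of the sequences are used, not the full genes.

```python
# Illustrative sketch only: helper names are hypothetical, not part of the patent.
STOP_CODONS = {"taa", "tag", "tga"}


def _clean(seq: str) -> str:
    """Remove the block spacing used in the sequence listing and lowercase."""
    return seq.replace(" ", "").lower()


def fuse_orfs(upstream: str, downstream: str, linker: str) -> str:
    """Join two ORFs in frame: drop the upstream stop codon, then add linker and downstream ORF."""
    up, down, link = _clean(upstream), _clean(downstream), _clean(linker)
    if up[-3:] in STOP_CODONS:
        up = up[:-3]
    return up + link + down


def fused_length(upstream_len: int, downstream_len: int, linker_len: int) -> int:
    """Length of the fusion ORF when the upstream ORF length includes its 3-nt stop codon."""
    return upstream_len - 3 + linker_len + downstream_len


# Fragments copied from the listing (assumed to be transcribed correctly).
CHS_TAIL = "actgttgttt tgcacagtgt ggctacatga"  # last 30 nt of SEQ ID NO: 7
CHR_HEAD = "atggctgctg cccctacagt"             # first 20 nt of SEQ ID NO: 9
LINKER = "ggt ggt ggt tct"                     # Gly-Gly-Gly-Ser codons seen in SEQ ID NO: 21

junction = fuse_orfs(CHS_TAIL, CHR_HEAD, LINKER)
total = fused_length(1170, 948, len(_clean(LINKER)))
print(junction)
print(total)
```

Under these assumptions, the computed junction can be compared with nucleotides 1141 to 1199 of SEQ ID NO: 21, and the computed total with the length stated for SEQ ID NO: 21.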