CN118086285A - Method for directed evolution of proteins

Info

Publication number: CN118086285A
Application number: CN202410490345.3A
Authority: CN
Inventors: 洪浩; 詹姆斯·盖吉; 肖毅; 张娜; 焦学成; 王翔; 史皖明; 杨益明; 赵军旗; 王磊
Original assignee: Tianjin Kailaiying Biotechnology Co ltd
Current assignee: Tianjin Kailaiying Biotechnology Co ltd
Priority date: 2024-04-23
Filing date: 2024-04-23
Publication date: 2024-05-28
Anticipated expiration: 2044-04-23
Also published as: CN118086285B

Abstract

The invention provides a method for directed evolution of proteins. The method comprises the following steps: obtaining, by PCR amplification, a PCR product carrying a mutation in the gene of the target protein; placing the PCR product in an in vitro protein expression system for gene expression to obtain the mutated target protein; and performing trait detection on the mutated target protein, wherein the target protein is a target enzyme and the trait detection comprises detecting the activity of the target enzyme and/or its enantiomeric excess (ee value). The method is highly stable, shortens the time required from 2-3 weeks to about 7 hours, and greatly accelerates evolution.

Description

Method for directed evolution of proteins

Technical Field

The invention relates to the field of directed evolution of proteins in protein engineering, in particular to a method for directed evolution of proteins.

Background

The idea of directed evolution of proteins is to simulate natural evolution by repeatedly mutating, expressing and screening a target gene over many rounds, so that evolution that would take thousands of years in nature is completed in a short time and proteins with improved performance or new functions are finally obtained. Methods for directed evolution of proteins can be divided into three strategies: non-rational design, semi-rational design and rational design.

Non-rational design, i.e. the random evolution strategy, has the advantage that no in-depth knowledge of protein sequence or structure is needed; natural evolution is simulated solely by random mutation and fragment recombination. It mainly comprises error-prone PCR (error-prone polymerase chain reaction, epPCR) and DNA shuffling. DNA shuffling is mainly used for single-gene or multi-gene recombination: DNase is used to cut a group of homologous genes carrying beneficial mutation sites into random fragments (usually 10-50 bp), and PCR is used to extend and recombine them into full-length genes. Its advantages are simple operation, no need for protein structure information and easy acquisition of beneficial mutations. Its disadvantage is that at least 70% identity between the gene sequences is required. Because of codon degeneracy, amino acid sequences vary far less than base sequences, so 70% identity at the gene level means more than 90% identity at the amino acid level, a fatal drawback that has kept this technology from being widely used over the last 20 years. Error-prone PCR is used more often. Its basic principle is to raise the random base mismatch rate by changing the conditions of the PCR reaction system or by using a low-fidelity DNA polymerase, thereby causing multi-point mutations and generating a mutant library with sequence diversity. It is widely adopted by researchers because no protein structure information is needed and the operation is simple. However, the technique is limited in several ways: the polymerase has a base preference (generally AG > TC) and a low mutation efficiency; each mutant generally carries only one base change, so beneficial mutations have to be accumulated step by step, and at least 4 consecutive rounds of epPCR are generally required to obtain a target mutant with clearly improved performance. Limited by detection throughput, the library capacity of a typical round of epPCR is around 1000-2000.

Rational design is an intelligent engineering approach that relies on computer technology (in silico) to simulate the evolutionary trajectory of natural proteins and can rapidly and accurately predict target mutants through virtual mutation screening. Using algorithms and programs developed on the basis of bioinformatics, the active site of the protein is predicted and the influence of mutations at specific sites on stability, folding, substrate binding and the like is examined, so that the protein can be modified specifically. Computer-aided design and large-scale molecular dynamics simulation allow biocatalysts to be modified and screened efficiently and rapidly, protein structures to be predicted with high precision, and new proteins that do not exist in nature to be designed from scratch. Although the design of new proteins has met with some success, many challenges remain: first, the success rate is low; second, the computation is heavy and depends strongly on computing resources; third, the designed proteins tend to have poor structure and stability and often low catalytic activity. This is mainly because the relationship between protein sequence, structure and function is not yet understood deeply enough. Rational design typically introduces mutation sites by site-directed mutagenesis, with a library size of tens to hundreds.

Semi-rational design mainly uses bioinformatics methods: based on homologous protein sequence alignment, three-dimensional structure or existing knowledge, several amino acid residues are rationally selected as engineering targets and, combined with a rational choice of effective codons, the protein is engineered in a targeted manner by constructing a high-quality mutant library. Mutations are generally introduced by degenerate primers, giving a library capacity of hundreds to thousands (Qu Ge, Zhao Jing, Zheng Ping, et al. Recent advances in directed evolution technology. Chinese Journal of Biotechnology, 2018, 34(1): 1-11).

In summary, site-directed mutagenesis and site-directed saturation mutagenesis are the most widely used means of constructing mutant libraries in directed evolution of proteins; epPCR is also an effective means.

At present, site-directed mutagenesis and site-directed saturation mutagenesis are generally carried out by designing the mutation site as the target base or a degenerate base, introducing it by PCR, constructing it into a plasmid, transforming the plasmid into an expression host, picking E. coli colonies, culturing, transferring, inducing expression of the target protein, disrupting the cells to release the protein, and running the reaction with the crude protein extract or the purified protein.

The library capacity of error-prone PCR is large, but it is limited by screening throughput: the number of mutants screened by error-prone PCR is generally about 1000-2000. An industrial protein typically has about 300 amino acids. If 3-5 primers are designed for each site, mutating the site into 3-5 amino acids that represent different properties, the coverage sought with error-prone PCR can instead be achieved by PCR with single-mutation primers across the whole sequence.

In summary, the existing methods of directed evolution of proteins can essentially all be implemented with primers that each contain a single mutation. In recent years gene synthesis technology has made continual breakthroughs and its cost keeps falling: one primer generally costs about 10 yuan, so synthesizing about a thousand primers costs about 10,000 yuan, and on the current trend the cost will fall further.
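The primer arithmetic in the two preceding paragraphs can be checked with a short calculation. The sketch below is illustrative only: the function name `primer_scan_estimate` is hypothetical, and the inputs (about 300 residues, 3-5 primers per site, about 10 yuan per primer) are the round figures quoted in the text rather than measured values.

```python
# Rough library-size and cost estimate for a site-by-site primer scan.
# All inputs are the round figures quoted in the text; the function name
# is a hypothetical helper introduced for illustration.

def primer_scan_estimate(n_residues, primers_per_site, yuan_per_primer):
    """Return (number of primers, total synthesis cost in yuan)."""
    n_primers = n_residues * primers_per_site
    return n_primers, n_primers * yuan_per_primer


for per_site in (3, 5):
    n_primers, cost = primer_scan_estimate(
        n_residues=300, primers_per_site=per_site, yuan_per_primer=10
    )
    print(f"{per_site} primers/site: {n_primers} primers, about {cost} yuan")
```

Under these assumptions a whole-gene scan needs 900-1500 primers, which is the same order of magnitude as the 1000-2000 mutants typically screened in one round of error-prone PCR.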

In conventional directed evolution of proteins, going from primer design to trait detection of the mutants requires more than 10 steps, including PCR, restriction digestion, ligation, transformation, picking single colonies, single-colony culture, transfer, induction, expression, harvesting cells by centrifugation, resuspension and cell disruption, before a crude cell extract of the target protein is obtained. This generally takes about 2-3 weeks. The series of operations is laborious; microbial work carries the risk of contamination by other bacteria or phage; the demands on equipment, consumables and operator skill are high; and because the workflow is long, each step can introduce small errors, so the final results fluctuate widely.

Disclosure of Invention

The main object of the invention is to provide a method for directed evolution of proteins, so as to solve the problem that the directed evolution of enzymes takes a long time in the prior art.

To achieve the above object, according to one aspect of the present invention, there is provided a method for directed evolution of proteins, the method comprising: obtaining, by PCR amplification, a PCR product carrying a mutation in the gene of the target protein; placing the PCR product in an in vitro protein expression system for gene expression to obtain the mutated target protein; and performing trait detection on the mutated target protein, wherein the trait detection comprises detecting the activity of the target enzyme and/or its enantiomeric excess (ee value); wherein the target protein is a target enzyme, and the in vitro protein expression system is an E. coli in vitro expression system comprising basic components, energy-related components, additive components, a cell extract and an RNase inhibitor. In the E. coli in vitro expression system, the basic components comprise: 19 amino acids at 2 mM each, 2 mM tyrosine, 14 mM magnesium acetate, 60 mM potassium acetate and 7 mM DTT; the energy-related components comprise: 1.2 mM AMP, 0.85 mM CMP, 0.85 mM GMP, 0.85 mM UMP, 15-83 mM PEP, 0.4-0.6 mM NAD, 4 mM potassium oxalate, 90 mM potassium glutamate and 2.5-10 mM magnesium glutamate; the additive components comprise: 1.5 mM spermidine and 157.33 mM HEPES; the concentration of the RNase inhibitor is 150 U/450 μL; and the cell extract accounts for 20-60% of the system by volume.

Further, the PCR product carrying a mutation in the gene of the target protein is obtained by PCR amplification using any one or more of the following methods: 1) two-step PCR, in which two primer pairs are designed, (a) F1 and R1 and (b) F2 and R2, with the mutated sequence containing the mutation site introduced into the primer pairs; a first PCR is performed with the two primer pairs to obtain fragments L1 and L2 on either side of the mutation site, where the overlapping region between L1 and L2 is denoted L and the mutation site lies within L; a second PCR is then performed with the mixture of L1 and L2 as template and F1 and R2 as primers to obtain the full-length sequence, which is the PCR product carrying the mutation in the gene of the target protein; or 2) constructing, according to the principle of site-directed saturation mutagenesis, a plurality of PCR products carrying mutations in the gene of the target protein by PCR amplification, the plurality of PCR products forming a saturation mutant library of the target protein gene; or 3) introducing a site-directed mutation by PCR amplification to obtain a PCR product carrying the mutation in the gene of the target protein; or 4) performing random mutagenesis of the full sequence by error-prone PCR to obtain a plurality of PCR products carrying mutations in the gene of the target protein, which together cover random mutations across the full gene sequence; or 5) using a multi-point mutation method to obtain PCR products carrying multiple mutation sites in the gene of the target protein.
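The two-step PCR of method 1) can be followed in silico. The Python sketch below is a simplified illustration, not the applicant's protocol: it treats the fragments as plain strings on one strand, ignores primer melting temperatures and reverse complements, and the function name `two_step_pcr`, the 9-nt flank and the example codon change are all assumptions. The template is the first 42 nt of SEQ ID NO: 3.

```python
# In-silico sketch of two-step (overlap extension) PCR mutagenesis.
# Hypothetical helper; flank length and the example mutation are assumptions.

def two_step_pcr(template, codon_index, new_codon, flank=9):
    """Return fragments L1, L2, their overlap L, and the full-length product.

    codon_index is 0-based; flank is the number of nucleotides of overlap
    kept on each side of the mutated codon.
    """
    start = codon_index * 3
    end = start + 3
    if len(new_codon) != 3:
        raise ValueError("new_codon must be exactly three nucleotides")
    if start - flank < 0 or end + flank > len(template):
        raise ValueError("mutation site is too close to the end of the template")

    mutated = template[:start] + new_codon + template[end:]

    # Step 1: F1/R1 yields L1 (5' end through the site), F2/R2 yields L2
    # (site through the 3' end). Both carry the mutation in the shared region L.
    l1 = mutated[:end + flank]
    l2 = mutated[start - flank:]
    overlap = mutated[start - flank:end + flank]

    # Step 2: L1 and L2 anneal through L and are extended, then amplified
    # with the outer primers F1 and R2 to give the full-length product.
    full_length = l1 + l2[len(overlap):]
    return l1, l2, overlap, full_length


template = "ATGCACAGCGCTGCAAACGCAAAACAACAGAAGCACTTCGTC"
# Example: change the 7th codon (GCA, Ala) to GGT (Gly).
l1, l2, overlap, product = two_step_pcr(template, codon_index=6, new_codon="GGT")
print(l1)
print(l2)
print(overlap)
print(product)
```

The full-length product differs from the template only at the chosen codon, and it is this linear product, rather than a plasmid, that is fed to the in vitro expression system.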

Further, in the E. coli in vitro expression system, the concentration of PEP is 30 mM; preferably, the concentration of NAD is 0.4 mM; preferably, the concentration of magnesium glutamate is 7.5 mM; preferably, the cell extract accounts for 33.3% of the E. coli in vitro expression system by volume.

Further, the target enzyme is selected from industrial enzymes.

Further, the industrial enzyme is an esterase having the amino acid sequence of SEQ ID NO: 1 or the transaminase TA-1 having the amino acid sequence of SEQ ID NO: 2.

Further, performing trait detection on the target enzyme carrying the gene mutation comprises: using a plurality of target enzymes carrying different gene mutations to catalyze the same substrate to give the same product, and detecting the conversion of the substrate by each target enzyme and/or the enantiomeric excess of the product; and, taking the conversion and/or enantiomeric excess obtained with the initial control enzyme as reference, selecting from the plurality of target enzymes a target enzyme with increased conversion and/or enantiomeric excess and designating it the initial +1 control enzyme.
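The selection rule in this step can be written down compactly. The sketch below assumes the usual definition of enantiomeric excess, ee = |major − minor| / (major + minor) × 100%, computed for example from chiral HPLC peak areas; the function names and all numbers are hypothetical screening data for illustration, not results from the application.

```python
# Sketch of the screening step: compute ee and keep mutants that beat the
# control in conversion and/or ee. Names and data are hypothetical.

def enantiomeric_excess(major, minor):
    """ee in percent from the amounts (e.g. peak areas) of the two enantiomers."""
    total = major + minor
    if total <= 0:
        raise ValueError("no product detected")
    return 100.0 * abs(major - minor) / total


def pick_improved(control, mutants):
    """Return the names of mutants with higher conversion and/or ee than control.

    control is (conversion %, ee %); mutants maps name -> (conversion %, ee %).
    """
    ctrl_conv, ctrl_ee = control
    return [name for name, (conv, ee) in mutants.items()
            if conv > ctrl_conv or ee > ctrl_ee]


# Hypothetical data: (conversion %, ee %) for the control and three mutants.
control = (40.0, enantiomeric_excess(90.0, 10.0))
mutants = {
    "M1": (55.0, enantiomeric_excess(97.0, 3.0)),
    "M2": (35.0, enantiomeric_excess(85.0, 15.0)),
    "M3": (38.0, enantiomeric_excess(99.0, 1.0)),
}
improved = pick_improved(control, mutants)
print(improved)
```

Because the text says "and/or", the sketch accepts a mutant that improves either property; a stricter screen that requires both would replace `or` with `and`.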

Further, after the initial +1 control enzyme is obtained, the method further comprises: taking the initial +1 control enzyme as the new initial control enzyme and repeating the three steps above (steps S1 to S3), and so on, thereby obtaining a plurality of target enzymes after directed evolution.

To achieve the above object, according to a second aspect of the present invention, there is provided an in vitro protein expression system, which is an E. coli in vitro expression system comprising basic components, energy-related components, additive components, a cell extract and an RNase inhibitor. In the E. coli in vitro expression system, the basic components comprise: 19 amino acids at 2 mM each, 2 mM tyrosine, 14 mM magnesium acetate, 60 mM potassium acetate and 7 mM DTT; the energy-related components comprise: 1.2 mM AMP, 0.85 mM CMP, 0.85 mM GMP, 0.85 mM UMP, 15-83 mM PEP, 0.4-0.6 mM NAD, 4 mM potassium oxalate, 90 mM potassium glutamate and 2.5-10 mM magnesium glutamate; the additive components comprise: 1.5 mM spermidine and 157.33 mM HEPES; the concentration of the RNase inhibitor is 150 U/450 μL; and the cell extract accounts for 20-60% of the system by volume.

Further, in the E. coli in vitro expression system, the concentration of PEP is 30 mM; preferably, the concentration of NAD is 0.4 mM; preferably, the concentration of magnesium glutamate is 7.5 mM; preferably, the cell extract accounts for 33.3% of the E. coli in vitro expression system by volume.

To achieve the above object, according to a third aspect of the present invention, there is provided a kit for directed evolution of proteins, the kit comprising an in vitro protein expression system, which is an E. coli in vitro expression system comprising basic components, energy-related components, additive components, a cell extract and an RNase inhibitor. In the E. coli in vitro expression system, the basic components comprise: 19 amino acids at 2 mM each, 2 mM tyrosine, 14 mM magnesium acetate, 60 mM potassium acetate and 7 mM DTT; the energy-related components comprise: 1.2 mM AMP, 0.85 mM CMP, 0.85 mM GMP, 0.85 mM UMP, 15-83 mM PEP, 0.4-0.6 mM NAD, 4 mM potassium oxalate, 90 mM potassium glutamate and 2.5-10 mM magnesium glutamate; the additive components comprise: 1.5 mM spermidine and 157.33 mM HEPES; the concentration of the RNase inhibitor is 150 U/450 μL; and the cell extract accounts for 20-60% of the system by volume.

Further, in the E. coli in vitro expression system, the concentration of PEP is 30 mM; preferably, the concentration of NAD is 0.4 mM; preferably, the concentration of magnesium glutamate is 7.5 mM; preferably, the cell extract accounts for 33.3% of the E. coli in vitro expression system by volume.

By combining PCR-based introduction of mutations with in vitro protein expression, and by directly using the protein product expressed in the improved in vitro system for performance verification (enzyme activity and/or enantiomeric excess) and directed-evolution screening, the invention shows through a series of experiments that the results obtained by this method are consistent with those of the conventional method, which demonstrates the feasibility and effectiveness of the improved protein directed evolution method. In terms of efficiency and stability of results, the method is highly stable and shortens the time required from 2-3 weeks to about 7 hours, greatly accelerating evolution.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application. In the drawings:

FIG. 1 shows the standard curve plotted with different concentrations of the model protein sfGFP in example 1 of the present invention;

FIG. 2 is a graph showing the results of optimizing the Mg²⁺ concentration in the in vitro protein expression system in example 1 of the present invention;

FIG. 3 is a graph showing the results of optimizing PEP concentration in an in vitro protein expression system in example 1 of the present invention;

FIG. 4 is a graph showing the results of optimizing the ratio of cell extracts in an in vitro protein expression system in example 1 of the present invention;

FIG. 5 shows a graph of the results of optimizing NAD concentration in a protein in vitro expression system in example 1 of the present invention;

FIG. 6 is a graph showing the results of optimizing the concentration of glutamate in the in vitro protein expression system in example 1 according to the present invention;

FIG. 7 is a graph showing the results of in vitro expression of proteins with different amounts of PCR products of sfGFP gene in example 2 of the present invention;

FIG. 8 is a graph showing the results of in vitro protein expression of PCR products comprising sfGFP genes of different lengths upstream of the start codon in example 2 of the present invention;

FIG. 9 shows a diagram of SDS-PAGE electrophoresis of 8 randomly selected mutants in example 3 of the present invention;

FIG. 10 is a graph showing comparison of ee values of mutants directionally evolved by the method of the present application in example 3 of the present application with those of mutants directionally evolved by the conventional method.

Detailed Description

It should be noted that, provided there is no conflict, the embodiments of the present application and the features of the embodiments may be combined with each other. The present application is described in detail below with reference to examples.

As mentioned in the background section, existing protein directed evolution methods have a long workflow, are time-consuming and give results that fluctuate widely. To improve this situation, the application combines the protein mutation process with in vitro protein expression and directly uses the protein product expressed in vitro for performance verification. A series of experiments shows that the results obtained by this method are consistent with those of the conventional method, demonstrating the feasibility and effectiveness of the improved protein directed evolution method. In terms of efficiency and stability of results, the method is highly stable and shortens the time required from 2-3 weeks to about 7 hours, greatly accelerating evolution.

Based on the above research results, the applicant proposes the technical solutions of the present application. In a first exemplary embodiment, a method for directed evolution of proteins is provided, the method comprising: obtaining, by PCR amplification, a PCR product carrying a mutation in the gene of the target protein; placing the PCR product in an in vitro protein expression system for gene expression to obtain the mutated target protein; and performing trait detection on the mutated target protein, wherein the trait detection comprises detecting the activity of the target enzyme and/or its enantiomeric excess; wherein the target protein is a target enzyme, and the in vitro protein expression system is an E. coli in vitro expression system comprising basic components, energy-related components, additive components, a cell extract and an RNase inhibitor. In the E. coli in vitro expression system, the basic components comprise: 19 amino acids at 2 mM each, 2 mM tyrosine, 14 mM magnesium acetate, 60 mM potassium acetate and 7 mM DTT; the energy-related components comprise: 1.2 mM AMP, 0.85 mM CMP, 0.85 mM GMP, 0.85 mM UMP, 15-83 mM PEP, 0.4-0.6 mM NAD, 4 mM potassium oxalate, 90 mM potassium glutamate and 2.5-10 mM magnesium glutamate; the additive components comprise: 1.5 mM spermidine and 157.33 mM HEPES; the concentration of the RNase inhibitor is 150 U/450 μL; and the cell extract accounts for 20-60% of the system by volume.

The application introduces the mutation directly by PCR to obtain a mutated PCR product, uses this PCR product directly as the template to express the mutated target protein in the improved in vitro protein expression system, and then uses the protein product expressed in vitro directly to detect and screen the desired properties of the enzyme, thereby achieving directed evolution of the target enzyme. The method has a simple workflow, is stable to operate, is efficient, greatly accelerates evolution and is low in cost, and it is particularly suitable for the directed evolution of industrial enzymes.

It should be noted that the in vitro protein expression system can be an existing or commercial system. In the above preferred embodiment of the present application, in order to further increase the in vitro expression level, an E. coli in vitro expression system is preferably used, and it may be obtained by modifying an existing E. coli in vitro expression system.

In the improved protein directed evolution method, once the mutated PCR product has been obtained, the more than 10 steps of restriction digestion, ligation, transformation, picking single colonies, single-colony culture, transfer, induction, expression, harvesting cells by centrifugation, resuspension, cell disruption and the like are no longer needed; in vitro expression is carried out directly, so that the expression product of the mutated target protein is obtained simply and rapidly. This not only shortens the evolution workflow but also reduces the risk of contamination; the operation is simple, errors are less likely to be introduced, and the final results are highly stable.

In the step of trait detection of the mutated target protein, the trait detected depends on the biological properties of the target protein. In addition to the enzyme activity and/or enantiomeric excess mentioned above, the trait may be substrate specificity, catalytic efficiency, catalytic reaction temperature, or stability in different reaction solvents such as an organic phase or water.

In some embodiments, performing trait detection on the target enzyme carrying the gene mutation comprises: using a plurality of target enzymes carrying different gene mutations to catalyze the same substrate to give the same product, and detecting the conversion of the substrate by each target enzyme and/or the enantiomeric excess of the product; and, taking the conversion and/or enantiomeric excess obtained with the initial control enzyme as reference, selecting from the plurality of target enzymes a target enzyme with increased conversion and/or enantiomeric excess and designating it the initial +1 control enzyme.

In other embodiments, after the initial +1 control enzyme is obtained, the method of directed evolution of the enzyme further comprises: taking the initial +1 control enzyme as the new initial control enzyme and repeating the three steps above (steps S1 to S3), and so on, thereby obtaining a plurality of target enzymes after directed evolution.
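The iteration described above is a simple greedy loop. The sketch below shows only its control flow, under loud assumptions: the wet-lab steps (PCR mutagenesis, in vitro expression, assay) are replaced by two hypothetical callables, and the toy "assay" merely counts matches to an arbitrary target string so that the example runs; it does not model any real enzyme.

```python
# Control-flow sketch of iterated rounds of directed evolution.
# make_variants and assay stand in for the wet-lab steps and are hypothetical.

def evolve(initial, make_variants, assay, rounds=3):
    """Repeat mutate -> express/assay -> select, promoting the best variant.

    Returns the lineage of control enzymes as (variant, score) pairs.
    """
    control, control_score = initial, assay(initial)
    lineage = [(control, control_score)]
    for _ in range(rounds):
        scored = [(assay(v), v) for v in make_variants(control)]
        best_score, best = max(scored)
        if best_score <= control_score:
            break  # no variant improved on the current control
        # The "initial +1" enzyme becomes the control for the next round.
        control, control_score = best, best_score
        lineage.append((control, control_score))
    return lineage


# Toy stand-ins so the sketch runs; they carry no biological meaning.
ALPHABET = "ACDE"


def single_substitutions(seq):
    """All variants of seq that differ at exactly one position."""
    return [seq[:i] + aa + seq[i + 1:]
            for i in range(len(seq)) for aa in ALPHABET if aa != seq[i]]


def toy_assay(seq, target="ACDE"):
    """Number of positions at which seq matches an arbitrary target."""
    return sum(a == b for a, b in zip(seq, target))


lineage = evolve("AAAA", single_substitutions, toy_assay, rounds=3)
print(lineage)
```

In practice `assay` would return the measured conversion or ee of the expressed mutant, and each pass through the loop corresponds to one round of roughly 7 hours according to the text.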

The PCR product carrying a mutation in the gene of the target protein may be obtained by any known method of introducing mutations by PCR. In some preferred embodiments of the application, this is achieved by any one or more of the following methods:

1) Two-step PCR amplification to obtain a PCR product carrying the mutation in the gene of the target protein. Two primer pairs are designed, (a) F1 and R1 and (b) F2 and R2, with the mutated sequence containing the mutation site introduced into the primer pairs. A first PCR is performed with the two primer pairs to obtain fragments L1 and L2 on either side of the mutation site, where the overlapping region between L1 and L2 is denoted L and the mutation site lies within L. A second PCR is then performed with the mixture of L1 and L2 as template and F1 and R2 as primers to obtain the full-length sequence, which is the PCR product carrying the mutation in the gene of the target protein; or

2) Constructing, according to the principle of site-directed saturation mutagenesis, a plurality of PCR products carrying mutations in the gene of the target protein by PCR amplification, the plurality of PCR products forming a saturation mutant library of the target protein gene; or

3) Introducing a site-directed mutation by PCR amplification to obtain a PCR product carrying the mutation in the gene of the target protein; or

4) Performing random mutagenesis of the full sequence by error-prone PCR to obtain a plurality of PCR products carrying mutations in the gene of the target protein, which together cover random mutations across the full sequence of the gene; or

5) Using a multi-point mutation method to obtain PCR products carrying multiple mutation sites in the gene of the target protein.

The present application does not specifically modify the above methods of introducing mutations; their specific operation follows existing methods.
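For the site-directed saturation mutagenesis of method 2), a degenerate codon is commonly used at the target position. The text does not say which degenerate codon is used; the sketch below assumes the widely used NNK scheme (N = A/C/G/T, K = G/T) purely as an example and enumerates what it encodes under the standard genetic code. The helper name `expand` is hypothetical.

```python
# Enumerate the codons of a degenerate codon and the residues they encode.
# NNK is an assumed example; the application does not specify the scheme.
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, indexed by first, second and third base in TCAG order.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Ambiguity codes used here; unlisted letters are taken literally.
DEGENERATE = {"N": "ACGT", "K": "GT"}


def expand(degenerate_codon):
    """List every concrete codon matched by a degenerate codon."""
    choices = (DEGENERATE.get(base, base) for base in degenerate_codon)
    return ["".join(p) for p in product(*choices)]


nnk_codons = expand("NNK")
encoded = {CODON_TABLE[c] for c in nnk_codons}
stop_codons = [c for c in nnk_codons if CODON_TABLE[c] == "*"]

print(len(nnk_codons), "codons")
print(len(encoded - {"*"}), "amino acids")
print("stop codons:", stop_codons)
```

The number of distinct codons at a saturated site sets a lower bound on how many PCR products must be expressed and screened to cover that position.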

The preferred in vitro protein expression system has been optimized and gives a higher protein expression level than prior-art in vitro protein expression systems. The cell extract referred to above is an E. coli cell extract, which mainly contains ribosomes, RNA polymerase, the proteins involved in transcription and translation, and the enzymes and cofactors of energy metabolism.

In some preferred embodiments, the concentration of PEP in the E. coli in vitro expression system is 30 mM; preferably, the concentration of NAD is 0.4 mM; preferably, the concentration of magnesium glutamate is 7.5 mM; preferably, the cell extract accounts for 33.3% of the E. coli in vitro expression system by volume. The protein expression levels obtained under these preferred conditions are higher.
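The preferred composition can be turned into per-reaction amounts with a few lines of arithmetic. The sketch below is an illustration under stated assumptions: the 450 μL reaction volume is chosen only because the RNase inhibitor is quoted per 450 μL, the reducing agent is taken to be dithiothreitol, and the function name `reaction_plan` is hypothetical. It reports micromoles of each component rather than pipetting volumes, because stock concentrations are not given in the text.

```python
# Per-reaction amounts for the preferred composition (final concentrations
# from the text). Reaction volume and helper names are assumptions.

PREFERRED_MM = {
    "each of 19 amino acids": 2.0,
    "tyrosine": 2.0,
    "magnesium acetate": 14.0,
    "potassium acetate": 60.0,
    "DTT (dithiothreitol)": 7.0,
    "AMP": 1.2,
    "CMP": 0.85,
    "GMP": 0.85,
    "UMP": 0.85,
    "PEP": 30.0,
    "NAD": 0.4,
    "potassium oxalate": 4.0,
    "potassium glutamate": 90.0,
    "magnesium glutamate": 7.5,
    "spermidine": 1.5,
    "HEPES": 157.33,
}
EXTRACT_FRACTION = 0.333               # cell extract, volume fraction
RNASE_INHIBITOR_U_PER_UL = 150 / 450   # 150 U per 450 uL


def reaction_plan(volume_ul):
    """Return amounts needed for one reaction of the given volume (in uL)."""
    # mM x uL = nmol, so dividing by 1000 gives micromoles.
    micromoles = {name: mm * volume_ul / 1000
                  for name, mm in PREFERRED_MM.items()}
    return {
        "micromoles": micromoles,
        "cell_extract_ul": EXTRACT_FRACTION * volume_ul,
        "rnase_inhibitor_units": RNASE_INHIBITOR_U_PER_UL * volume_ul,
    }


plan = reaction_plan(450)
print(round(plan["micromoles"]["PEP"], 3), "umol PEP")
print(round(plan["cell_extract_ul"], 2), "uL cell extract")
print(round(plan["rnase_inhibitor_units"], 1), "U RNase inhibitor")
```

Scaling `volume_ul` down to a plate-well volume gives the corresponding amounts for high-throughput screening, again assuming the same final concentrations.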

The target protein in the application can be a different protein depending on the actual research purpose. It is preferably an industrial protein, in particular an industrial enzyme. In some preferred embodiments, the industrial enzyme is selected from any one of the following: an esterase (amino acid sequence shown in SEQ ID NO: 1, nucleotide sequence shown in SEQ ID NO: 3) or the transaminase TA-1 (amino acid sequence shown in SEQ ID NO: 2, nucleotide sequence shown in SEQ ID NO: 4).

SEQ ID NO: 1 (amino acid sequence, 264 aa):

MHSAANAKQQKHFVLVHGGCLGAWIWYKLKPLLESAGHKVTAVDLSAAGINPRRLDEIHTFRDYSEPLMEVMASIPPDEKVVLLGHSFGGMSLGLAMETYPEKISVAVFMSAMMPDPNHSLTYPFEKYNEKCPADMMLDSQFSTYGNPENPGMSMILGPQFMALKMFQNCSVEDLELAKMLTRPGSLFFQDLAKAKKFSTERYGSVKRAYIFCNEDKSFPVEFQKWFVESVGADKVKEIKEADHMGMLSQPREVCKCLLDISDS.

SEQ ID NO: 3 (nucleotide sequence, 792 bp):

atgcacagcgctgcaaacgcaaaacaacagaagcacttcgtcctggtccacggtggttgtctgggtgcttggatctggtacaaactgaaacctctgctggagtctgcaggtcataaagtgactgcagttgatctgagcgcagctggtatcaacccacgtcgtctggatgaaattcacactttccgtgattacagcgagccactgatggaagtgatggctagcatcccgccggatgaaaaagtggttctgctgggtcattctttcggtggtatgtctctgggtctggctatggaaacctacccggagaaaatctctgttgctgtgttcatgtccgccatgatgccggatccgaaccactctctgacctatccgtttgaaaagtacaacgagaagtgcccggccgatatgatgctggactctcaattctctacgtacggcaacccggaaaatccgggcatgtctatgatcctgggcccgcagtttatggcgctgaaaatgtttcagaactgtagcgtagaagacctggaactggccaaaatgctgacccgtcctggctccctgtttttccaggacctggcgaaagcgaaaaagttcagcaccgaacgttatggctccgttaaacgcgcgtatattttctgcaacgaagacaaaagcttcccggttgaattccagaaatggttcgtagagtccgttggcgcggacaaagtaaaagaaatcaaagaagcggaccacatgggcatgctgtcccagccgcgcgaagtttgcaaatgcctgctggacatttccgactcc.

SEQ ID NO: 2 (amino acid sequence, 341 aa):

MTISKDIDYSTSNLVSVAPGAIREPTPAGSVIQYSDYELDESSPFAGGAAWIEGEYVPAAEARISLFDTGFGHSDLTYTVAHVWHGNIFRLKDHIDRVFDGAQKLRLQSPLTKAEVEDITKRCVSLSQLRESFVNITITRGYGARKGEKDLSKLTSQIYIYAIPYLWAFPPEEQIFGTSAIVPRHVRRAGRNTVDPTVKNYQWGDLTAASFEAKDRGARTAILLDADNCVAEGPGFNVVMVKDGKLSSPSRNALPGITRLTVMEMADEMGIEFTLRDITSRELYEADELIAVTTAGGITPITSLDGEPLGDGTPGPVTVAIRDRFWAMMDEPSSLVEAIEY.

SEQ ID NO: 4 (nucleotide sequence, 1035 bp):

atgaccattagcaaagacattgactatagcaccagcaacctggtgagtgtggccccgggtgcaatccgtgaacctaccccggcaggcagcgtgatccagtacagtgactacgagctggatgaaagcagcccgtttgccggtggtgcagcctggattgaaggtgagtatgttccggcagcagaggcccgtattagcctgtttgataccggcttcggccatagcgatctgacctacaccgttgcccatgtttggcacggcaacatctttcgcctgaaagaccacattgaccgcgtgtttgatggcgcccagaaactgcgtctgcagagcccgctgaccaaggccgaagtggaggatattaccaaacgctgcgtgagcctgagtcagctgcgcgagagcttcgtgaacatcaccattacccgcggttatggcgcccgcaaaggcgagaaagatctgagcaaattaaccagccagatctacatctacgccatcccgtacctgtgggcctttcctccggaagagcagatcttcggtacaagtgccattgtgccgcgtcatgttcgtcgcgcaggccgtaataccgttgatcctaccgttaagaactaccagtggggtgatctgaccgcagcttcttttgaagcaaaagatcgtggcgcccgcaccgcaatcctgctggatgcagacaactgtgtggccgagggtccgggctttaacgtggtgatggtgaaggatggcaaactgagtagcccgagccgtaatgccctgccgggtattacacgtctgaccgtgatggagatggccgatgaaatgggcatcgaattcaccctgcgcgatatcaccagccgtgagttatatgaggccgacgaactgatcgccgtgaccaccgcaggtggcattaccccgattaccagtctggatggcgaaccgctgggcgatggtacccctggtcctgtgacagtggccattcgcgatcgcttttgggccatgatggatgagccgagcagtctggtggaggccattgaatat.

In a second exemplary embodiment of the present application, there is provided a protein in vitro expression system, which is an Escherichia coli in vitro expression system comprising: basic components, energy-related components, additive components, a cell extract and an RNase inhibitor. In the Escherichia coli in vitro expression system, the basic components comprise: 19 amino acids at 2 mM each; 2 mM tyrosine; 14 mM magnesium acetate; 60 mM potassium acetate; and 7 mM DTT. The energy-related components comprise: 1.2 mM AMP, 0.85 mM CMP, 0.85 mM GMP, 0.85 mM UMP, 15-83 mM PEP, 0.4-0.6 mM NAD, 4 mM potassium oxalate, 90 mM potassium glutamate and 2.5-10 mM magnesium glutamate. The additive components comprise: 1.5 mM spermidine and 157.33 mM HEPES. The concentration of the RNase inhibitor is 150 U/450 μL, and the volume content of the cell extract in the Escherichia coli in vitro expression system is 20-60%.

In some preferred embodiments, the concentration of PEP in the above Escherichia coli in vitro expression system is 30 mM; preferably, the concentration of NAD is 0.4 mM; preferably, the concentration of magnesium glutamate is 7.5 mM; preferably, the volume content of the cell extract in the Escherichia coli in vitro expression system is 33.3%.
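As a rough illustration of the preferred composition, the following Python sketch computes the cell extract volume and the RNase inhibitor units for a reaction of a given size. The function name `reaction_setup` is hypothetical, and scaling the RNase inhibitor linearly with reaction volume is an assumption made here, not something stated in the application.

```python
def reaction_setup(total_volume_ul, extract_fraction=0.333,
                   rnase_units_per_450_ul=150.0):
    """Return cell extract volume and RNase inhibitor units for one reaction.

    extract_fraction: volume fraction of cell extract (20-60% per the text,
    33.3% preferred). Linear scaling of RNase inhibitor is an assumption.
    """
    if not 0.20 <= extract_fraction <= 0.60:
        raise ValueError("cell extract fraction must be within 20-60%")
    extract_ul = total_volume_ul * extract_fraction
    rnase_units = rnase_units_per_450_ul * total_volume_ul / 450.0
    return {
        "cell_extract_ul": extract_ul,
        "rnase_inhibitor_units": rnase_units,
        "remaining_volume_ul": total_volume_ul - extract_ul,
    }


setup = reaction_setup(450.0)
print(setup)
```

For the 450 μL system used in the examples, this gives roughly 150 μL of cell extract and 150 U of RNase inhibitor.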

In the prior art, in vitro protein expression systems have also been reported in a few publications, but their protein expression levels are generally low, and apart from model proteins such as green fluorescent protein (GFP) or its variants, few biocatalytic enzymes have been reported. The yield of proteins expressed by the in vitro expression kits currently on the market is also only at the level of tens of μg/mL, and such proteins are mostly used in proteomics research and rarely in research on the catalytic application of industrial enzymes. The optimized in vitro protein expression system of the present application can reach 1-2 mg/mL for a number of enzymes without enzyme-specific optimization, and a higher expression level is expected after further optimization, so that the requirements of high-throughput enzyme screening can be met.

In a third exemplary embodiment of the present application, there is provided a kit for directed evolution of a protein, the kit comprising a protein in vitro expression system, which is an Escherichia coli in vitro expression system comprising: basic components, energy-related components, additive components, a cell extract and an RNase inhibitor. In the Escherichia coli in vitro expression system, the basic components comprise: 19 amino acids at 2 mM each; 2 mM tyrosine; 14 mM magnesium acetate; 60 mM potassium acetate; and 7 mM DTT. The energy-related components comprise: 1.2 mM AMP, 0.85 mM CMP, 0.85 mM GMP, 0.85 mM UMP, 15-83 mM PEP, 0.4-0.6 mM NAD, 4 mM potassium oxalate, 90 mM potassium glutamate and 2.5-10 mM magnesium glutamate. The additive components comprise: 1.5 mM spermidine and 157.33 mM HEPES. The concentration of the RNase inhibitor is 150 U/450 μL, and the volume content of the cell extract in the Escherichia coli in vitro expression system is 20-60%.

The advantageous effects of the present application will be further described below in connection with specific examples.

In the following examples, unless otherwise specified, the in vitro protein expression systems used were all those of E.coli, and the experiments were carried out using sfGFP (superfolder Green fluorescent protein) as an example.

Example 1:

Establishment and optimization of cell-free protein synthesis system (also called in vitro protein expression system in the present application) (1 mL system)

Table 1:

Final concentrations of Solution A in Table 1: 1.2 mM ATP, 0.85 mM GMP, 0.85 mM UMP, 0.85 mM CMP, 31.50 μg/mL folinic acid, 170.60 μg/mL tRNA, 0.40 mM NAD, 0.27 mM coenzyme A (CoA), 4 mM oxalic acid, 1 mM putrescine, 1.50 mM spermidine, 57.33 mM HEPES buffer.

Final concentrations of Solution B: 10 mM Mg(Glu)₂, 10 mM NH₄(Glu), 130 mM K(Glu), 20 amino acids at 2 mM each, 0.03 M phosphoenolpyruvate (PEP).

The system was reacted at 30 ℃ and 220 rpm for 16 h.

The expression system in Table 1 was optimized with reference to the in vitro expression systems (systems 1 to 3) disclosed in Table 2, using sfGFP as a model protein. The optimization specifically included the Mg²⁺ concentration, the PEP concentration, the proportion of cell extract in the whole reaction system, the amount of NAD used, and the glutamate concentration.

Fluorescence intensity detection: excitation at 485 nm and emission at 525 nm, with 50 μL of the system measured in a 96-well plate. The standard curve is shown in FIG. 1. The detection results under the different optimization conditions of each parameter are shown in FIGS. 2 to 6, respectively.
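A standard curve of this kind is typically used to convert a fluorescence reading into a protein concentration. The sketch below fits a straight line by least squares and inverts it. It assumes that the curve is linear and relates fluorescence to sfGFP concentration; the calibration values are hypothetical placeholders, not data from FIG. 1.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def fluorescence_to_concentration(fluorescence, slope, intercept):
    """Invert the standard curve to estimate concentration."""
    return (fluorescence - intercept) / slope


# Hypothetical calibration points (concentration in mg/mL, signal in a.u.).
standards_mg_per_ml = [0.0, 0.5, 1.0, 1.5, 2.0]
fluorescence_au = [100.0, 2600.0, 5100.0, 7600.0, 10100.0]

slope, intercept = fit_line(standards_mg_per_ml, fluorescence_au)
estimate = fluorescence_to_concentration(7600.0, slope, intercept)
print(f"estimated sfGFP concentration: {estimate:.2f} mg/mL")
```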

1) As shown in FIG. 2, the optimization of the Mg²⁺ concentration shows that the in vitro protein expression system can produce sfGFP at Mg²⁺ concentrations of 2.5 mM to 19.5 mM; the effect is better at about 2.5 mM to 10 mM, and best at about 7.5 mM.

2) As shown in FIG. 3, the optimization of the PEP concentration shows that the in vitro protein expression system can produce sfGFP at PEP concentrations of 5 mM to 83 mM; the amount of protein synthesized is better at 15 to 83 mM PEP, and the amount of sfGFP synthesized is highest at 30 mM PEP.

3) As shown in FIG. 4, the optimization of the proportion of cell extract in the whole reaction system shows that the target protein can be produced when the cell extract makes up 20-60% of the system, and better results are obtained once the proportion exceeds 33%.

4) As shown in FIG. 5 for the optimization of the amount of NAD used, NAD plays an important role in the energy cycle; expression is best at 0.6 mM NAD and differs little at 0.4 mM.

5) As shown in FIG. 6, the optimization of the glutamate concentration shows that adding glutamate had no beneficial effect over the whole experiment, and the results were better without its addition.

Thus, the in vitro expression system of the application shown in the last column of Table 2 was obtained after optimization of the above parameters. The pH of the system is 7-8, typically pH 7.5. The reaction time is 3-16 h, typically 4 h.

Table 2:

The cell extracts in the above table were obtained by the following method:

The strain BL21 Star (DE3) was activated and streaked to obtain single colonies. A single colony of the activated BL21 Star (DE3) was inoculated into 50 mL of LB liquid medium and cultured overnight at 37 ℃ and 200 rpm. The overnight culture of BL21 Star (DE3) was inoculated into 400 mL of 2×YT medium to an initial OD600 of 0.1. IPTG was added to a final concentration of 0.5 mM, and the culture was incubated at 37 ℃ until OD600 = 3.8 to 4.0. The cell pellet was collected by centrifugation at 5000 g and 10 ℃ for 10 min. The supernatant was slowly poured off and the cell pellet was transferred to a pre-chilled 50 mL centrifuge tube. The cells were resuspended by adding 30 mL of S30 buffer to the 50 mL centrifuge tube, then centrifuged at 5000 g and 10 ℃ for 10 min; the supernatant was removed and the residual liquid in the centrifuge tube was blotted dry with clean filter paper. For every 0.6 g of cell pellet, 1 mL of pre-chilled S30 buffer was added. The cells were resuspended and sonicated, and 65 μL of 1 M DTT was added per 5 mL of cell lysate. The lysate was centrifuged at 12000 rpm and 4 ℃ for 10 min, and stored at -80 ℃ until use.
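The two ratios in the extract preparation (1 mL of S30 buffer per 0.6 g of cell pellet, and 65 μL of 1 M DTT per 5 mL of lysate) scale linearly with the batch size. A minimal sketch of that arithmetic follows; the function name and the example batch size are hypothetical.

```python
def extract_prep_volumes(pellet_mass_g, lysate_volume_ml):
    """Scale the S30 buffer and DTT additions to the batch size.

    Ratios taken from the protocol text:
      - 1 mL pre-chilled S30 buffer per 0.6 g cell pellet
      - 65 uL of 1 M DTT per 5 mL cell lysate
    """
    s30_buffer_ml = pellet_mass_g / 0.6
    dtt_ul = 65.0 * lysate_volume_ml / 5.0
    return s30_buffer_ml, dtt_ul


# Hypothetical batch: 3.0 g of cell pellet giving 5 mL of lysate.
buffer_ml, dtt_ul = extract_prep_volumes(3.0, 5.0)
print(f"S30 buffer: {buffer_ml:.1f} mL, 1 M DTT: {dtt_ul:.0f} uL")
```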

Example 2

The mutation is introduced by a two-step PCR method, so that a gene fragment carrying the mutation site is obtained directly and can be used directly for in vitro protein expression. The method omits a series of intermediate operations, such as PCR product recovery, restriction digestion, ligation, transformation, single-colony culture, sequencing, shake-flask or well-plate culture, induction, cell harvesting by centrifugation, cell disruption and collection of the supernatant by centrifugation, and the mutant enzyme is obtained directly from the PCR product carrying the mutation.

To introduce a mutation by the two-step PCR method, primers are designed near the mutation site, and the mutated sequence containing the mutation site is built into the primer sequences. In the first-step PCR, two separate reactions produce the fragments on either side of the mutation site. The two first-step products are then mixed and used as the template for the second-step PCR, to which the primers at the two ends are added, giving the full-length sequence. This full-length sequence can be used as the template of the protein in vitro expression system for in vitro expression.
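The two-step procedure can be sketched at the sequence level as follows. This is only a string-based illustration of how two overlapping fragments that both carry the mutation are joined into the full-length product; it does not model primers, annealing or polymerase behavior. The toy template, the 9-nt flank length and the choice of AGC as the replacement codon are assumptions made for the example.

```python
def two_step_pcr(template, codon_start, new_codon, flank=9):
    """Simulate two-step (overlap-extension) PCR on a DNA string.

    codon_start: 0-based index of the first base of the codon to mutate.
    flank: bases on each side of the codon included in the shared overlap.
    Returns (left_fragment, right_fragment, full_length_product).
    """
    if len(new_codon) != 3:
        raise ValueError("new_codon must be exactly 3 bases")
    overlap_start = codon_start - flank
    overlap_end = codon_start + 3 + flank
    if overlap_start < 0 or overlap_end > len(template):
        raise ValueError("overlap region extends beyond the template")

    # The mutagenic primers carry the new codon, so both first-step
    # products contain it.
    mutated = template[:codon_start] + new_codon + template[codon_start + 3:]

    # First-step PCR: one fragment on each side of the mutation site,
    # sharing the overlap region that contains the mutation.
    left = mutated[:overlap_end]
    right = mutated[overlap_start:]

    # Second-step PCR: the fragments pair through the overlap and are
    # extended to full length with the two outer primers.
    overlap = left[overlap_start:]
    if not right.startswith(overlap):
        raise ValueError("fragments do not share the expected overlap")
    full_length = left + right[len(overlap):]
    return left, right, full_length


# Toy 72-nt template used only for illustration (24 codons).
TEMPLATE = ("ATGCACAGCGCT" "GCAAACGCAAAA" "CAACAGAAGCAC"
            "TTCGTCCTGGTC" "CACGGTGGTTGT" "CTGGGTGCTTGG")

# Replace codon 19 (GGT, glycine) with AGC (serine): index (19 - 1) * 3 = 54.
left, right, product = two_step_pcr(TEMPLATE, 54, "AGC")
print(product[54:57])
```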

In a 450 μL protein in vitro expression system, the PCR product of the sfGFP gene was used as the DNA template, and different amounts of PCR product were added for in vitro protein expression. FIG. 7 shows that once the amount of PCR product exceeds 22.5 μL, and up to an addition of 90 μL, the amount of protein produced by the in vitro expression system is roughly constant. This indicates that when the amount of PCR product added fluctuates within this range, the target protein is still produced with good parallelism.

Further, PCR products of the sfGFP gene carrying different lengths of sequence upstream of the initiation codon (0 bp, 50 bp, 100 bp, 115 bp, 130 bp and 140 bp) were used as DNA templates in the reaction system. The effect of the upstream length on sfGFP expression is shown in FIG. 8: an upstream length of 50-140 bp has little effect on the amount of sfGFP expressed, and expression is highest when the upstream length is 50-100 bp.

Example 3

Site-directed saturation mutagenesis is a method commonly used to construct mutants in the directed evolution of proteins. It is applied in various mutation strategies, such as semi-rational design, random mutation, rational mutation and reduced-codon mutation.

In this example, site-directed saturation mutagenesis was performed using the protein in vitro expression system. The amino acid sequence of the esterase Asymchem-503029 is shown as SEQ ID NO:1, and the reaction it catalyzes is shown in the following reaction formula:

(reaction formula I)

Asymchem-503029 is active on the target substrate, but its stereoselectivity is not good enough, with an ee of about 61%. Based on the results of computational structure simulation, the G19 site was selected for saturation mutagenesis. The 19 primers for the G19 site were synthesized separately, the mutations were introduced by PCR, the PCR products were used directly for in vitro protein synthesis in the in vitro protein expression system, and the expression system after synthesis was used directly to test the enzyme-catalyzed reaction.
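Saturation mutagenesis at one site means replacing the wild-type residue with each of the other 19 standard amino acids, which is why 19 primers are needed for G19. A minimal sketch that enumerates these variants, with a hypothetical helper name:

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def saturation_variants(wild_type, position):
    """List all single substitutions at one site, e.g. G19A ... G19Y."""
    if wild_type not in AMINO_ACIDS:
        raise ValueError("wild_type must be a standard one-letter code")
    return [f"{wild_type}{position}{aa}"
            for aa in AMINO_ACIDS if aa != wild_type]


variants = saturation_variants("G", 19)
print(len(variants), variants)
```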

FIG. 9 shows the electrophoresis results of 8 mutants picked at random (from left to right: molecular weight marker, G19D, G19A, G19Y, G19H, G19N, G19M, G19F and G19S). As can be seen from the SDS-PAGE in FIG. 9, the different mutants produced protein with good uniformity. The protein concentration was calculated using a Bio-Rad gel imaging system, and the protein yield of the mutants was 1.5 ± 0.1 mg/mL.

Table 3: reaction results of G19 site saturation mutagenesis using the in vitro protein expression system.

As can be seen from the reaction results shown in the above table, the mutant G19S greatly improved the ee value of the reaction, to 75.73%, and doubled the conversion, to 33.96%. A comparison reaction was also carried out using the conventional workflow of PCR, restriction digestion and ligation, transformation, single-colony picking, shake-flask culture and ultrasonic disruption. As can be seen from FIG. 10, the reaction results of the protein in vitro expression system of the application correlate well with the shake-flask results, which shows that the method is applicable to the directed evolution of proteins.

However, the two methods differ significantly in efficiency. With the present method, PCR takes 3 h, in vitro protein expression takes 3 h and the enzyme-catalyzed reaction takes 1 h, so the complete process from gene to protein to performance detection is achieved in a total of 7 hours, whereas the conventional method typically takes 2-3 weeks. Therefore, the enzyme evolution method of the application can greatly improve the efficiency of evolution screening.
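The time comparison can be checked with simple arithmetic. The sketch below sums the three step durations and compares the total with the lower bound of the conventional workflow. Counting the 2 weeks as continuous calendar time (24 h per day) is an assumption made here for illustration.

```python
# Step durations of the in vitro workflow, in hours (from the text).
steps_h = {"PCR": 3, "in vitro expression": 3, "catalytic reaction": 1}
total_h = sum(steps_h.values())

# Lower bound of the conventional workflow: 2 weeks, counted as
# calendar time (assumption).
conventional_h = 2 * 7 * 24

time_saved_fraction = 1 - total_h / conventional_h
print(f"total: {total_h} h, time saved: {time_saved_fraction:.1%}")
```

Under this assumption the saving is well above the 80% figure stated among the advantages of the method.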

Example 4:

In this example, site-directed mutagenesis was performed using a protein in vitro expression system. Site-directed mutagenesis is the most commonly used method in directed evolution of proteins for rational design, semi-rational design, superposition of mutation sites, and the like.

As shown in Example 3, Asymchem-503029 is active on the target substrate, but its stereoselectivity is not good enough, with an ee of about 61%. Site-directed mutagenesis introducing the G19S mutation was performed on the esterase Asymchem-503029 shown in SEQ ID NO:1.

A G19S primer was synthesized and the mutation was introduced by PCR. As a control, the Asymchem-503029 parent fragment was amplified at the same time using a conventional primer (a primer without the G19S mutation). The products of the two PCRs were used directly for protein synthesis in the protein in vitro expression system, and the expression system after protein synthesis was used directly to test the enzyme-catalyzed reaction.

As a result, the ee value of the parent was 61.1% and the conversion was 15.4% as shown in example 3. The ee value of the mutant G19S was 75.7%, and the conversion was 33.9%.

In terms of efficiency, in this example, PCR time is 3 h, protein in vitro expression is 3 h, protein catalytic reaction is 1h, and complete process from gene to protein to trait is achieved in 7 hours in total, whereas site-directed mutagenesis generally requires 1-2 weeks by conventional methods.

Example 5:

Random mutation of the whole sequence and multi-point mutation were carried out using the in vitro protein expression system.

Random mutagenesis is most often performed by error-prone PCR. The library capacity of error-prone PCR is large, but the screening throughput is limited: generally only about 1000-2000 mutants from an error-prone PCR library are screened, many of them are repeated mutations because of the probability distribution, and the mutation positions and the mutated amino acids are not uniformly distributed because of the base preference of PCR.

For industrial enzymes, the number of amino acids is typically about 300. Five primers are designed for each site, mutating the site into 5 amino acids that represent different properties: alanine A (or glycine G if the site is originally A), serine S, lysine K, aspartic acid D and phenylalanine F, representing respectively amino acids with low steric hindrance, polar amino acids, positively charged amino acids, negatively charged amino acids and aromatic amino acids. In this way, the above problems of error-prone PCR can be addressed by global PCR.

The transaminase TA-1 shown in SEQ ID NO:2 has high selectivity for the substrate (shown in reaction formula II), but its activity is poor and a large amount of enzyme is required.
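The global scanning scheme can be sketched as an enumeration: every position is mutated to A, S, K, D and F, with A replaced by G where the wild-type residue is already A. For a protein of about 300 residues this gives at most 5 × 300 = 1500 variants, comparable to the 1000-2000 mutants usually screened from an error-prone PCR library. Skipping a substitution that equals the wild-type residue (for example S at a serine site) is an assumption made here; the text only specifies the A-to-G rule. Function names are hypothetical.

```python
SCAN_SET = ["A", "S", "K", "D", "F"]  # small, polar, positive, negative, aromatic


def scan_substitutions(wild_type):
    """Substitutions to make at one site under the 5-residue scan."""
    substitutions = []
    for aa in SCAN_SET:
        if aa == "A" and wild_type == "A":
            aa = "G"  # rule from the text: an original A is mutated to G
        if aa != wild_type:  # assumption: skip no-op substitutions
            substitutions.append(aa)
    return substitutions


def global_scan_library(protein):
    """Enumerate all scan variants, numbered from position 1."""
    return [f"{wt}{pos}{aa}"
            for pos, wt in enumerate(protein, start=1)
            for aa in scan_substitutions(wt)]


# First ten residues of SEQ ID NO:2, used as a short example.
library = global_scan_library("MTISKDIDYS")
print(len(library), library[:5])
```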

(Reaction formula II)

Using global PCR instead of error-prone PCR and measuring the activity, 3 mutants with improved enzyme activity were obtained: L76A, S125A and A226G (see Table 4).

Then, a 3-point mutant was constructed directly by the multi-point mutation method. Four PCR product fragments were obtained by separate PCRs: from the T7 promoter to L76A, from L76A to S125A, from S125A to A226G, and from A226G to the T7 terminator. The four fragments were used for the second-step overlap PCR, and the second-step product was used directly in the in vitro protein expression system to obtain the target 3-point mutant L76A+S125A+A226G, which was then used for enzyme activity determination.

In this example, the first round of irrational evolution plus the second round of combining the mutation sites took 1 week in total, whereas the conventional method requires 1.5 to 2 months.
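For n mutation sites, the first-step PCR of the multi-point method yields n + 1 fragments, each bounded by neighboring mutation sites or by an end of the construct. The sketch below lists those fragments for the three mutations of this example. The end labels "T7 promoter" and "T7 terminator" and the function name are assumptions made for illustration.

```python
def overlap_fragments(mutations, start="T7 promoter", end="T7 terminator"):
    """Return the (from, to) boundaries of the first-step PCR fragments."""
    # Order the mutations by residue number, e.g. "S125A" -> 125.
    ordered = sorted(mutations, key=lambda m: int(m[1:-1]))
    boundaries = [start] + ordered + [end]
    return list(zip(boundaries[:-1], boundaries[1:]))


fragments = overlap_fragments(["S125A", "L76A", "A226G"])
for left, right in fragments:
    print(f"{left} -> {right}")
```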

Table 4:

From the above description, it can be seen that the above embodiments of the present application achieve the following technical effects: the application uses an in vitro protein synthesis system to accelerate the directed evolution of proteins. Mutations are introduced by PCR and combined with the protein in vitro expression system to carry out directed evolution, which is applicable to all current protein evolution approaches, including irrational design, rational design and semi-rational design. The steps are simple: only 2 steps are needed, PCR followed by direct use of the PCR product for in vitro protein expression, and the target mutant can be obtained within 6-8 h in total. In addition, no microbial manipulation is involved, which greatly reduces operational difficulty and risk, and the experimental results show high parallelism and good robustness.

Compared with the existing directed evolution method, the method has the following advantages:

1) The protein in vitro expression system has higher synthesized protein amount, and can be used for the evolution of protein.

2) The protein evolution effect data of the invention is consistent with the effect data obtained by the traditional method.

3) Compared with the traditional experiment, the protein evolution method provided by the invention saves more than 80% of time and greatly accelerates the protein evolution speed.

4) The evolution method provided by the invention has the advantages of simple steps and better parallelism of the obtained data.

The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A method for directed evolution of proteins, the method comprising:

S1, obtaining a PCR product with the gene mutation of target protein through PCR amplification;

S2, placing the PCR product in a protein in-vitro expression system for gene expression to obtain target protein with the gene mutation;

S3, detecting the characteristics of the target protein with the gene mutation, wherein the characteristics detection comprises detection of the activity and/or the corresponding isomer excess percentage of the target enzyme;

wherein the target protein is a target enzyme;

the in vitro expression system of the protein is an in vitro expression system of escherichia coli, and the in vitro expression system of the escherichia coli comprises: basic components, energy related components, additive components, cell extract and RNase inhibitor, wherein,

In the escherichia coli in-vitro expression system, the basic components comprise: 19 amino acids at 2 mM each, 2 mM tyrosine, 14 mM magnesium acetate, 60 mM potassium acetate, and 7 mM DTT;

In the E.coli in vitro expression system, the energy-related components include: 1.2 mM AMP, 0.85 mM CMP, 0.85 mM GMP, 0.85 mM UMP, 15-83 mM PEP, 0.4-0.6 mM NAD, 4 mM potassium oxalate, 90 mM potassium glutamate, 2.5-10 mM magnesium glutamate;

in the escherichia coli in-vitro expression system, the additive components comprise: 1.5 mM spermidine and 157.33 mM HEPES;

In the in vitro expression system of the escherichia coli, the concentration of the RNase inhibitor is 150 U/450 μL;

the volume content of the cell extract in the in-vitro expression system of the escherichia coli is 20-60%.

2. The method according to claim 1, wherein the PCR product with the mutation of the gene of the target protein is obtained by PCR amplification using any one or more of the following methods:

1) Designing 2 primer pairs, F1 and R1, and F2 and R2, introducing mutation sequences containing mutation sites into the 2 primer pairs, and performing first-step PCR (polymerase chain reaction) with the 2 primer pairs to respectively obtain fragments L1 and L2 on the two sides of the mutation sites, wherein the overlapping region between the fragments L1 and L2 is marked as L, and the mutation sites are located in L; then taking the mixture of the fragments L1 and L2 as a template and F1 and R2 as primers, and performing second-step PCR to obtain a full-length sequence, wherein the full-length sequence is the PCR product with the gene mutation of the target protein; or

2) Constructing a plurality of PCR products with target protein gene mutation by a PCR amplification mutation introducing method according to the principle of site-directed saturation mutation, and constructing a saturation mutant library of the target protein gene from the plurality of PCR products with target protein gene mutation; or

4) Carrying out random mutation on the whole sequence of the gene by using an error-prone PCR method, thereby obtaining a plurality of PCR products with the gene mutation of the target protein, wherein the PCR products with the gene mutation of the target protein cover the random mutation of the whole sequence of the gene of the target protein; or

5) Obtaining PCR products with a plurality of mutation sites of the target protein gene by utilizing a multipoint mutation method.

3. The method of claim 1, wherein the PEP is present in a concentration of 30 mM in the escherichia coli in vitro expression system;

the concentration of NAD is 0.4 mM;

the concentration of the magnesium glutamate is 7.5 mM;

the volume content of the cell extract in the in-vitro expression system of the escherichia coli is 33.3 percent.

4. The method of claim 1, wherein the enzyme of interest is selected from the group consisting of industrial enzymes.

5. The method of claim 4, wherein the industrial enzyme is the esterase Asymchem-503029 having the amino acid sequence of SEQ ID NO:1 or the transaminase TA-1 having the amino acid sequence of SEQ ID NO:2.

6. The method of claim 1, wherein performing a trait test on the enzyme of interest having the genetic mutation comprises:

Using a plurality of said target enzymes with different said genetic mutations to catalyze the same substrate reaction to produce the same product, detecting the conversion rate of different said target enzymes catalyzing said substrate and/or the percent of enantiomeric excess of said product;

the target enzyme with increased conversion and/or corresponding percent isomer excess is selected from a plurality of the target enzymes, with reference to the conversion and/or percent isomer excess of the substrate catalyzed by the initial control enzyme, and is designated as the initial +1 control enzyme.

7. The method of claim 6, wherein after obtaining the initial +1 control enzyme, the method further comprises: taking the initial +1 control enzyme as the new initial control enzyme, then repeating steps S1 to S3, and so on, so as to obtain a plurality of target enzymes after directed evolution.