CN119296640A - Method, device and related equipment for screening mutant proteins - Google Patents
- Publication number
- CN119296640A (application CN202411836877.4A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The invention discloses a mutant protein screening method, a mutant protein screening device and related equipment, and relates to the technical field of material measurement and analysis. The method comprises: performing simulated single mutations at a plurality of amino acid sites in the protein sequence of a target protein to obtain a plurality of simulated mutant proteins; determining an adaptability evaluation value F for each simulated mutant protein based on a target protein language model; determining a structural stability value S for each simulated mutant protein based on a protein folding energy calculation tool; normalizing the adaptability evaluation values F and the structural stability values S separately; performing single-index ranking and comprehensive-index ranking on the normalized results; and screening the simulated mutant proteins according to a target screening quantity, the single-index ranking results and the comprehensive-index ranking results to obtain the target mutant proteins. The invention can reduce the difficulty of protein design and engineering, improve the screening of mutant proteins, and reduce the computation and cost required for protein engineering and screening.
Description
Technical Field
The invention relates to the technical field of material determination and analysis, in particular to a mutant protein screening method, a mutant protein screening device and relevant equipment thereof.
Background
Proteins are among the most diverse and important macromolecules in living organisms and play a vital role in fields such as the environment, industrial production, medicine and materials. In practical applications, however, natural proteins often cannot meet complex application requirements, so methods are needed to design specific mutations that engineer the protein. Because experimental design is costly, experimental cycles are long and physicochemical properties are difficult to analyze, protein mutation optimization and engineering methods based on computational simulation are considered to have great potential in the field of protein engineering.
However, simulation produces a large number of mutant proteins, and how to screen out those with potentially improved properties is an urgent problem. Mutant protein screening schemes in the related art suffer from high difficulty of mutant protein design and engineering, poor screening of mutant proteins, a large amount of computation required for protein engineering and screening, long test cycles and high cost.
In view of the above problems in the related art, no effective solution has been proposed at present.
Disclosure of Invention
The mutant protein screening method and related equipment provided herein solve at least the problems in the related art of high difficulty of mutant protein design and engineering, poor screening of mutant proteins, a large amount of computation required for protein engineering and screening, long test cycles and high cost.
In order to solve the above problems, according to one aspect of the embodiments of the present invention, there is provided a mutant protein screening method comprising:
performing simulated single mutations at a plurality of amino acid sites in the protein sequence of a target protein to obtain a plurality of simulated mutant proteins, and performing adaptability prediction, based on a target protein language model, on the simulated single mutant sequences corresponding to the simulated mutant proteins to obtain an adaptability evaluation value F of each simulated single mutant sequence, wherein each simulated single mutant sequence comprises the single mutant protein sequence corresponding to one of the amino acid sites;
determining the simulated single mutant structures corresponding to the simulated mutant proteins, calculating, based on a protein folding energy calculation tool, the folding energy change value of each simulated single mutant structure before and after mutation, and determining a structural stability value S of each simulated single mutant structure based on the folding energy change values;
normalizing the adaptability evaluation values F and the structural stability values S separately, and performing single-index ranking and comprehensive-index ranking on the normalized results;
and screening the plurality of simulated mutant proteins according to the target screening quantity, the single-index ranking result and the comprehensive-index ranking result to obtain the target mutant proteins.
In some embodiments, the step of normalizing the adaptability evaluation values F and the structural stability values S separately, and performing single-index ranking and comprehensive-index ranking on the normalized results, comprises:
normalizing the adaptability evaluation value F and the structural stability value S separately based on a min-max normalization method to obtain a first adaptability evaluation normalization value F' and a first structural stability normalization value S';
ranking the first adaptability evaluation normalization values F' and the first structural stability normalization values S' separately to obtain a single-index ranking result comprising a first adaptability ranking value RF and a first structural stability ranking value RS;
and calculating, for each simulated mutant protein, the average of its first adaptability ranking value RF and first structural stability ranking value RS, and ranking these averages across the simulated mutant proteins to obtain a first comprehensive-index ranking result R.
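The min-max and rank-averaging steps above can be sketched as follows. This is only an illustrative sketch, not the patented implementation: the function names are hypothetical, ties are broken by input order, and the ranking directions are assumptions (higher F is treated as better, and lower S is treated as more stable, which is inferred from the later embodiment that discards mutants with S greater than 0).

```python
def min_max(values):
    """Scale values linearly to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rank(values, descending):
    """Return 1-based ranks (1 = best); ties are broken by input order."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=descending)
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def first_composite_ranking(names, f_values, s_values):
    """Compute RF, RS, their mean, and the composite rank R for each mutant."""
    f_norm = min_max(f_values)
    s_norm = min_max(s_values)
    rf = rank(f_norm, descending=True)    # assumption: higher F is better
    rs = rank(s_norm, descending=False)   # assumption: lower S is more stable
    mean_rank = [(a + b) / 2 for a, b in zip(rf, rs)]
    r = rank(mean_rank, descending=False)  # smaller mean rank is better
    return {
        name: {"RF": a, "RS": b, "mean": m, "R": c}
        for name, a, b, m, c in zip(names, rf, rs, mean_rank, r)
    }


if __name__ == "__main__":
    # Hypothetical mutant labels and values, for illustration only.
    table = first_composite_ranking(
        ["A12G", "K45R", "L7P"], [0.8, -0.2, 0.3], [-1.0, 0.5, -2.0]
    )
    for name, row in table.items():
        print(name, row)
```

Because min-max scaling is monotonic, the single-index ranks are the same as ranks of the raw values; the normalization matters mainly when values from several sources are later combined.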
In some of these embodiments, the method further comprises:
normalizing the adaptability evaluation value F and the structural stability value S separately based on a Z-Score normalization method to obtain a second adaptability evaluation normalization value F' and a second structural stability normalization value S';
calculating a comprehensive index value for each simulated mutant protein according to the second adaptability evaluation normalization value F', the second structural stability normalization value S', an adaptability index weight coefficient and a structural stability index weight coefficient;
and ranking the comprehensive index values to obtain a second comprehensive-index ranking result K.
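A minimal sketch of the Z-Score variant follows. The text does not give the formula that combines the two normalized values, so the weighted difference used here (`w_f * F - w_s * S`, with lower S assumed to be more stable) is an assumption, as are the equal default weights, the use of the population standard deviation, and all function names.

```python
from statistics import mean, pstdev


def z_score(values):
    """Standardize values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def second_composite_ranking(names, f_values, s_values, w_f=0.5, w_s=0.5):
    """Return (name, composite value) pairs sorted from best to worst.

    Assumption: the composite value is w_f * z(F) - w_s * z(S), so that a
    high adaptability score and a low (more stabilizing) S both raise it.
    """
    f_z = z_score(f_values)
    s_z = z_score(s_values)
    scores = [w_f * f - w_s * s for f, s in zip(f_z, s_z)]
    order = sorted(range(len(names)), key=lambda i: scores[i], reverse=True)
    return [(names[i], scores[i]) for i in order]


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    for name, score in second_composite_ranking(
        ["M1", "M2", "M3"], [1.0, 2.0, 3.0], [3.0, 2.0, 1.0]
    ):
        print(name, round(score, 3))
```

Unlike rank averaging, the weighted Z-Score value keeps information about how far apart two mutants are on each index, at the cost of being more sensitive to outliers.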
In some embodiments, the number of protein folding energy calculation tools is N, so that N structural stability values S are calculated for each simulated mutant protein, one by each tool, and the step of normalizing the structural stability values S corresponding to the simulated mutant proteins comprises:
normalizing separately each of the N groups of structural stability values S_n calculated by the N protein folding energy calculation tools, wherein n ∈ [1, N];
and averaging the N structural stability normalization values S_n' corresponding to any simulated mutant protein to obtain the structural stability normalization result for that simulated mutant protein.
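The per-tool normalization and averaging can be illustrated with the short sketch below. It assumes min-max normalization within each tool's group (the Z-Score method could be substituted) and a hypothetical input layout of one list of values per tool, with mutants in the same order in every list.

```python
def min_max(values):
    """Scale values linearly to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def averaged_stability(per_tool_values):
    """Normalize each tool's group S_n separately, then average per mutant.

    per_tool_values: list of N lists; per_tool_values[n][j] is the value
    that tool n computed for mutant j (hypothetical layout).
    """
    normalized = [min_max(group) for group in per_tool_values]
    n_tools = len(normalized)
    n_mutants = len(normalized[0])
    return [
        sum(group[j] for group in normalized) / n_tools
        for j in range(n_mutants)
    ]


if __name__ == "__main__":
    # Two hypothetical tools scoring three mutants on different scales.
    print(averaged_stability([[0.0, 4.0, 2.0], [1.0, 1.5, 3.0]]))
```

Normalizing within each tool before averaging keeps a tool with a wide numeric range from dominating the combined value.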
In some embodiments, the target screening number comprises a single-index screening number and a comprehensive-index screening number, and the step of screening the plurality of simulated mutant proteins according to the target screening number, the single-index ranking result and the comprehensive-index ranking result to obtain the target mutant proteins comprises:
determining first target mutant proteins according to the single-index screening number, the first adaptability ranking value RF and the first structural stability ranking value RS;
and removing the first target mutant proteins from the comprehensive-index ranking result, and determining second target mutant proteins according to the comprehensive-index screening number and the comprehensive-index ranking result, wherein the first target mutant proteins and the second target mutant proteins together form the target mutant proteins, and the comprehensive-index ranking result comprises the first comprehensive-index ranking result R and/or the second comprehensive-index ranking result K.
In some of these embodiments, prior to the step of determining the mutein of interest, the method further comprises:
and removing from the comprehensive-index ranking result the simulated mutant proteins whose adaptability evaluation value F is less than 0 and whose structural stability value S is greater than 0.
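The two-stage screening and the filter above can be sketched as follows. Several details are assumptions made for illustration: the single-index screening number is applied to each of RF and RS separately, only the first comprehensive ranking R is used, and the record layout and function name are hypothetical.

```python
def screen(mutants, k_single, k_composite):
    """Select target mutants from single-index and composite rankings.

    mutants: list of dicts with keys "name", "F", "S", "RF", "RS", "R",
    where rank 1 is best (hypothetical layout).
    """
    # Stage 1: top k_single mutants by each single index (assumption).
    top_rf = sorted(mutants, key=lambda m: m["RF"])[:k_single]
    top_rs = sorted(mutants, key=lambda m: m["RS"])[:k_single]
    first = []
    for m in top_rf + top_rs:
        if m["name"] not in first:
            first.append(m["name"])

    # Stage 2: drop stage-1 picks and mutants with F < 0 and S > 0,
    # then take the top k_composite by the composite rank R.
    remaining = [
        m for m in mutants
        if m["name"] not in first and not (m["F"] < 0 and m["S"] > 0)
    ]
    second = [m["name"] for m in sorted(remaining, key=lambda m: m["R"])[:k_composite]]
    return first + second


if __name__ == "__main__":
    # Hypothetical records, for illustration only.
    data = [
        {"name": "A", "F": 0.9, "S": -0.1, "RF": 1, "RS": 4, "R": 2},
        {"name": "B", "F": 0.1, "S": -2.0, "RF": 3, "RS": 1, "R": 1},
        {"name": "C", "F": -0.3, "S": 0.4, "RF": 5, "RS": 5, "R": 3},
        {"name": "D", "F": 0.5, "S": -0.5, "RF": 2, "RS": 3, "R": 4},
        {"name": "E", "F": 0.05, "S": -1.0, "RF": 4, "RS": 2, "R": 5},
    ]
    print(screen(data, k_single=1, k_composite=1))
```

In this example mutant C has the best remaining composite rank but is discarded by the filter, so the composite slot goes to D instead.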
In some embodiments, the method further comprises the step of constructing a target protein language model and optimizing parameters:
obtaining a basic protein language model, replacing the position coding layer of the basic protein language model with a learnable embedded code, and constructing a fully connected neural network layer in the basic protein language model to obtain the target protein language model, wherein the learnable embedded code is used for capturing distance information in the amino acid sequence;
and performing unsupervised training on the target protein language model to tune the model parameters of the target protein language model.
In order to solve the above problems, according to one aspect of the embodiments of the present invention, there is provided a mutein screening apparatus comprising:
an adaptability evaluation value calculation module, configured to perform simulated single mutations at a plurality of amino acid sites in the protein sequence of a target protein to obtain a plurality of simulated mutant proteins, and to perform adaptability prediction, based on a target protein language model, on the simulated single mutant sequences corresponding to the simulated mutant proteins to obtain an adaptability evaluation value F of each simulated single mutant sequence, wherein each simulated single mutant sequence comprises the single mutant protein sequence corresponding to one of the amino acid sites;
a structural stability value calculation module, configured to determine the simulated single mutant structures corresponding to the simulated mutant proteins, to calculate, based on a protein folding energy calculation tool, the folding energy change value of each simulated single mutant structure before and after mutation, and to determine a structural stability value S of each simulated single mutant structure based on the folding energy change values;
a data processing module, configured to normalize the adaptability evaluation values F and the structural stability values S separately, and to perform single-index ranking and comprehensive-index ranking on the normalized results;
and a screening module, configured to screen the plurality of simulated mutant proteins according to the target screening quantity, the single-index ranking result and the comprehensive-index ranking result to obtain the target mutant proteins.
In order to solve the above problems, in one aspect of the embodiments of the present invention, there is provided an electronic device including a processor, and a memory storing a program including instructions that when executed by the processor cause the processor to perform any one of the mutein screening methods described above.
To solve the above problems, in one aspect of the embodiments of the present invention, there is provided a non-transitory machine-readable medium storing computer instructions for causing a computer to execute any one of the mutant protein screening methods described above.
The method has the following advantages. Simulated single mutations are performed at a plurality of amino acid sites in the protein sequence of a target protein to obtain a plurality of simulated mutant proteins. Adaptability prediction is performed, based on a target protein language model, on the simulated single mutant sequences corresponding to the simulated mutant proteins to obtain an adaptability evaluation value F of each simulated single mutant sequence, wherein each simulated single mutant sequence comprises the single mutant protein sequence corresponding to one of the amino acid sites. The simulated single mutant structures corresponding to the simulated mutant proteins are determined, the folding energy change value of each simulated single mutant structure before and after mutation is calculated based on a protein folding energy calculation tool, and a structural stability value S of each simulated single mutant structure is determined based on the folding energy change values. The adaptability evaluation values F and the structural stability values S are normalized separately, single-index ranking and comprehensive-index ranking are performed on the normalized results, and the simulated mutant proteins are screened according to the target screening quantity, the single-index ranking result and the comprehensive-index ranking result to obtain the target mutant proteins. Because the adaptability of the simulated mutant proteins is evaluated with a protein language model and their structural stability is evaluated with a protein folding energy calculation tool, the method reduces the difficulty of protein design and engineering, improves the screening of mutant proteins, and reduces the computation and cost required for protein engineering and screening.
The details of one or more embodiments of the invention are set forth in the accompanying drawings and the description below to provide a more thorough understanding of the other features, objects, and advantages of the invention.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It is evident that the drawings in the following description are only some embodiments of the invention, from which other embodiments can be obtained for a person skilled in the art without inventive effort.
FIG. 1 is a schematic flow chart of a method for screening a mutein according to an embodiment of the invention;
FIG. 2 is a schematic flow chart of a method for screening a mutant protein according to still another embodiment of the present invention;
FIG. 3 is a schematic diagram of performing data processing based on a min-max normalization method according to one embodiment of the invention;
FIG. 4 is a schematic diagram of performing data processing based on a Z-Score normalization method according to one embodiment of the present invention;
FIG. 5 is a schematic illustration of a target mutein screening performed based on determined data according to an embodiment of the invention;
FIG. 6 is a schematic diagram showing the main frame of a mutein screening apparatus according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of the electronic device of the present invention.
Detailed Description
Embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. Although certain embodiments of the invention are shown in the drawings, the invention may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the invention. The drawings and embodiments of the invention are for illustration purposes only and are not intended to limit the scope of the present invention.
Related studies have shown that the function of a protein is determined by its structure, which is formed by folding of its amino acid sequence. Mutation engineering of a protein is therefore usually designed on the basis of either the protein structure or the protein sequence. In the related art, however, designing mutations only from the structure or only from the sequence often fails to take the properties of the whole protein into account. When mutation engineering is based only on the protein structure, a complete structural simulation and optimization process is computationally expensive and time-consuming, so most studies can consider only certain critical local regions. When mutation engineering is based only on the protein sequence, the characteristics of all amino acid sites can be considered quickly on a global scale, but key structural information that determines the final function of the protein is lost to some extent. If both structural and sequence information are taken into account, the design factors become very complex: many solutions in the related art require the designer to have extensive prior knowledge of the physicochemical properties of the protein to be designed as well as extensive practical experience with protein design methods. Moreover, because the proteins to be designed are diverse, it is difficult to understand the properties of each one sufficiently. How to combine computed key structural features of a protein with a sequence-based design method that takes the global picture into account is therefore of great research significance for computer-aided protein engineering.
In order to solve the above problems, an embodiment of the present invention provides a method for screening a mutant protein, as shown in FIG. 1, which mainly includes:
Step S101: performing simulated single mutations at a plurality of amino acid sites in the protein sequence of a target protein to obtain a plurality of simulated mutant proteins, and performing adaptability prediction, based on a target protein language model, on the simulated single mutant sequences corresponding to the simulated mutant proteins to obtain an adaptability evaluation value F of each simulated single mutant sequence, wherein each simulated single mutant sequence comprises the single mutant protein sequence corresponding to one of the amino acid sites.
The invention first treats the problem of protein performance optimization (screening out suitable mutant proteins) as a protein adaptability search problem; that is, protein mutants with high adaptability are assumed to be mutants with potential functional improvement.
By performing simulated single mutations at a plurality of amino acid sites in the protein sequence of the target protein (the target protein is the object of the protein engineering study, and a simulated mutant protein is a mutant that might arise during the evolution of the target protein), all possible single mutants can be systematically generated and evaluated, and the influence of a mutation at each amino acid site on the function and stability of the protein can be comprehensively understood. Adaptability prediction is then performed on the simulated single mutant sequences with the protein language model to obtain the adaptability evaluation value F, so that mutations that are more likely to be retained during evolution can be identified (the higher the adaptability evaluation value F, the more likely the simulated mutant is to be retained during evolution), thereby providing valuable candidate mutant proteins for protein engineering.
The target protein provided by the embodiment of the invention is DNA polymerase, preferably Phi29 DNA polymerase or any other DNA polymerase with similar structure and function.
This is because functional optimization of DNA polymerases is an important goal in biotechnology and molecular biology research, and mutants with higher activity, higher fidelity or longer read length can be screened by single point mutation. In addition, DNA polymerases are often required to remain stable under extreme conditions such as high temperature and high salt concentration; single point mutation can help screen out mutants that are more stable under such conditions, widening the range of applications of the DNA polymerase.
According to a specific implementation of the embodiments of the present invention, Phi29 DNA polymerase is known for its high fidelity and long read length (that is, it can continuously synthesize long DNA fragments without frequent termination or errors during DNA synthesis). It replicates DNA with high fidelity and can synthesize thousands of bases continuously without error. This property makes it very useful in applications such as genome sequencing, cloning and amplification. Single point mutation makes precise changes at specific amino acid sites, so that the influence of those changes on the activity, stability and other functions of the enzyme can be studied; the precise mutation method provided by the embodiments of the invention therefore helps in understanding the structure-function relationship of the enzyme.
The mutant protein screening method provided by the embodiments of the invention can be applied to screening mutants of DNA polymerases, in particular mutants of Phi29 DNA polymerase or of DNA polymerases with similar structure and function. It can screen out Phi29 DNA polymerase mutants with higher fidelity, reducing the error rate in sequencing and cloning. Based on the stability of the DNA polymerase at high temperature, it can screen out mutants that retain high activity and stability at high temperature. Based on the high fidelity and long read length of Phi29 DNA polymerase, it can also screen out Phi29 DNA polymerase mutants with longer read length, improving the coverage and accuracy of genome sequencing.
According to a specific implementation of the embodiments of the invention, every amino acid position of the target protein can be substituted with each of the 19 natural amino acids other than its own amino acid, giving 19 × L single mutant protein sequences (where L is the length of the target protein sequence) that serve as input for the subsequent prediction. The "own amino acid" is the original amino acid at a given position in the target protein sequence; excluding it avoids redundancy, reduces the amount of simulation and simplifies the subsequent data analysis. The number of amino acids involved in the mutation in the above embodiment does not limit the invention, and the amino acid types used for simulated mutation may be adjusted according to the mutation design requirements of the actual protein engineering task. For example, according to one embodiment of the present invention, simulated single mutations are performed at all amino acid positions of the target protein, so that the mutation effect of every position is considered on a global scale and a comprehensive adaptability prediction is provided.
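The enumeration of the 19 × L single mutants can be illustrated with a short sketch. The function name and the "wild type, position, substitute" label format (for example `M1A`) are choices made here for illustration, and the sketch assumes the input sequence contains only the 20 standard one-letter amino acid codes.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 natural amino acids


def single_mutants(sequence):
    """Yield (label, mutant_sequence) for every single substitution.

    At each 1-based position the original residue is replaced by each of
    the other 19 amino acids, giving 19 * len(sequence) mutants in total.
    """
    for pos, wild in enumerate(sequence, start=1):
        for aa in AMINO_ACIDS:
            if aa == wild:
                continue  # skip the position's own amino acid
            yield f"{wild}{pos}{aa}", sequence[:pos - 1] + aa + sequence[pos:]


if __name__ == "__main__":
    mutants = list(single_mutants("MKT"))  # toy sequence, L = 3
    print(len(mutants))   # 19 * 3 = 57
    print(mutants[0])     # ('M1A', 'AKT')
```

Restricting `AMINO_ACIDS` or the set of positions is one way to limit the simulated mutations to the amino acid types or sites of interest.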
In some embodiments, the method further comprises constructing a target protein language model and tuning its parameters: acquiring a basic protein language model; replacing the position coding layer of the basic protein language model with a learnable embedded code and adding a fully connected neural network layer to the basic protein language model to obtain the target protein language model, wherein the learnable embedded code is used to capture distance information in the amino acid sequence and the fully connected neural network layer is used to perform a weighted calculation on the type and corresponding position of each amino acid; and performing unsupervised training on the target protein language model to tune its model parameters.
According to a specific implementation of the embodiment of the present invention, the Tranception model is selected as the basic protein language model. It uses the Transformer as its underlying structure and, compared with other large-scale protein language models, focuses on the protein adaptability prediction task, which helps improve the accuracy of the calculated adaptability values. To further improve the adaptability prediction of the model, the target protein language model keeps the Tranception model body but replaces the "Grouped ALiBi" mechanism in the position coding layer of the basic model with a learnable embedded code, which can better capture distance information in the amino acid sequence. To take full account of the distance information of all amino acid sites, a fully connected neural network layer is also constructed to perform a weighted calculation on the type code and corresponding site of each amino acid. Specifically, the type code of each amino acid (for example, the one-hot codes of the 20 natural amino acids) and its position in the sequence are taken as input, the fully connected layer performs a weighted calculation on this input to generate a new embedded vector, and the generated embedded vector is used as the input of the Transformer model, which further enhances the expressive power of the model. For parameter tuning, a reference data set containing functional labels for a variety of proteins (suitable for evaluating the performance of protein language models) is acquired, and the target protein language model is trained and optimized on a zero-sample data set and a partial-sample data set, with different weight coefficients configured for the prediction results of the zero-sample data set and the partial-sample data set.
For example, "ProteinGym" may be selected as the reference data set. Over multiple rounds, the target protein language model is trained and used for prediction in the zero-sample case (the model sees no data for the specific protein and relies only on pre-training knowledge; a larger weight coefficient may be set here to maximize the predictive effect under unsupervised conditions) and in the partial-sample case (the model has access to a small amount of data for the specific protein for fine-tuning; a smaller weight coefficient may be set here so that it supplements the zero-sample prediction), and the optimal parameters are obtained from the prediction results. To maximize the predictive effect of the target protein language model under unsupervised conditions, the weight of the prediction result obtained in the zero-sample case may be set to 0.8.
Step S102, determining simulated single mutant structures corresponding to the simulated mutant proteins respectively, calculating folding energy change values of the simulated single mutant structures before and after protein mutation based on the protein folding energy calculation tool respectively, and determining structural stability values S of the simulated single mutant structures based on the folding energy change values.
The influence of each mutation on the structural stability of the protein can be quantitatively evaluated by calculating folding energy change values of the simulated single mutant structure before and after mutation through a protein folding energy calculation tool (such as molecular modeling software Discovery Studio and FoldX), and the structural stability value S can be used for screening out mutants which still maintain or improve the structural stability after mutation, thereby being beneficial to rapidly screening out candidate mutants with stable structure, reducing the workload of experimental verification and improving the efficiency and the precision rate of mutant protein screening.
According to one embodiment of the present invention, the virtual amino acid mutation function of Discovery Studio can be used to specify single mutations at all amino acid sites, and the structural stability value S of each simulated single mutant structure (i.e., $\Delta\Delta G_{mut}$ in the following formula) is then calculated with the calculation formula built into the Discovery Studio software. The calculation formula is as follows:

$$\Delta\Delta G_{mut} = \Delta G_{fold}(mut) - \Delta G_{fold}(wild\ type)$$

Wherein,

$$\Delta G_{fold} = G_{fld} - G_{unf}$$

In the above, $G_{fld}$ is the free energy of the protein in the folded state, $G_{unf}$ is the free energy of the protein in the unfolded state (i.e., when folding is disrupted), wild type denotes the wild-type structure of the target protein, and mut denotes the mutant structure of the simulated mutant protein. $\Delta\Delta G_{mut}$ is taken as the structural stability value S: a negative value indicates that the mutant is more stable than the wild type, i.e., the mutation has a stabilizing effect, and a positive value indicates that the mutant is less stable than the wild type, i.e., the mutation has a destabilizing effect.
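As a worked illustration of this sign convention, the sketch below computes the stability value as the folding free energy of the mutant minus that of the wild type. The function names and the energy values are hypothetical and chosen only for the example; in practice the energies come from the folding energy calculation tool.

```python
def folding_energy(g_folded, g_unfolded):
    """Folding free energy: folded-state minus unfolded-state free energy."""
    return g_folded - g_unfolded


def stability_value(mutant, wild_type):
    """Structural stability value S: mutant folding energy minus wild-type
    folding energy. Each argument is a (g_folded, g_unfolded) pair."""
    return folding_energy(*mutant) - folding_energy(*wild_type)


# Hypothetical energies in kcal/mol, for illustration only.
wild_type = (-118.5, -70.0)   # folding energy = -48.5
mutant = (-120.0, -70.0)      # folding energy = -50.0

s = stability_value(mutant, wild_type)
print(s)  # -1.5: negative, so this mutation would count as stabilizing
```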
In some embodiments, determining the simulated single mutant structures corresponding to each of the plurality of simulated mutant proteins comprises performing a simulated single mutation on all amino acid positions of the target protein using molecular modeling software Discovery Studio to generate the plurality of simulated single mutant structures.
In some embodiments, before the step of determining the simulated single mutant structures corresponding to the plurality of simulated mutant proteins, the method further comprises: obtaining wild-type protein structure files of the target protein, each containing a three-dimensional structure of the target protein in its natural state (the three-dimensional structure is first sought from an existing experimental method such as X-ray crystallography or NMR; if no experimentally observed structure is available, a high-quality protein structure prediction model such as AlphaFold is used to generate the wild-type protein structure file); performing a preliminary screening of the protein structure files based on structure evaluation elements to obtain the target protein structure file; and preprocessing the target protein structure file, wherein the preprocessing comprises removing crystal water and assigning force field parameters in simulation.
According to an embodiment of the present invention, the preliminary screening of protein structure files mainly considers the following structure evaluation elements: resolution (an important indicator of the accuracy of a protein structure; higher resolution means more accurate atomic position information); RMSD value (root mean square deviation, which measures the similarity between two structures; a lower RMSD value means a smaller difference between the two structures, and a structure with a lower RMSD is usually selected to ensure consistency and reliability); presence or absence of a ligand (the function of some proteins is closely related to the ligand they bind, so if the modification target involves binding of a specific ligand, a structure file containing the correct ligand must be selected); and ligand quality evaluation (even if the structure contains a ligand, the position and conformation of the ligand must be evaluated, since a wrong ligand position or conformation may affect the subsequent calculation and analysis results). After a suitable protein structure file (i.e., the target protein structure file) is determined, Discovery Studio is used to preprocess it, including removing crystal water and assigning force field parameters (such as CHARMm), in preparation for the subsequent calculations.
In some embodiments, because different protein folding energy calculation tools compute the folding energy of a simulated mutant protein in different ways, the structural stability values S they produce differ slightly. To further ensure the accuracy of the calculated structural stability value, a specific implementation of the embodiment of the present invention also uses another protein folding energy calculation tool, FoldX, to calculate the structural stability value of the mutant (i.e., $\Delta G$ in the following formula). The expression is as follows:

$$\Delta G = W_{vdw}\,\Delta G_{vdw} + W_{solvH}\,\Delta G_{solvH} + W_{solvP}\,\Delta G_{solvP} + \Delta G_{wb} + \Delta G_{hbond} + \Delta G_{el} + W_{mc}\,T\,\Delta S_{mc} + W_{sc}\,T\,\Delta S_{sc}$$

Wherein, $\Delta G_{vdw}$ is the sum of the van der Waals interaction contributions of all atoms relative to the solvent, and $W_{vdw}$ is the weight coefficient of the energy term $\Delta G_{vdw}$. $\Delta G_{solvH}$ and $\Delta G_{solvP}$ are the differences in solvation energy of the nonpolar and polar groups, respectively, on going from the unfolded state to the folded state, and $W_{solvH}$ and $W_{solvP}$ are the weight coefficients of the energy terms $\Delta G_{solvH}$ and $\Delta G_{solvP}$, respectively. $\Delta G_{wb}$ is the additional stabilizing free energy provided by a water molecule that forms multiple hydrogen bonds with the protein, known as a water bridge. $\Delta G_{hbond}$ is the difference in free energy between intramolecular hydrogen bond formation and intermolecular hydrogen bond formation (with the solvent). $\Delta G_{el}$ is the electrostatic contribution of charged groups, including the helix dipole moment. $\Delta S_{mc}$ is the entropy cost of fixing the main chain in the folded state, and $W_{mc}$ is the weight coefficient of the energy term $\Delta S_{mc}$. $\Delta S_{sc}$ is the entropy cost of fixing the side chains in a particular conformation, and $W_{sc}$ is the weight coefficient of the energy term $\Delta S_{sc}$.
The Discovery Studio and FoldX software are used to perform a simulation calculation of the structural stability of the target protein, thus providing high-precision structural information, helping to evaluate the mutation effect more accurately. Specifically, based on the molecular modeling software, the wild type structure of the target protein is changed into a mutant type, and the folding free energy change of the protein structure before and after mutation is calculated to determine the stability change of the mutant, so as to obtain a structural stability value S (which represents the structural stability effect of the mutant and is used as a reference basis for screening mutation sites with improved stability), and the influence of the mutation on the three-dimensional structure and the stability of the protein can be more accurately evaluated through structural simulation.
It will be appreciated that, to further improve the accuracy of the calculated structural stability value S of the simulated mutants, other protein folding energy calculation tools with higher accuracy may be employed, or the number of protein folding energy calculation tools may be increased.
Step S103, respectively carrying out normalization processing on the adaptability evaluation value F and the structural stability value S, and respectively carrying out single index sequencing and comprehensive index sequencing on normalization processing results.
The adaptability evaluation value F and the structural stability value S generally have different dimensions and numerical ranges, and indexes of the different dimensions can be unified into the same interval (such as 0 to 1) through normalization processing, so that weights among different indexes are more balanced, and the excessive influence of a certain index on comprehensive scores due to larger numerical values is avoided, so that comparison and comprehensive evaluation are facilitated. The single index sorting and the comprehensive index sorting are respectively carried out on the normalization processing results, wherein the single index sorting can help researchers to quickly identify mutants with excellent performance on a specific index, and the comprehensive index sorting combines information of multiple aspects, provides a more comprehensive evaluation standard, and is beneficial to screening out mutants with excellent performance on multiple aspects.
Meanwhile, the normalization processing and sequencing method can automatically process a large amount of data, reduce the workload of manual analysis, improve the screening efficiency, and can rapidly screen out the optimal mutant by the systematic sequencing method so as to accelerate the progress of protein engineering projects. And the comprehensive evaluation is carried out by combining a plurality of indexes, so that the deviation caused by a single index can be reduced, and the robustness and reliability of the screening result are improved. The multi-index comprehensive evaluation can better reflect the performance of the mutant in practical application, and the practicability and the credibility of the screening result are improved.
In some embodiments, the step of respectively normalizing the adaptive evaluation value F and the structural stability value S and respectively performing single index ranking and comprehensive index ranking on the normalization processing results includes respectively normalizing the adaptive evaluation value F and the structural stability value S based on a minimum-maximum normalization method to obtain a first adaptive evaluation normalization value F 'and a first structural stability normalization value S', respectively ranking the first adaptive evaluation normalization value F 'and the first structural stability normalization value S' to obtain a single index ranking result including a first adaptive ranking value RF and a first structural stability ranking value RS, calculating a ranking average value of the first adaptive ranking value RF and the first structural stability ranking value RS corresponding to each analog mutant protein, and ranking the ranking average values of the plurality of analog mutant proteins to obtain a first comprehensive index ranking result R.
The normalization applies to the adaptability evaluation values F of the plurality of simulated mutant proteins under the adaptability index, or to their structural stability values S under the structural stability index. Taking X (an adaptability index or a structural stability index) as an example, the min-max normalization method first finds the minimum value $X_{min}$ and the maximum value $X_{max}$ of the index, and then normalizes using the following formula:

$$X' = \frac{X - X_{min}}{X_{max} - X_{min}}$$

Wherein, $X'$ is the normalized value.
The above steps are a specific implementation manner of performing normalization processing on the adaptability evaluation index and the structural stability evaluation index, and performing single index sorting and comprehensive index sorting on the normalization processing result respectively. Wherein the maximum and minimum normalization method is a data preprocessing method for scaling data between [0,1] by using the maximum and minimum values in the data column.
And normalizing the adaptability evaluation value F and the structural stability value S to be between 0 and 1 by a minimum-maximum normalization method, so that the numerical ranges of different indexes (namely the sequence adaptability evaluation index and the structural stability index) are consistent, and the performances of the different indexes are compared on the same scale. The first adaptability evaluation value F 'and the first structural stability value S' which are obtained after normalization processing are respectively sequenced, so that a single index sequencing result (namely a first adaptability sequencing value RF and a first structural stability sequencing value RS) is obtained, and the performance of each mutant in the aspects of adaptability evaluation and structural stability can be intuitively displayed. Further, sequencing is carried out again through sequencing average values of a plurality of single-index sequencing results corresponding to each simulated mutant protein to obtain a first comprehensive index sequencing result R, the method combines information of a plurality of indexes, provides a more comprehensive evaluation standard, reduces deviation caused by single index, can better reflect the performance of mutants in practical application, improves the practicability and reliability of screening results, improves the robustness and reliability of the screening results, and can help researchers to screen out mutants with good performances in the aspects of adaptability and stability.
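The min-max normalization, single index sorting and rank averaging described above can be sketched as follows. This is a minimal illustration under stated assumptions: the mutant names and scores are hypothetical, and the ranking directions (a higher F ranks better, a lower S ranks better) are inferred from the meaning of the two indexes rather than specified in this passage.

```python
def min_max(values):
    """Scale values into [0, 1] using the column minimum and maximum."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # degenerate column: no spread
    return [(v - lo) / (hi - lo) for v in values]


def ranks(values, higher_is_better):
    """Return the 1-based rank of each entry (rank 1 is best)."""
    order = sorted(range(len(values)), key=lambda i: values[i],
                   reverse=higher_is_better)
    rank = [0] * len(values)
    for r, i in enumerate(order, start=1):
        rank[i] = r
    return rank


# Hypothetical mutants and scores, for illustration only.
names = ["A10G", "K25R", "T40S", "L77V"]
F = [1.5, -0.3, 0.2, 0.8]     # adaptability evaluation values
S = [-1.2, 0.6, -2.0, -0.4]   # structural stability values

F_norm, S_norm = min_max(F), min_max(S)
RF = ranks(F_norm, higher_is_better=True)    # assumption: higher F is better
RS = ranks(S_norm, higher_is_better=False)   # assumption: lower S is better

# Comprehensive ranking R: sort mutants by the average of their two ranks.
mean_rank = {n: (rf + rs) / 2 for n, rf, rs in zip(names, RF, RS)}
R = sorted(names, key=mean_rank.get)
print(RF, RS, R)
```

Because min-max scaling is monotonic, the single index ranks are the same before and after normalization; the normalized values matter when scores from several tools or indexes are averaged directly.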
In some embodiments, the method further comprises the steps of respectively carrying out normalization processing on the adaptability evaluation value F and the structural stability value S based on a Z-Score standardization method to obtain a second adaptability evaluation normalization value F 'and a second structural stability normalization value S', calculating comprehensive index values of the simulated mutant proteins according to the second adaptability evaluation normalization value F ', the second structural stability normalization value S', the adaptive index weight coefficient and the structural stability index weight coefficient, and sorting the comprehensive index values to obtain a second comprehensive index sorting result K.
According to a specific implementation of the embodiment of the invention, the specific formula of the Z-Score normalization calculation is as follows:

$$Z = \frac{X - \mu}{\sigma}$$

Wherein, $\mu$ is the average value of each index, $\sigma$ is the standard deviation of each index, and $X$ is the index value of each simulated mutant.
The above steps are still another specific implementation manner of carrying out normalization processing on the adaptability evaluation index and the structural stability evaluation index and carrying out comprehensive index sequencing on the normalization processing result according to the embodiment of the invention. The Z-Score standardization method is also a data normalization processing method, and the data comparability is improved by converting two or more groups of data into a unitless Z-Score value so as to unify the data standards.
The Z-Score standardization processing is carried out on the adaptability evaluation value F and the structural stability value S, the comprehensive index value is calculated based on the standardized value, the adaptability index weight coefficient and the structural stability index weight coefficient, and the comprehensive index value is sequenced to obtain a second comprehensive index sequencing result K, so that the comparability among different indexes can be improved, and comprehensive evaluation can be carried out. The Z-Score standardization not only unifies the scale, but also considers the distribution characteristics of the data, and the standardized values reflect the relative positions of each mutant on respective indexes, so that the performance of each mutant can be evaluated more accurately.
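A minimal sketch of Z-Score standardization followed by a weighted comprehensive index is given below. Several choices here are assumptions rather than details taken from the text: the population standard deviation is used, the stability score is subtracted because a lower S means a more stable mutant, and the weight coefficients 0.6 and 0.4 as well as the scores are hypothetical.

```python
from statistics import mean, pstdev


def z_score(values):
    """Standardize values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # degenerate column: no spread
    return [(v - mu) / sigma for v in values]


def composite(F, S, w_f, w_s):
    """Weighted comprehensive index per mutant.

    Assumption: the standardized stability value is subtracted, so that a
    more negative (more stabilizing) S raises the comprehensive index.
    """
    return [w_f * f - w_s * s for f, s in zip(z_score(F), z_score(S))]


# Hypothetical mutants, scores and weights, for illustration only.
names = ["A10G", "K25R", "T40S", "L77V"]
F = [1.5, -0.3, 0.2, 0.8]
S = [-1.2, 0.6, -2.0, -0.4]

FS = dict(zip(names, composite(F, S, w_f=0.6, w_s=0.4)))
K = sorted(names, key=FS.get, reverse=True)  # highest comprehensive index first
print(K)
```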
In some embodiments, the number of protein folding energy calculation tools is N, and N structural stability values S are calculated for each simulated mutant protein, one by each tool. The step of normalizing the structural stability values S corresponding to the simulated mutant proteins comprises: normalizing separately each of the N groups of structural stability values Sn calculated by the N protein folding energy calculation tools, where n ∈ [1, N]; and averaging the N structural stability normalization values Sn' corresponding to any simulated mutant protein to obtain the structural stability normalization processing result for that simulated mutant protein.
Different protein folding energy calculation tools may be based on different algorithms and models, and thus they may have different advantages and limitations in calculating structural stability. By using multiple tools, the respective deficiencies can be complemented, improving the robustness and reliability of the calculation results, and further providing a more accurate stability assessment. And the structural stability value of each simulated mutant protein is calculated by N tools respectively, which is equivalent to multiple verification, and is helpful for identifying mutants which are consistent in different tools, so that the reliability of results is improved.
And (3) averaging N normalized structural stability values Sn' of each simulated mutant protein to obtain a final structural stability normalization processing result, so that the calculation results of a plurality of tools are integrated, random noise in calculation of a single tool can be reduced, and the stability and accuracy of the evaluation result are improved.
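The per-tool normalization and averaging can be sketched as follows. This is an illustrative sketch only: min-max scaling is used as the normalization method, and the two score lists stand in for the outputs of two hypothetical folding energy tools.

```python
def min_max(values):
    """Scale values into [0, 1] using the column minimum and maximum."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def combined_stability(per_tool_scores):
    """Normalize each tool's scores separately, then average per mutant.

    per_tool_scores is a list of N lists, one per tool, each holding the
    stability values of the same mutants in the same order.
    """
    normalized = [min_max(scores) for scores in per_tool_scores]
    n_tools = len(normalized)
    return [sum(column) / n_tools for column in zip(*normalized)]


# Hypothetical stability values from two tools for three mutants.
s1 = [-1.0, 0.0, 1.0]   # tool 1 -> normalized 0.0, 0.5, 1.0
s2 = [-4.0, 2.0, 0.0]   # tool 2 -> normalized 0.0, 1.0, 0.667
s_combined = combined_stability([s1, s2])
print(s_combined)
```

Normalizing each tool's column before averaging keeps a tool with a wider numeric range from dominating the combined value.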
Step S104, screening the plurality of simulated muteins according to the target screening number, the single index sequencing result and the comprehensive index sequencing result to obtain the target muteins.
By setting specific target screening quantity, the quantity of the finally screened mutants can be ensured to meet the research requirement, which is helpful for controlling the experimental scale and avoiding resource waste. Based on the single index ranking results, mutants that are excellent in a particular aspect can be quickly identified. By combining comprehensive sequencing results of a plurality of indexes, the overall performance of each mutant can be comprehensively evaluated, and mutants with good performances in multiple aspects can be screened out.
In some embodiments, the target screening number includes a single index screening number and a comprehensive index screening number, the step of screening the plurality of simulated muteins according to the target screening number, the single index sorting result and the comprehensive index sorting result to obtain the target mutein includes determining a first target mutein according to the single index screening number, the first adaptive sorting value RF and the first structural stability sorting value RS, removing the first target mutein in the comprehensive index sorting result, determining a second target mutein according to the comprehensive index screening number and the comprehensive index sorting result, and collecting the first target mutein and the second target mutein to obtain the target mutein, wherein the comprehensive index sorting result includes a first comprehensive index sorting result R and/or a second comprehensive index sorting result K.
The steps provide a specific embodiment for screening the target mutant protein based on the target screening quantity, the single index sequencing result and the comprehensive index sequencing result, and the two screening methods of single index screening and comprehensive index screening are adopted to evaluate the mutant from different angles, so that the screening precision and accuracy are improved, and the mutant with excellent performance on the single index and the mutant with good performance on the comprehensive index can be obtained simultaneously, thereby realizing diversified screening. In particular, single index screening may miss mutants that perform well in combination, whereas combination index screening may ignore certain mutants that are particularly prominent in a particular attribute. Through two-step screening, the likelihood of omission can be reduced.
On the other hand, based on different choices of the data processing modes (normalization processing method and sorting mode), a plurality of comprehensive sorting results (such as the first comprehensive index sorting result R and the second comprehensive index sorting result K) may be finally obtained, and after single index screening, the comprehensive index sorting results may be screened again for each comprehensive index sorting result, and only the same target mutant protein may be subjected to deduplication after screening is completed.
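The two-step screening with deduplication can be sketched as below. The function name `screen`, its parameters and the toy rankings are hypothetical; each ranking is assumed to be a list of mutant names ordered from best to worst.

```python
def screen(single_rankings, composite_rankings, n_single, n_composite):
    """Two-step screening.

    Step 1: take the top n_single mutants from each single index ranking
            (the first target mutant proteins).
    Step 2: remove those from each comprehensive ranking, take the top
            n_composite of what remains (the second target mutant
            proteins), and deduplicate across rankings.
    """
    first = []
    for ranking in single_rankings:
        for name in ranking[:n_single]:
            if name not in first:
                first.append(name)

    selected = list(first)
    for ranking in composite_rankings:
        remaining = [name for name in ranking if name not in first]
        for name in remaining[:n_composite]:
            if name not in selected:  # deduplicate across R and K
                selected.append(name)
    return selected


# Hypothetical rankings (best first), for illustration only.
by_F = ["a", "b", "c", "d", "e", "f"]   # single index ranking by adaptability
by_S = ["c", "a", "e", "b", "d", "f"]   # single index ranking by stability
R = ["a", "c", "b", "e", "d", "f"]      # first comprehensive ranking
K = ["c", "b", "a", "d", "e", "f"]      # second comprehensive ranking

targets = screen([by_F, by_S], [R, K], n_single=1, n_composite=2)
print(targets)
```

Because duplicates between the comprehensive rankings are merged, the final set can be smaller than the nominal quota, which is why a top-up round may be needed.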
In some embodiments, before the step of determining the target mutant protein, the method further comprises removing from the comprehensive index sorting result the simulated mutant proteins whose adaptability evaluation value F is less than 0 and whose structural stability value S is greater than 0.
The adaptability evaluation value F predicted by the protein language model represents the likelihood that a mutation of the protein is retained during evolution: the higher the value, the better the mutation adapts the protein to its biological environment, whereas an adaptability evaluation value F less than 0 generally indicates that the mutation is unfavorable in evolution and may reduce the function or stability of the protein. The structural stability value S, calculated by simulating the folding energy effect of the protein, represents the effect of the mutation on structural stability. A negative value generally indicates that the mutation stabilizes the structure, i.e., the energy required for folding is smaller than that required for unfolding; the smaller the value, the less energy is required to fold the protein and, in theory, the more stable the protein structure. A structural stability value S greater than 0 indicates that the protein structure becomes unstable after mutation. Therefore, a simulated mutant protein with an adaptability evaluation value F less than 0 and a structural stability value S greater than 0 can be judged to be a defective mutant; studies show that a mutant that is defective in either index will most likely characterize poorly in actual experiments.
Therefore, the simulation mutants with positive values in the protein stability calculation result and the mutants with negative values in the adaptability prediction are removed, so that the screening quality is improved (bad mutants are removed), the reliability of the screening result is improved, the resource allocation is optimized, the guidance experiment design is improved, and the success rate is improved.
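The defect filter can be sketched as below. The text states the criterion as F less than 0 and S greater than 0, while the summary above reads as though a defect in either index is enough, so this sketch exposes both readings through a hypothetical `require_both` flag; the mutant names and scores are likewise hypothetical.

```python
def is_defective(f, s, require_both=True):
    """Flag a mutant as defective.

    require_both=True  : defective only if F < 0 and S > 0.
    require_both=False : defective if F < 0 or S > 0 (stricter filter).
    """
    low_adaptability = f < 0
    destabilizing = s > 0
    if require_both:
        return low_adaptability and destabilizing
    return low_adaptability or destabilizing


def drop_defective(scores, require_both=True):
    """Keep only the mutants that are not flagged as defective."""
    return {name: (f, s) for name, (f, s) in scores.items()
            if not is_defective(f, s, require_both)}


# Hypothetical (F, S) scores, for illustration only.
scores = {
    "A10G": (1.5, -1.2),   # good on both indexes
    "K25R": (-0.3, 0.6),   # defective on both indexes
    "T40S": (0.2, 0.4),    # destabilizing only
    "L77V": (-0.1, -0.4),  # low adaptability only
}

kept_both = drop_defective(scores)                        # removes K25R
kept_either = drop_defective(scores, require_both=False)  # keeps only A10G
print(list(kept_both), list(kept_either))
```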
The embodiment of the invention also achieves the following technical effects. (1) It provides a protein mutation effect evaluation method that combines calculation of the protein folding energy effect (the stability index of the protein structure) with protein adaptability prediction (the adaptability index of the protein sequence), offering an effective tool for judging the possible impact of a protein mutation and a basis for fundamental research on protein mutation. (2) It screens protein mutations using both protein stability and protein adaptability, so that protein mutants are screened by integrating protein structure information, protein sequence information and evolutionary information, which solves the problem that existing methods cannot make full use of the available information about the target protein. (3) It provides a normalized screening method for multi-scale protein evaluation indexes: through the min-max normalization method, the Z-Score standardization method, and the selection of specialty mutants together with the removal of defective mutants, protein mutants with potential functional improvement are screened comprehensively across multiple dimensions and multiple systems. (4) By using several protein stability calculation tools, it addresses the insufficient robustness of any single current method, compensates through the different calculation modes for folding energy terms that a single tool considers inadequately, and improves the screening precision.
The method for screening mutant proteins comprises: performing simulated single mutation on a plurality of amino acid sites in the protein sequence of a target protein to obtain a plurality of simulated mutant proteins; performing adaptability prediction, based on a target protein language model, on the simulated single mutant sequences corresponding to the simulated mutant proteins to obtain an adaptability evaluation value F for each simulated single mutant sequence, wherein each simulated single mutant sequence is the protein sequence carrying a single mutation at the corresponding amino acid site; determining the simulated single mutant structures corresponding to the simulated mutant proteins, calculating the folding energy change values of the simulated single mutant structures before and after protein mutation with a protein folding energy calculation tool, and determining the structural stability value S of each simulated single mutant structure based on the folding energy change values; normalizing the adaptability evaluation values F and the structural stability values S, and performing single index sorting and comprehensive index sorting on the normalization processing results; and screening the plurality of simulated mutant proteins according to the target screening number, the single index sorting result and the comprehensive index sorting result to obtain the target mutant protein. By determining the adaptability evaluation value from the protein sequence and the structural stability value from the protein structure, and screening on both, the method improves the accuracy and efficiency of mutant protein screening and achieves the technical effects of shortening the test period and reducing the cost.
The embodiment of the invention also provides a mutant protein screening method, as shown in fig. 2, the mutant protein screening method mainly comprises the following steps:
First, simulated single mutation is performed on all amino acid sites in the protein sequence of the target protein to obtain the protein sequences and protein structures corresponding to a plurality of simulated mutant proteins. For the protein sequences of the simulated mutant proteins, an unsupervised protein language model is used for adaptability prediction to obtain the adaptability evaluation value F. For the protein structures of the simulated mutant proteins, the molecular modeling software Discovery Studio and FoldX are each used to calculate the stability associated with the protein folding energy effect, giving the structural stability values S1 and S2. The values are then processed using two numerical normalization algorithms (the min-max normalization method and the Z-Score standardization method). Finally, the required quantity (the target screening number) is determined, specialty mutants are selected (i.e., screening according to the single index sorting result) to obtain the first target mutant proteins, and the comprehensive index sorting result is screened to obtain the second target mutant proteins; the first and second target mutant proteins together form the mutant library finally obtained by screening.
According to an embodiment of the present invention, the data processing steps performed based on the min-max normalization method are shown in fig. 3. First, an adaptability evaluation value F (may be also referred to as an adaptability predicted value F) obtained based on a protein language model, a structural stability value S1 (may be also referred to as a stability value calculated value S1) obtained based on a Discovery Studio calculation, and a structural stability value S2 (may be also referred to as a stability value calculated value S2) obtained based on FoldX calculation are normalized by a minimum-maximum normalization method to obtain a first adaptability evaluation normalized value F ', and first structural stability normalized values S1' and S2', respectively, wherein S1' and S2 'are structural stability indexes, and S' is obtained by arithmetically averaging the two values. And respectively sorting the first adaptability evaluation normalization value F 'and the first structural stability normalization value S' to obtain a single index sorting result comprising a first adaptability sorting value RF and a first structural stability sorting value RS. And then calculating the sequencing average value of the first adaptive sequencing value RF and the first structural stability sequencing value RS corresponding to each simulated mutant protein, and sequencing the sequencing average values of the simulated mutant proteins to obtain a first comprehensive index sequencing result R.
According to an embodiment of the present invention, the data processing steps based on the Z-Score normalization method are shown in FIG. 4. First, the adaptability evaluation value F obtained from the protein language model (also referred to as the adaptability predicted value F), the structural stability value S1 calculated by Discovery Studio (also referred to as the calculated stability value S1), and the structural stability value S2 calculated by FoldX (also referred to as the calculated stability value S2) are each normalized by the Z-Score normalization method, yielding a second adaptability evaluation normalized value F'' and second structural stability normalized values S1'' and S2''. A comprehensive index value FS is then calculated for each simulated mutant protein from the second adaptability evaluation normalized value F'', the second structural stability normalized values, the adaptability index weight coefficient and the structural stability index weight coefficient, and the comprehensive index values are ranked to obtain a second comprehensive-index ranking result K. The expression for calculating the comprehensive index value FS is as follows:
where the weight coefficient of the adaptability index is set to 0.6 and the weight coefficient of the structural stability index is set to 0.2. These specific values do not limit the present invention and may be adjusted according to actual test conditions and protein engineering requirements.
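A weighted Z-Score combination of this kind might look like the sketch below. The exact form of the FS expression is an assumption: here the stability weight 0.2 is applied to each of S1'' and S2'', and the stability terms are subtracted on the assumption that a lower folding energy change indicates a more stable mutant. The function names and the use of the population standard deviation are likewise illustrative choices, not details taken from the source.

```python
from statistics import mean, pstdev


def z_score(values):
    """Standardize values to zero mean and unit (population) deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def second_composite_ranking(F, S1, S2, w_f=0.6, w_s=0.2):
    """Return the composite values FS and their ranks K (1 = best)."""
    F_z, S1_z, S2_z = z_score(F), z_score(S1), z_score(S2)
    # ASSUMPTION: the sign and shape of this weighted sum are hypothetical.
    FS = [w_f * f - w_s * s1 - w_s * s2
          for f, s1, s2 in zip(F_z, S1_z, S2_z)]
    order = sorted(range(len(FS)), key=lambda i: FS[i], reverse=True)
    K = [0] * len(FS)
    for position, index in enumerate(order, start=1):
        K[index] = position
    return FS, K
```

Changing `w_f` and `w_s` shifts the balance between sequence adaptability and structural stability, which is the adjustment the paragraph above leaves open.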
As shown in FIG. 5, the invention also provides a specific method for screening to obtain the target mutant proteins. First, the target screening quantity (the total requirement S in FIG. 5) is determined according to the number of simulated mutant proteins in the simulated mutant library. In the above example, assuming that every amino acid site of the target protein is substituted in turn with each of the other 19 natural amino acids, the library contains 19 × L single-mutant protein sequences, where L is the length of the target protein sequence. Next, mutants are screened from the single-index ranking results, that is, by characteristic: a mutant that ranks near the top of an index and whose index value differs only slightly from that of the next-ranked mutant is regarded as a characteristic mutant. These mutants are selected on a single index alone and account for about 20% of the total screening quantity S. Mutants are then screened from the comprehensive rankings; for example, 30% of the total screening quantity S may be taken from the first comprehensive-index ranking result R and 30% from the second comprehensive-index ranking result K. Because the same target mutant protein may be selected from both R and K, the total number of target mutant proteins after deduplication may fall short of the target screening quantity (that is, the total requirement is not met). In that case, simulated mutant proteins with an adaptability evaluation value F smaller than 0 and a structural stability value S greater than 0 are removed from the remaining simulated mutant proteins, and about 10% of the total screening quantity S is then screened again from the first comprehensive-index ranking result R and the second comprehensive-index ranking result K.
The above numerical values are only examples, and are not intended to limit the present invention.
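The quota-based selection described above can be sketched as follows. This is a simplified illustration under stated assumptions: each candidate is a dictionary with hypothetical keys `id`, `F`, `S`, `RF`, `RS`, `R` and `K` (ranks start at 1 for the best mutant); the "characteristic" mutants are approximated by simply taking the top of each single-index ranking; and the final step tops the selection up to the full target quantity rather than to a fixed 10%.

```python
def screen_mutants(candidates, total, single_frac=0.2, r_frac=0.3, k_frac=0.3):
    """Select up to `total` mutant ids from single and composite rankings."""
    selected = []

    def top(pool, key, quota):
        # Ids of the `quota` best-ranked candidates for the given rank key.
        return [c["id"] for c in sorted(pool, key=lambda c: c[key])[:quota]]

    def add(ids):
        for mutant_id in ids:
            if mutant_id not in selected:      # deduplicate across rankings
                selected.append(mutant_id)

    # Step 1: single-index picks (first target mutant proteins).
    per_index = max(1, round(total * single_frac / 2))
    add(top(candidates, "RF", per_index))
    add(top(candidates, "RS", per_index))

    # Step 2: composite picks from R and K, excluding step-1 picks.
    rest = [c for c in candidates if c["id"] not in selected]
    add(top(rest, "R", round(total * r_frac)))
    add(top(rest, "K", round(total * k_frac)))

    # Step 3: if deduplication left a shortfall, drop unfavourable mutants
    # (F < 0 and S > 0) and top up alternately from R and K.
    rest = [c for c in candidates
            if c["id"] not in selected and not (c["F"] < 0 and c["S"] > 0)]
    by_r = top(rest, "R", len(rest))
    by_k = top(rest, "K", len(rest))
    for r_id, k_id in zip(by_r, by_k):
        for mutant_id in (r_id, k_id):
            if len(selected) < total and mutant_id not in selected:
                selected.append(mutant_id)
    return selected[:total]
```

When R and K largely agree, steps 1 and 2 select far fewer than the target quantity after deduplication, and step 3 makes up the difference from the filtered remainder.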
Based on the above-mentioned mutein screening method provided by the embodiment of the present invention, the embodiment of the present invention further provides a mutein screening device, as shown in fig. 6, the mutein screening device 600 comprises:
The adaptability evaluation value calculation module 601 is configured to perform simulated single mutation on a plurality of amino acid sites in a protein sequence of a target protein to obtain a plurality of simulated mutant proteins, and perform adaptability prediction on simulated single mutant sequences corresponding to the plurality of simulated mutant proteins based on a target protein language model to obtain an adaptability evaluation value F of each simulated single mutant sequence, where any one of the simulated single mutant sequences includes a single mutant protein sequence corresponding to the amino acid site.
By performing simulated single mutations at a plurality of amino acid sites in the protein sequence of the target protein (the target protein is the object of the protein engineering study, and a simulated mutant protein is a mutant that may arise during the evolution of the target protein), all possible single mutants can be systematically generated and evaluated, and the effect of a mutation at each amino acid site on protein function and stability can be comprehensively understood. Adaptability prediction is then performed on the simulated single mutant sequences with the protein language model to obtain the adaptability evaluation value F, so that mutations more likely to be retained during evolution can be identified (the higher the adaptability evaluation value F, the more likely the simulated mutant is to be retained during evolution), thereby providing valuable candidate mutant proteins for protein engineering.
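Generating every single mutant in this way is straightforward to sketch. The function name and the mutation label format (wild-type residue, 1-based position, new residue) are illustrative assumptions; the 20-letter alphabet is the standard set of natural amino acids.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 natural amino acids


def single_mutants(sequence):
    """Yield (label, mutant_sequence) for every single substitution."""
    for position, wild_type in enumerate(sequence):
        for residue in AMINO_ACIDS:
            if residue != wild_type:
                # Label such as "M1A": wild type, 1-based site, new residue.
                label = f"{wild_type}{position + 1}{residue}"
                mutant = sequence[:position] + residue + sequence[position + 1:]
                yield label, mutant
```

A sequence of length L made up of natural amino acids yields 19 × L mutants, so the three-residue toy sequence `"MKT"` produces 57 entries.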
In some embodiments, the mutein screening apparatus 600 further includes a model construction and parameter tuning module, configured to: acquire a basic protein language model; replace the position coding layer of the basic protein language model with a learnable embedding and construct a fully connected neural network layer in the basic protein language model to obtain the target protein language model, where the learnable embedding is used to capture distance information in the amino acid sequence and the fully connected neural network layer is used to perform a weighted calculation on the type and corresponding site of each amino acid; and perform unsupervised training on the target protein language model to tune the parameters of the target protein language model.
The structural stability value calculation module 602 is configured to determine simulated single mutant structures corresponding to the plurality of simulated mutant proteins, calculate folding energy variation values of the plurality of simulated single mutant structures before and after protein mutation based on the protein folding energy calculation tool, and determine structural stability values S of the respective simulated single mutant structures based on the folding energy variation values.
A protein folding energy calculation tool (such as the molecular modeling software Discovery Studio or FoldX) calculates the folding energy change of each simulated single mutant structure before and after mutation, so that the effect of each mutation on the structural stability of the protein can be evaluated quantitatively. The structural stability value S can then be used to screen out mutants that maintain or improve structural stability after mutation, which helps to rapidly identify structurally stable candidate mutants, reduces the workload of experimental verification, and improves the efficiency and precision of mutant protein screening.
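One common way to write the folding energy change is shown below. The sign convention and the direct identification of S with the change are assumptions made for illustration; the description only states that S is determined from the folding energy change values.

```latex
\Delta\Delta G_{\mathrm{fold}}
  = \Delta G_{\mathrm{fold}}^{\mathrm{mutant}}
  - \Delta G_{\mathrm{fold}}^{\mathrm{wild\ type}},
\qquad
S = \Delta\Delta G_{\mathrm{fold}}
```

Under this convention a negative S corresponds to a mutation predicted to stabilize the fold, which is consistent with the later step that rejects mutants whose structural stability value S is greater than 0.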
It will be appreciated that, to further improve the accuracy of the calculated structural stability value S of the simulated mutants, other protein folding energy calculation tools with higher accuracy may be employed, or the number of protein folding energy calculation tools may be increased.
The data processing module 603 is configured to perform normalization processing on the adaptability evaluation value F and the structural stability value S, and perform single index ranking and comprehensive index ranking on the normalization processing result.
The adaptability evaluation value F and the structural stability value S generally have different dimensions and numerical ranges, and indexes of the different dimensions can be unified into the same interval (such as 0 to 1) through normalization processing, so that weights among different indexes are more balanced, and the excessive influence of a certain index on comprehensive scores due to larger numerical values is avoided, so that comparison and comprehensive evaluation are facilitated. The single index sorting and the comprehensive index sorting are respectively carried out on the normalization processing results, wherein the single index sorting can help researchers to quickly identify mutants with excellent performance on a specific index, and the comprehensive index sorting combines information of multiple aspects, provides a more comprehensive evaluation standard, and is beneficial to screening out mutants with excellent performance on multiple aspects.
The normalization processing method provided by the embodiment of the invention can comprise a minimum-maximum normalization method, a Z-Score normalization method and other types of data scale unification methods.
In some embodiments, the data processing module 603 is configured to normalize the adaptive evaluation value F and the structural stability value S based on a minimum-maximum normalization method to obtain a first adaptive evaluation normalization value F 'and a first structural stability normalization value S', sort the first adaptive evaluation normalization value F 'and the first structural stability normalization value S' to obtain a single-index sorting result including a first adaptive sorting value RF and a first structural stability sorting value RS, calculate a sorting average value of the first adaptive sorting value RF and the first structural stability sorting value RS corresponding to each simulated mutant protein, and sort the sorting average values of the plurality of simulated mutant proteins to obtain a first comprehensive-index sorting result R.
The min-max normalization method maps the adaptability evaluation value F and the structural stability value S to the range 0 to 1, so that the different indexes (the sequence adaptability evaluation index and the structural stability index) share the same numerical range and can be compared on the same scale. Ranking the resulting first adaptability evaluation normalized value F' and first structural stability normalized value S' separately gives the single-index ranking results (the first adaptability ranking value RF and the first structural stability ranking value RS), which show directly how each mutant performs in terms of adaptability and structural stability. The averages of the single-index ranking values of each simulated mutant protein are then ranked again to obtain the first comprehensive-index ranking result R. This approach combines information from several indexes, provides a more comprehensive evaluation standard, and reduces the bias introduced by any single index. It therefore better reflects how mutants perform in practical applications, improves the robustness and reliability of the screening results, and helps researchers screen out mutants that perform well in both adaptability and stability.
In some embodiments, the data processing module 603 is further configured to normalize the adaptability evaluation value F and the structural stability value S based on the Z-Score normalization method to obtain a second adaptability evaluation normalized value F'' and a second structural stability normalized value S'', calculate a comprehensive index value for each simulated mutant protein according to the second adaptability evaluation normalized value F'', the second structural stability normalized value S'', the adaptability index weight coefficient, and the structural stability index weight coefficient, and rank the comprehensive index values to obtain a second comprehensive-index ranking result K.
The Z-Score standardization processing is carried out on the adaptability evaluation value F and the structural stability value S, the comprehensive index value is calculated based on the standardized value, the adaptability index weight coefficient and the structural stability index weight coefficient, and the comprehensive index value is sequenced to obtain a second comprehensive index sequencing result K, so that the comparability among different indexes can be improved, and comprehensive evaluation can be carried out. The Z-Score standardization not only unifies the scale, but also considers the distribution characteristics of the data, and the standardized values reflect the relative positions of each mutant on respective indexes, so that the performance of each mutant can be evaluated more accurately.
And a screening module 604, configured to screen the plurality of simulated muteins according to the target screening number, the single index ranking result and the comprehensive index ranking result, so as to obtain the target mutein.
By setting specific target screening quantity, the quantity of the finally screened mutants can be ensured to meet the research requirement, which is helpful for controlling the experimental scale and avoiding resource waste. Based on the single index ranking results, mutants that are excellent in a particular aspect can be quickly identified. By combining comprehensive sequencing results of a plurality of indexes, the overall performance of each mutant can be comprehensively evaluated, and mutants with good performances in multiple aspects can be screened out.
In some embodiments, the screening module 604 is further configured to determine a first target mutein according to the single-index screening number, the first adaptive ranking value RF, and the first structural stability ranking value RS, reject the first target mutein in the comprehensive-index ranking result, and determine a second target mutein according to the comprehensive-index screening number and the comprehensive-index ranking result, where the first target mutein and the second target mutein are the target mutein, and the comprehensive-index ranking result includes a first comprehensive-index ranking result R and/or a second comprehensive-index ranking result K.
Through the two screening methods of single index screening and comprehensive index screening, the assessment of mutants from different angles is realized, the screening precision and accuracy are improved, and the mutants with excellent performance on the single index and the mutants with good performance on the comprehensive index can be obtained simultaneously, so that the diversified screening is realized. In particular, single index screening may miss mutants that perform well in combination, whereas combination index screening may ignore certain mutants that are particularly prominent in a particular attribute. Through two-step screening, the likelihood of omission can be reduced.
In some embodiments, the mutein screening apparatus further comprises a rejection module for rejecting the simulated muteins having an adaptability evaluation value F of less than 0 and a structural stability value S of greater than 0 in the comprehensive index ranking result before the step of determining the target mutein.
Removing simulated mutants whose protein stability calculation result is positive and whose adaptability prediction is negative eliminates unfavourable mutants. This improves the quality and reliability of the screening results, optimizes resource allocation, provides better guidance for experimental design, and raises the success rate.
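As a minimal sketch, the rejection step reduces to a single filter. The dictionary keys `F` and `S` and the function name are hypothetical. The default follows the literal condition that both F is smaller than 0 and S is greater than 0; the optional `either` flag, which is an addition for illustration only, rejects a mutant when either condition holds.

```python
def reject_unfavourable(mutants, either=False):
    """Drop mutants with negative adaptability and positive stability value."""
    kept = []
    for m in mutants:
        low_fitness = m["F"] < 0       # predicted less fit than wild type
        destabilizing = m["S"] > 0     # ASSUMPTION: S > 0 means less stable
        bad = (low_fitness or destabilizing) if either \
            else (low_fitness and destabilizing)
        if not bad:
            kept.append(m)
    return kept
```

The stricter `either=True` reading removes more candidates and may be preferable when the mutant library is large relative to the target screening quantity.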
The mutant protein screening device provided by the embodiment of the invention comprises: an adaptability evaluation value calculation module, configured to perform simulated single mutation on each of a plurality of amino acid sites in a protein sequence of a target protein to obtain a plurality of simulated mutant proteins, and to perform adaptability prediction, based on a target protein language model, on the simulated single mutant sequences corresponding to the plurality of simulated mutant proteins to obtain an adaptability evaluation value F of each simulated single mutant sequence, where any one of the simulated single mutant sequences includes a single mutant protein sequence corresponding to the amino acid site; a structural stability value calculation module, configured to determine the simulated single mutant structures corresponding to the plurality of simulated mutant proteins, calculate the folding energy change values of the simulated single mutant structures before and after protein mutation based on a protein folding energy calculation tool, and determine a structural stability value S of each simulated single mutant structure based on the folding energy change values; a data processing module, configured to normalize the adaptability evaluation value F and the structural stability value S and to perform single-index ranking and comprehensive-index ranking on the normalization results; and a screening module, configured to screen the plurality of simulated mutant proteins according to the target screening quantity, the single-index ranking results and the comprehensive-index ranking results to obtain the target mutant proteins.
The method has the advantages that the adaptive evaluation value of the simulated protein sequence is determined based on the protein language model, the structural stability value of the simulated mutant protein structure is determined based on the protein folding energy calculation tool, and the mutant protein with potential function improvement is screened out by combining with the protein multi-scale evaluation index normalization screening method, so that the technical effects of reducing the protein design transformation difficulty, improving the mutant protein screening effect, reducing the calculation amount required by protein transformation and screening, shortening the test period and reducing the consumption cost are realized.
The embodiments of the present invention also provide a non-transitory machine-readable medium storing a computer program, wherein the computer program, when executed by a processor of a computer, is for causing the computer to perform the method of the embodiments of the present invention.
Embodiments of the present invention also provide a computer program product comprising a computer program, wherein the computer program, when being executed by a processor of a computer, is for causing the computer to perform the method of the embodiments of the present invention. The computer program product is understood to be a software product in which the above-described method of the invention is realized mainly by means of a computer program.
The embodiment of the invention also provides electronic equipment which comprises at least one processor and a memory which is in communication connection with the at least one processor. The memory stores a computer program executable by the at least one processor, which when executed by the at least one processor is adapted to cause an electronic device to perform a method of an embodiment of the invention.
With reference to fig. 7, a block diagram of an electronic device that may be a server or a client of an embodiment of the present invention will now be described, which is an example of a hardware device that may be applied to aspects of the present invention. Electronic devices are intended to represent various forms of digital electronic computer devices, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.
As shown in fig. 7, the electronic device includes a computing unit 701 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 702 or a computer program loaded from a storage unit 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data required for the operation of the electronic device can also be stored. The computing unit 701, the ROM 702, and the RAM 703 are connected to each other through a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.
Various components in the electronic device are connected to the I/O interface 705, including an input unit 706, an output unit 707, a storage unit 708, and a communication unit 709. The input unit 706 may be any type of device capable of inputting information to an electronic device, and the input unit 706 may receive input numeric or character information and generate key signal inputs related to user settings and/or function controls of the electronic device. The output unit 707 may be any type of device capable of presenting information and may include, but is not limited to, a display, speakers, video/audio output terminals, vibrators, and/or printers. Storage unit 708 may include, but is not limited to, magnetic disks and optical disks. The communication unit 709 allows the electronic device to exchange information/data with other devices through computer networks, such as the Internet, and/or various telecommunications networks, and may include, but is not limited to, modems, network cards, infrared communication devices, and/or wireless communication transceivers, such as Bluetooth devices, Wi-Fi devices, WiMAX devices, cellular communication devices, and/or the like.
The computing unit 701 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 701 include, but are not limited to, a CPU, a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing units, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), as well as any suitable processor, controller, microcontroller, etc. The computing unit 701 performs the various methods and processes described above. For example, in some embodiments, method embodiments of the present invention may be implemented as a computer program tangibly embodied on a machine-readable medium, such as storage unit 708. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device via the ROM 702 and/or the communication unit 709. In some embodiments, the computing unit 701 may be configured to perform the methods described above by any other suitable means (e.g., by means of firmware).
A computer program for implementing the methods of embodiments of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be implemented. The computer program may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of embodiments of the present invention, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable signal medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, or infrared system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
It should be noted that the term "comprising" and its variants as used in the embodiments of the present invention are open-ended, that is, "including but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; and the term "some embodiments" means "at least some embodiments". The modifiers "a" and "a plurality of" in the embodiments of the invention are intended to be illustrative rather than limiting, and those skilled in the art will understand that they should be interpreted as "one or more" unless the context clearly indicates otherwise.
The information and data (including but not limited to data for analysis, stored data, displayed data, etc.) related to the embodiment of the invention are all information and data authorized by a user or fully authorized by each party, and the collection, use and processing of related data need to comply with related laws and regulations and standards of related countries and regions, and are provided with corresponding operation entries for users to select authorization or rejection.
The steps described in the method embodiments provided in the embodiments of the present invention may be performed in a different order and/or performed in parallel. Furthermore, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the invention is not limited in this respect.
The term "embodiment" in this specification means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the invention. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive. The various embodiments in this specification are described in a related manner, and identical or similar parts may be cross-referenced between embodiments. In particular, the apparatus, device, and system embodiments are described relatively briefly because they are substantially similar to the method embodiments; for relevant details, refer to the description of the method embodiments.
The above examples merely represent a few embodiments of the present invention, which are described in more detail and are not to be construed as limiting the scope of protection. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the invention, which are all within the scope of the invention. Accordingly, the scope of the invention should be assessed as that of the appended claims.
Claims (9)
1. A method for screening a mutant protein, comprising the steps of:
Respectively carrying out simulated single mutation on a plurality of amino acid sites in a protein sequence of a target protein to obtain a plurality of simulated mutant proteins, and carrying out adaptive prediction on simulated single mutant sequences corresponding to the simulated mutant proteins based on a target protein language model to obtain an adaptive evaluation value F of each simulated single mutant sequence, wherein any one of the simulated single mutant sequences comprises a single mutant protein sequence corresponding to the amino acid site;
Determining simulated single mutant structures respectively corresponding to the simulated mutant proteins, respectively calculating folding energy change values of the simulated single mutant structures before and after protein mutation based on a protein folding energy calculation tool, and determining a structural stability value S of each simulated single mutant structure based on the folding energy change values;
respectively carrying out normalization processing on the adaptability evaluation value F and the structural stability value S, and respectively carrying out single index sequencing and comprehensive index sequencing on normalization processing results;
Screening the plurality of simulated muteins according to the target screening quantity, the single index sequencing result and the comprehensive index sequencing result to obtain target muteins;
The method also comprises the steps of constructing a target protein language model and adjusting parameters:
Obtaining a basic protein language model, replacing a position coding layer of the basic protein language model with a learnable embedding, and constructing a fully connected neural network layer in the basic protein language model to obtain the target protein language model, wherein the learnable embedding is used for capturing distance information in an amino acid sequence;
and performing unsupervised training on the target protein language model to realize model parameter tuning of the target protein language model.
2. The screening method according to claim 1, wherein the step of normalizing the adaptability evaluation value F and the structural stability value S, respectively, and performing single index ranking and comprehensive index ranking on the normalization processing results, respectively, comprises:
respectively carrying out normalization processing on the adaptability evaluation value F and the structural stability value S based on a minimum-maximum normalization method to obtain a first adaptability evaluation normalization value F 'and a first structural stability normalization value S';
Sorting the first adaptability evaluation normalization value F 'and the first structural stability normalization value S' respectively to obtain a single index sorting result comprising a first adaptability sorting value RF and a first structural stability sorting value RS;
And calculating a sequencing average value of a first adaptive sequencing value RF and a first structural stability sequencing value RS corresponding to each simulated mutant protein, and sequencing the sequencing average values of the simulated mutant proteins to obtain a first comprehensive index sequencing result R.
3. The screening method according to claim 2, further comprising the step of:
Respectively carrying out normalization processing on the adaptability evaluation value F and the structural stability value S based on a Z-Score normalization method to obtain a second adaptability evaluation normalization value F'' and a second structural stability normalization value S'';
Calculating the comprehensive index value of each simulated mutant protein according to the second adaptability evaluation normalization value F'', the second structural stability normalization value S'', the adaptability index weight coefficient and the structural stability index weight coefficient;
And ranking the comprehensive index values to obtain a second comprehensive index ranking result K.
4. The screening method according to claim 2, wherein the number of protein folding energy calculation tools is N, so that N structural stability values S are calculated for each simulated mutant protein, one per protein folding energy calculation tool, and wherein the step of normalizing the structural stability values S corresponding to the plurality of simulated mutant proteins comprises:
normalizing, separately, each of the N groups of structural stability values Sn calculated by the N protein folding energy calculation tools, where n ∈ [1, N]; and
averaging the N structural stability normalization values Sn' corresponding to any given simulated mutant protein to obtain the structural stability normalization result for that simulated mutant protein.
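The per-tool normalization and averaging of claim 4 might look like the following sketch. Min-max scaling is used because claim 4 depends on claim 2, but that choice, the function names, and the sample values are assumptions.

```python
def min_max(values):
    """Scale values linearly to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def averaged_stability(per_tool_values):
    """per_tool_values[n][i] is the S value from tool n for mutant i.

    Each tool's group of values is normalized on its own scale, then the N
    normalized values of every mutant are averaged.
    """
    normalized = [min_max(tool_values) for tool_values in per_tool_values]
    n_tools = len(normalized)
    return [sum(column) / n_tools for column in zip(*normalized)]


# Two hypothetical tools scoring three mutants on very different scales.
S_by_tool = [
    [-2.0, 0.0, 2.0],
    [-10.0, 10.0, 0.0],
]
S_avg = averaged_stability(S_by_tool)
print(S_avg)  # [0.0, 0.75, 0.75]
```

Normalizing each tool separately before averaging keeps a tool with a wide numeric range from dominating the combined value.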
5. The screening method according to claim 1, wherein the target screening number comprises a single index screening number and a comprehensive index screening number, and wherein the step of screening the plurality of simulated mutant proteins according to the target screening number, the single index ranking result and the comprehensive index ranking result to obtain the target mutant protein comprises:
determining a first target mutant protein according to the single index screening number, the first adaptability ranking value RF and the first structural stability ranking value RS; and
removing the first target mutant protein from the comprehensive index ranking result, and determining a second target mutant protein according to the comprehensive index screening number and the comprehensive index ranking result, wherein the first target mutant protein and the second target mutant protein together constitute the target mutant protein, and the comprehensive index ranking result comprises the first comprehensive index ranking result R and/or the second comprehensive index ranking result K.
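The two-stage selection of claim 5 can be sketched as follows. The claim does not say whether the single index screening number applies to each index separately or to both together; this sketch assumes it applies to each. The mutant labels, rank values, and function name are hypothetical.

```python
def screen(names, RF, RS, composite_rank, n_single, n_composite):
    """Pick first targets by single-index ranks, then second targets by composite rank.

    RF, RS and composite_rank map a mutant name to its rank (1 = best).
    """
    # Assumption: n_single mutants are taken from each single-index ranking.
    by_rf = sorted(names, key=lambda m: RF[m])[:n_single]
    by_rs = sorted(names, key=lambda m: RS[m])[:n_single]
    first = list(dict.fromkeys(by_rf + by_rs))  # de-duplicate, keep order

    # Remove the first targets from the composite ranking before selecting again.
    remaining = [m for m in sorted(names, key=lambda m: composite_rank[m])
                 if m not in first]
    second = remaining[:n_composite]
    return first, second


# Hypothetical ranks for five simulated mutant proteins.
names = ["A10G", "L25V", "K40R", "T55S", "D70E"]
RF = {"A10G": 1, "L25V": 2, "K40R": 3, "T55S": 4, "D70E": 5}
RS = {"A10G": 3, "L25V": 5, "K40R": 1, "T55S": 2, "D70E": 4}
R = {"A10G": 1, "L25V": 4, "K40R": 2, "T55S": 3, "D70E": 5}

first, second = screen(names, RF, RS, R, n_single=1, n_composite=2)
print(first, second)  # ['A10G', 'K40R'] ['T55S', 'L25V']
```

Removing the first targets before the second selection means the comprehensive index screening number is spent only on mutants not already chosen.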
6. The screening method according to claim 5, further comprising, prior to the step of determining the target mutant protein, the step of:
removing, from the comprehensive index ranking result, the simulated mutant proteins whose adaptability evaluation value F is less than 0 and whose structural stability value S is greater than 0.
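The pre-filter of claim 6 is sketched below. The translated wording can be read as removing only mutants that meet both conditions, or as removing mutants that meet either one; the sketch follows the literal "and" by default and exposes the other reading through a hypothetical `require_both` flag. The sample values are invented.

```python
def drop_unfavourable(candidates, require_both=True):
    """Return the names of mutants that survive the F < 0 / S > 0 filter.

    candidates is a list of (name, F, S) tuples.
    """
    kept = []
    for name, f, s in candidates:
        low_fitness = f < 0
        destabilizing = s > 0
        if require_both:
            remove = low_fitness and destabilizing
        else:
            remove = low_fitness or destabilizing
        if not remove:
            kept.append(name)
    return kept


# Hypothetical (name, F, S) values.
candidates = [
    ("A10G", 0.8, -1.5),
    ("L25V", -0.2, 0.4),
    ("K40R", 0.5, 0.3),
    ("T55S", -0.1, -0.3),
]
print(drop_unfavourable(candidates))                      # ['A10G', 'K40R', 'T55S']
print(drop_unfavourable(candidates, require_both=False))  # ['A10G']
```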
7. A device for screening mutant proteins, characterized by comprising:
an adaptability evaluation value calculation module, configured to perform simulated single mutation on each of a plurality of amino acid sites in the protein sequence of a target protein to obtain a plurality of simulated mutant proteins, and to perform adaptability prediction, based on a target protein language model, on the simulated single-mutant sequence corresponding to each simulated mutant protein to obtain an adaptability evaluation value F for each simulated single-mutant sequence;
a structural stability value calculation module, configured to determine the simulated single-mutant structure corresponding to each simulated mutant protein, to calculate, based on a protein folding energy calculation tool, the folding energy change value of each simulated single-mutant structure before and after protein mutation, and to determine a structural stability value S of each simulated single-mutant structure based on the folding energy change value;
a data processing module, configured to normalize the adaptability evaluation value F and the structural stability value S, respectively, and to perform single index ranking and comprehensive index ranking on the normalization results, respectively; and
a screening module, configured to screen the plurality of simulated mutant proteins according to a target screening number, the single index ranking result and the comprehensive index ranking result to obtain a target mutant protein;
wherein the construction and parameter tuning steps of the target protein language model comprise:
obtaining a basic protein language model, replacing the positional encoding layer of the basic protein language model with a learnable embedding, and constructing a fully connected neural network layer in the basic protein language model to obtain the target protein language model, wherein the learnable embedding is used for capturing distance information in an amino acid sequence; and
performing unsupervised training on the target protein language model to tune the model parameters of the target protein language model.
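The first module of the device begins by generating simulated single mutants of the target protein sequence. A minimal sketch of that enumeration step is given below; it does not cover the language model or the folding energy calculation. The function name, the label format (wild-type residue, 1-based position, new residue), the restriction to the 20 standard amino acids, and the toy sequence are assumptions.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def single_mutants(sequence, positions=None):
    """Yield (label, mutant_sequence) for every single substitution.

    positions is an optional iterable of 0-based sites to mutate; by default
    every site in the sequence is mutated.
    """
    if positions is None:
        positions = range(len(sequence))
    for i in positions:
        wild_type = sequence[i]
        for residue in AMINO_ACIDS:
            if residue != wild_type:
                label = f"{wild_type}{i + 1}{residue}"
                yield label, sequence[:i] + residue + sequence[i + 1:]


# Toy three-residue sequence; each site has 19 possible substitutions.
mutants = dict(single_mutants("MKT"))
print(len(mutants))    # 57
print(mutants["M1A"])  # AKT
```

Each generated sequence would then be passed to the fitness predictor and, after structure determination, to the folding energy tool to obtain its F and S values.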
8. An electronic device comprising a processor and a memory storing a program, characterized in that the program comprises instructions that, when executed by the processor, cause the processor to perform the screening method according to any one of claims 1-6.
9. A non-transitory machine-readable medium storing computer instructions for causing a computer to perform the screening method according to any one of claims 1-6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202411836877.4A CN119296640A (en) | 2024-12-13 | 2024-12-13 | Method, device and related equipment for screening mutant proteins |
Publications (1)
Publication Number | Publication Date |
---|---|
CN119296640A (en) | 2025-01-10 |
Family
ID=94158071
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN202411836877.4A | Pending | CN119296640A (en) | 2024-12-13 | 2024-12-13 | Method, device and related equipment for screening mutant proteins |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN119296640A (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190259470A1 (en) * | 2018-02-19 | 2019-08-22 | Protabit LLC | Artificial intelligence platform for protein engineering |
US20220375539A1 (en) * | 2019-08-23 | 2022-11-24 | Geaenzymes Co. | Systems and methods for predicting proteins |
CN115472221A (en) * | 2022-10-21 | 2022-12-13 | 重庆邮电大学 | Protein fitness prediction method based on deep learning |
CN115620831A (en) * | 2022-10-09 | 2023-01-17 | 深圳瑞德林生物技术有限公司 | Method for generating sequence mutation fitness through loop iteration optimization and related device |
CN118197408A (en) * | 2024-03-27 | 2024-06-14 | 嘉兴欣贝莱生物科技有限公司 | Enzyme thermal stability mutant prediction method and device, electronic equipment and storage medium |
CN118398079A (en) * | 2024-06-25 | 2024-07-26 | 中国人民解放军军事科学院军事医学研究院 | Computer device, method and application for predicting amino acid mutation effect or carrying out design modification on protein |
CN118813658A (en) * | 2024-07-25 | 2024-10-22 | 中国水产科学研究院黄海水产研究所 | Method for improving the thermal stability of bacteriophage, Qβ bacteriophage and virus-like particles |
Non-Patent Citations (3)
Title |
---|
RAMIN DEHGHANPOOR ET AL: "Predicting the Effect of Single and Multiple Mutations on Protein Structural Stability", 《MOLECULES》, vol. 23, 31 December 2018 (2018-12-31), pages 1 - 18 * |
YANG QU ET AL: "Ensemble Learning with Supervised Methods Based on Large-Scale Protein Language Models for Protein Mutation Effects Prediction", 《INT. J. MOL. SCI.》, vol. 24, 31 December 2023 (2023-12-31), pages 1 - 15 * |
LI YAN ET AL: "A Review of Research on Protein Design Based on Deep Learning", 《创新科技与应用》, no. 20, 31 December 2023 (2023-12-31), pages 1 - 5 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||