Background
The ant colony algorithm and the artificial fish colony algorithm are called as two emerging colony intelligent algorithms, although the birth time is short, the progress speed is high, and the ant colony algorithm and the artificial fish colony algorithm are tried to be applied to a plurality of fields by broad scholars and achieve good results. While a single algorithm has many advantages, it has various disadvantages.
The ant colony algorithm has good parallelism and high accuracy, which is a prominent advantage of the ant colony algorithm. The artificial ants can search the whole search space, and can exchange information among the artificial ants, so that the purpose of optimizing is finally achieved. In the ant colony algorithm, each artificial ant actually interacts with the environment, and each artificial ant is connected with each other by means of pheromone 'bridging'. For example: an artificial ant finds a food source, the artificial ant does not obviously yell other ant partners to say that food is provided at the position, but the artificial ant leaves pheromone on a path, and the 'polite' mode leads other ants to perceive the existence of the pheromone at the position when passing through nearby zones so as to successfully find the food. That is, when the ant does not find the food in the beginning, the pheromone left on the path is useless pheromone, which slows down the initial optimization speed, and is a prominent disadvantage of the ant colony algorithm, and the ant needs to find the food as soon as possible by the moving rule. Specifically, the method comprises the following steps: ants first need to have some inertia and keep a tendency to move forward, rather than move around in place. Secondly, ants keep a certain randomness after fixing the motion trend, and abstain from crazy linear motion like particles in the particle swarm optimization. By doing so, ants can try to move according to the original trend as much as possible, and can also try to generate a new search range. But ants also have errors and make mistakes, do not move to the place with high pheromone, and walk blindly, so that local optimal conditions are generated, which is also a disadvantage in the ant colony algorithm.
The artificial fish swarm algorithm is a swarm intelligent bionic algorithm for simulating the behavior of fish in nature. Local optimization is carried out by constructing four underwater behaviors of a single artificial fish, and finally global optimization is achieved. In the food searching behavior of the fish, fewer times of attempts are set, the probability of artificial fish random swimming is increased, the capability of local optimal solution can be avoided, and global optimal is obtained. The use of the crowdedness factor can limit the scale of fish shoal gathering, so that the artificial fish can be more freely optimized. The occurrence of the trailing behavior can enable the artificial fish to act towards a better state, and also enable the artificial fish trapped in the local optimum to escape from the local optimum area and move towards the global optimum direction. The design idea of the artificial fish swarm algorithm is different from that of the traditional optimization algorithm, but can be combined with the traditional design idea and applied to the NP problem. The algorithm has no special requirement on initial values, and is not very sensitive to parameter selection, which is also carefully introduced and studied in the above. In the algorithm, only the function value of the objective function is used, and the algorithm is not limited by other special information. This is also an advantage of the artificial fish swarm algorithm. The typical disadvantage of the artificial fish swarm algorithm is that the accuracy is not high.
Through the above description, the ant colony algorithm and the artificial fish colony algorithm have complementary relationship, and the two colony intelligent algorithms are improved, so that the performance and the efficiency of the algorithm can be improved through effective fusion. The improved group intelligent algorithm can avoid the defects of respective single algorithms, strengthen the advantages of the algorithms and form a more stable and comprehensive group intelligent algorithm.
Protein folding has been a focus of great concern. According to statistics of relevant departments, the protein of about 1/3 normal cells of a human body generates folding errors, but intracellular mechanisms can find the existence of the wrong protein in time and clear the wrong protein quickly, but when the folding errors of the protein are too large to be processed, various diseases can be caused, and the diseases are similar to Alzheimer disease (commonly known as senile dementia), spongiform hemangioma, familial hereditary cholesterol altitude disease, rabies and the like. Therefore, the study on protein folding is not only of great significance but also is very slow.
Protein folding refers to the process of folding a protein molecule from a loose chain into a high-order structure with activity. During folding, it is necessary to go through a number of intermediate structural states, i.e. partial folding of the protein, also called intermediate folds. By protein folding, we can understand how the amino acid sequence determines the spatial structure of the protein. The research shows that: the process of protein folding is very cumbersome and quite fast. Many theories are involved in the folding process, and the main theories are thermodynamic and kinetic theories. The thermodynamic theory of protein folding was first to say that all the information that stabilizes the native conformation of a protein is displayed in the amino acid sequence of the polypeptide chain, and that the protein itself can fold into the native conformation without the need for external energy induction. As research continues, more intensive studies have shown that in vitro proteins are not all reversible, and that in some species the proteins have very low recoverability, and the theory of kinetics has played a role here. Kinetic theory suggests that during protein folding, there is some force that prevents the protein from successfully completing its native conformation, thus leaving the protein structure in a second stable state. In summary, the protein folding process is kinetically controlled, but follows a thermodynamically speaking transition from a high energy state to a low energy state.
The theoretical significance of protein folding is that it uncovers the second genetic code in the organism. The reason for this is that amino acids are the basic unit of proteins, the primary structure of proteins is also the amino acid sequence, amino acid residues interact and entangle, and the structure can be changed to a higher order structure in a short time, but researchers cannot judge the structure of proteins in a short time according to the order of amino acid arrangement.
The results achieved at present are evident for protein folding. Strict corrections were made to the statement that the new peptide fragment folded spontaneously; protein structure equilibrium was analyzed in conjunction with spectroscopy; discovery of molecular chaperone recognition mechanism, etc. Meanwhile, there are many unknown and guesses for protein folding, there are about 170000 primary protein structures in the database, only 12000 proteins have been detected, the 12000 primary structures also include many homologous proteins, their structures are very similar, and the protein structures with completely different structures are only detected about 1000. This data is sufficient to show that the protein folding study requires more attention and also that the subject matter of the study is of great importance and significance.
The theory of protein structure prediction has been well developed for decades, but the prediction results have not been comparable to those collected in the protein structure database (PDB) by means of biochemical experiments. The ultimate goal of protein structure prediction is the protein folding problem. The problems of low precision and long calculation time of the structure of the protein predicted at present exist, and some search algorithms are easy to be trapped in local extreme points.
Disclosure of Invention
The invention aims to solve the problems of low precision and long calculation time of protein structure prediction in the prior art, and discloses a protein folding prediction method based on an ant colony fish swarm algorithm.
In order to achieve the purpose, the technical scheme adopted by the invention is as follows:
the method comprises the following steps: initializing and setting an ant colony algorithm, and initializing and setting an artificial fish colony algorithm to form a new initial population;
step two: alternately iterating to generate a new initial solution;
step three: judging the degree of congestion; if the population is crowded, the population keeps the original state, one step is carried out randomly, the step two is repeatedly executed, if the population is not crowded, the step one step is carried out towards the central position of the species group, and the step four is executed by the population;
step four: updating pheromone concentration and bulletin board information;
step five: calculating the fitness;
step six: judging whether the conditions are met; and if the conditions are met, outputting the result, and if the conditions are not met, returning to the step two.
Further, in the first step, initializing the number m of the artificial ants, a heuristic factor alpha, a hope heuristic factor beta, an information persistence factor rho and information intensity Q; the number M of the artificial fish, the step length step, the visual field range of the artificial fish and the crowding factor delta are initialized.
Further, in the second step, the ant colony algorithm and the artificial fish colony algorithm are alternately iterated, and the superiority or inferiority of the solution is determined by comparison; in the same problem, if the ant colony algorithm searches a preferred solution first, the preferred solution is randomly placed at the position of an artificial fish in the artificial fish colony; if the artificial fish swarm algorithm searches a preferred solution first, the preferred solution is randomly placed at the position of one artificial ant in the artificial ant swarm.
Further, in step three, if
If the center of the population has more food and is not crowded, the population is carried out in the direction of the central position of the population, the step four is carried out, and if the population is not crowded, the step four is carried out
If the population is crowded, keeping the population in the original state, randomly moving one step, and repeatedly executing the second step, wherein n in the formula
fIndicating the number of groups in the current domain, y
cFitness value, y, representing the center position
iA fitness value representing the current state.
Further, in step four according to
To update the pheromone density and to update the bulletin board information, where τ
ij(t) pheromones from node i to j for the tth ant; when an ant k passes j from node i in this cycle,
otherwise, the value is 0, Q represents the information intensity, namely the amount of the released information of the artificial ants, l
kRepresents the path length of the kth ant in the cyclic search process.
Further, in step five, the formula y is usedi=f(xi) Finding the fitness where xiRepresenting the position vector, y, of the i-th artificial fishiExpressing the fitness value of the i-th artificial fish, wherein f (x)i) The method is characterized in that the optimal value of a single fish is obtained by taking the food concentration of the position of the artificial fish as the adaptive value of an objective function and taking the state of the single fish as the optimal value and bringing the optimal value into the optimal value.
Further, in the sixth step, whether a condition for ending the algorithm process is met is judged, the condition for ending the algorithm is that the maximum iteration number is reached or the error reaches the target precision requirement, if the condition is met, the optimal solution, namely the lowest value of the energy function of the three-dimensional Toy model, is output, the algorithm is ended, and if the condition is not met, the second step is returned to continue to execute the alternate iteration to generate a new initial solution.
Compared with the prior art, the invention has the beneficial effects that: the invention records a group intelligent algorithm which has better global convergence performance and faster rapid convergence capability, is easy to escape from local optimization and achieves global optimization. The convergence rate of the algorithm is greatly improved, the minimum energy value of the protein sequence can be searched under the conditions of high operation precision and short search time, and the obtained conformation diagram can basically reflect the physical properties of the real protein, so that the structure of the protein can be effectively predicted.
Detailed Description
Example 1
A protein folding prediction method based on an ant colony fish swarm algorithm comprises the following steps:
the method comprises the following steps: initializing and setting an ant colony algorithm, and initializing and setting an artificial fish colony algorithm to form a new initial population;
step two: alternately iterating to generate a new initial solution;
step three: judging the degree of congestion; if the population is crowded, the population keeps the original state, one step is carried out randomly, the step two is repeatedly executed, if the population is not crowded, the step one step is carried out towards the central position of the species group, and the step four is executed by the population;
step four: updating pheromone concentration and bulletin board information;
step five: calculating the fitness;
step six: judging whether the conditions are met; and if the conditions are met, outputting the result, and if the conditions are not met, returning to the step two.
Further, in the first step, initializing the number m of the artificial ants, a heuristic factor alpha, a hope heuristic factor beta, an information persistence factor rho and information intensity Q; the number M of the artificial fish, the step length step, the visual field range of the artificial fish and the crowding factor delta are initialized.
Further, in the second step, the ant colony algorithm and the artificial fish colony algorithm are alternately iterated, and the superiority or inferiority of the solution is determined by comparison; in the same problem, if the ant colony algorithm searches a preferred solution first, the preferred solution is randomly placed at the position of an artificial fish in the artificial fish colony; if the artificial fish swarm algorithm searches a preferred solution first, the preferred solution is randomly placed at the position of one artificial ant in the artificial ant swarm.
Further, in step three, if
If the center of the population has more food and is not crowded, the population is carried out in the direction of the central position of the population, the step four is carried out, and if the population is not crowded, the step four is carried out
If the population is crowded, keeping the population in the original state, randomly moving one step, and repeatedly executing the second step, wherein n in the formula
fIndicating the number of groups in the current domain, y
cFitness value, y, representing the center position
iA fitness value representing the current state.
Further, in step four according to
To update the pheromone density and to update the bulletin board information, where τ
ij(t) denotes the t th leechPheromones when ants go from node i to node j; when an ant k passes j from node i in this cycle,
otherwise, the value is 0, Q represents the information intensity, namely the amount of the released information of the artificial ants, l
kRepresents the path length of the kth ant in the cyclic search process.
Further, in step five, the formula y is usedi=f(xi) Finding the fitness where xiRepresenting the position vector, y, of the i-th artificial fishiExpressing the fitness value of the i-th artificial fish, wherein f (x)i) The method is characterized in that the optimal value of a single fish is obtained by taking the food concentration of the position of the artificial fish as the adaptive value of an objective function and taking the state of the single fish as the optimal value and bringing the optimal value into the optimal value.
Further, in the sixth step, whether a condition for ending the algorithm process is met is judged, the condition for ending the algorithm is that the maximum iteration number is reached or the error reaches the target precision requirement, if the condition is met, the optimal solution, namely the lowest value of the energy function of the three-dimensional Toy model, is output, the algorithm is ended, and if the condition is not met, the second step is returned to continue to execute the alternate iteration to generate a new initial solution.
By analyzing the advantages and the disadvantages of the ant colony algorithm and the fish colony algorithm, the method adopts the complementary principle, improves and fuses the two colony intelligent algorithms, and calls the improved colony intelligent algorithm as an improved ant colony fish colony algorithm (ACO-AFSA for short). The improved ant colony fish swarm algorithm is applied to protein folding prediction, the performance of the colony intelligent algorithm is better than that of a single algorithm, and the optimal effect of a target problem can be achieved better.
The ant colony algorithm and the artificial fish colony algorithm have many commonalities. First, the artificial ants in the ant colony algorithm always return the food to the ant hole by the shortest path, and the artificial fish in the artificial fish colony algorithm is also the most abundant zone that likes to gather the food in water. Furthermore, the search behavior of artificial ants and the feeding behavior of artificial fish are also extremely striking similarities. Besides, the pheromone concentration in the ant colony algorithm and the bulletin board function in the artificial fish colony algorithm are the same.
The greatest difference between the artificial fish swarm algorithm and the ant swarm algorithm is the appearance of the crowdedness factor in the artificial fish swarm algorithm, and because the greatest difference is similar in many places, two independent swarm intelligence algorithms are improved and fused to form a new improved swarm intelligence algorithm, namely the ant swarm fish swarm algorithm (ACO-AFSA for short). The improved algorithm references the global convergence performance of the ant colony algorithm and the rapid convergence capability of the artificial fish colony, so that local optimization is easier to escape, global optimization is achieved, and the precision of the result is guaranteed.
In the calculation process, the ant colony algorithm and the artificial fish colony algorithm are alternately iterated, and whether the solution is excellent or not is determined through comparison. In the same problem, if the ant colony algorithm searches a preferred solution first, the preferred solution is randomly placed at the position of an artificial fish in the artificial fish colony, so that the aim of enriching the colony, improving the efficiency and improving the precision of the artificial fish colony algorithm is fulfilled. If an optimal solution is searched by the artificial fish swarm algorithm, the optimal solution is randomly placed at the position of an artificial ant in the artificial ant swarm, so that the purposes of enriching the swarm and accelerating the optimization are achieved.
The improved ant colony fish swarm algorithm has the capability of fast searching, maintains the global convergence and the result precision of the ant colony algorithm, successfully overcomes the defect that the ant colony algorithm is easy to fall into local optimum, greatly improves the convergence speed, and is satisfactory to the improved ACO-AFSA algorithm.
Protein folding prediction faces two major problems, one is how to convert a fussy protein structure into a simple mathematical model, so that the problem becomes simplified; another is how to find an efficient search method for efficient protein conformation searches. The current simplified protein structure models mainly comprise Toy models, HP lattice point models, three-dimensional cubic lattice models, hexagonal lattice point models and the like. The invention applies the improved ACO-AFSA group intelligent algorithm to the protein folding prediction problem of the three-dimensional Toy model in an important way to verify the effectiveness of the intelligent algorithm.
In IIIIn the dimension Toy model, not only the bond angle θ is involved, but also the included angle between two planes formed by three consecutive peptide bonds is considered, which is called the torsion angle and is marked as β3,β4,...,βN-1,βiE (-pi, pi). When the two-plane torsion angle formed by three peptide bonds is 0, the three peptide bond positions are shown to be in the same plane. The positive and negative values of β represent the clockwise and counterclockwise directions of the rotation. Fig. 2 shows the structure of the three-dimensional Toy model in space.
Regardless of the length of the amino acid sequence, their energy functions are the same, being the sum of the van der Waals attractive potential and the bending potential. The length of a protein sequence is N, and the van der Waals potential energy between separated residues is E1Comprises the following steps:
ξ
iand xi
jRepresents the kind of amino acid residue, and when the specified residue is A, the xi value is 1; when residue is B, ξ has a value of-1.
Its values are only three cases, 1/2, -1/2, 1. These three values represent strong attraction between AA, mutual repulsion between AB and weak attraction between BB, respectively, showing the properties of the protein from the side; r is
ijIndicates the distance between residues.
As can be seen from the formula: the van der waals attractive potential is determined by the hydrophobicity, polarity, and near-far distance of non-adjacent residues.
The two-dimensional Toy model is the basis of the three-dimensional Toy model, and the three-dimensional Toy model has one more torsion angle than the two-dimensional Toy model, so that the energy function is added with a dihedral angle potential energy on the basis of the deformation of the two-dimensional Toy model.
Therefore, the energy function of the three-dimensional Toy model is:
in the formula, k1And k2As a coefficient, it indicates the strength of the inter-residue effect, and the coefficient value is defined as 0.5 or-1.
In the three-dimensional Toy model, the study of the protein folding prediction can be translated into when the energy value of the evaluation function E is minimal, which can be expressed as:
although there are many similarities between the two-dimensional Toy model and the three-dimensional Toy model, it is clear that they are of both types. The three-dimensional Toy model creates a stereo network in space and becomes more complex. However, the protein folding prediction of the three-dimensional Toy model can more intuitively embody the spatial structure of the protein and has high exploration value.
Today, few researchers have tested true sequences using swarm intelligence algorithms for prediction of protein folding. To further illustrate that the improved algorithm of the design is prominent, this part will use the real protein sequence simulation test. For comparative and convincing purposes, the energy values obtained with the existing algorithm were compared, as shown in table 1.
TABLE 1 true protein sequences
Of the three true protein sequences, Q, R, D, N, S, K, E, Y, F, H, K, O, T is a polar residue; G. v, L, A, I, C, M, V, P is a hydrophobic group. In the prediction of protein folding in the three-dimensional Toy model, the actual protein sequence was reduced to a sequence containing B and a according to the method of the present application, i.e., the internationally widely used Fibonacci (Fibonacci) sequence, as shown in table 2.
TABLE 2 reduced A, B-containing sequences
The conversion of the true protein sequence to an AB aligned sequence can be predicted using the improved algorithm of the present application. The three sequences were tested using MATLAB7.0 software, still on an Intel (R) Core i32.40GHz CPU,2.00GB RAM and 32-bit operating system. Important parameters in the algorithm are set as follows: α is 2.0, β is 3.0, ρ is 0.65, Q is 100, and δ is 0.60. The maximum number of iterations is 80 and the number of attempts is 80. The selected M value for the first two sequences is 15, the selected M value for the last three sequences is 35, and the test results are shown in Table 3. In this table, the present application compares the test results with the existing results, and the comparison results are that the improved algorithms used in the present application all yield lower energy minima than the existing methods. The results of the comparison are shown in line 3. Especially for the protein sequence with the length of 13, the energy value obtained by the algorithm in the text is slightly better than that of the existing method; especially for the length 38 sequence, the effect is more obvious. The comparison of these visual data is significant enough to explain the improved algorithm designed by this application.
TABLE 3 test results
The application applies an improved group intelligence algorithm to a three-dimensional Toy model, tests a real protein sequence, compares the obtained result with the existing result, and gives a comparison line chart shown in figure 3. The result of the application is more excellent, and the conformation diagram can basically reflect the physical properties of the real protein, thereby further explaining the feasibility and the effectiveness of the group intelligent algorithm applied by the application.
In the present application, specific information of documents [65] and [66] is as follows:
[65] protein folding structure prediction [ C ] of Wuhan university of science and technology based on AB nong's model and genetic annealing algorithm 2007
[66] Zhanxiaong, litting, reed. multiple population particle population optimization algorithm based on TOY model protein folding prediction study [ J ]. computer science.2008, 35 (10): 230-235.