CN114242169A - Antigen epitope prediction method for B cells - Google Patents
- Publication number: CN114242169A
- Application number: CN202111537519.XA
- Authority
- CN
- China
- Legal status: Granted
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
Abstract
A method for B cell epitope prediction. The method first forms a pre-training set PT. In each episode of the Q-learning algorithm, the Q-agent takes any 8 consecutive amino acid residues in the primary sequence of a protein as a state and, as a first action, selects k residues from the 12 consecutive residues following that state to incorporate into the state; as a second action, it selects one of n complementary classifiers. It searches in PT according to a continuous-action search method, awards an instant reward to each searched amino acid sequence by a tendency reward rule, and calculates and updates the Q value until the change of the value function is less than 1%, at which point training ends. The trained policy is then used to search amino acid sequences in the protein primary sequence, and the selected classifier classifies them. The invention greatly enhances B cell epitope prediction capability through automatic iteration and improves the accuracy of epitope classification.
Description
Technical Field
The invention relates to a B cell epitope prediction method that can accurately predict B cell epitopes, and belongs to the technical field of artificial-intelligence-based detection of microorganisms.
Background
Accurate determination of B cell antigen epitopes is an important basis for the design of bioactive drugs and epitope vaccines, a key step in the development of disease detection kits, and a basic technology for immunodiagnosis and immunotherapy research. Machine-learning-based B cell epitope prediction is an important technical route for determining epitopes; compared with other routes, it greatly saves time, money and labor.
SEPPA is an epitope prediction tool recommended by the Immune Epitope Database (IEDB), built by the National Institute of Allergy and Infectious Diseases, and was updated to version 3.0 in 2019. The researchers who developed SEPPA 3.0 noted in their paper that progress in conformational epitope prediction over the last decade has been steady but slow.
Existing epitope prediction adopts a supervised learning strategy, in which a classifier is obtained by learning from epitope and non-epitope samples. Although new epitope prediction methods keep appearing and prediction accuracy keeps improving, problems remain: low generality of the models, low classification accuracy, and slow updating of the prediction models. In particular, the common window method presets the number of amino acids in the prediction result as a fixed integer before prediction, which is highly artificial and makes it difficult to predict an epitope of optimal length.
AlphaFold's revolutionary breakthrough in protein structure prediction, like AlphaGo's victories over the strongest human Go players, has been greatly inspiring. The two breakthroughs share a common characteristic: both introduce an automatic learning mechanism, so that the model continuously self-iterates and gradually develops strong recognition capability.
However, existing methods do not learn automatically and cannot enhance their prediction capability through automatic iteration. It is therefore necessary to introduce an automatic mechanism into B cell epitope prediction and to design a method that can accurately determine B cell epitopes.
Disclosure of Invention
The invention aims to provide a B cell epitope prediction method that addresses the defects of the prior art and improves the accuracy of B cell epitope prediction.
The problems of the invention are solved by the following technical scheme:
A method for predicting B cell epitopes. The method retrieves B cell epitope sequence data from the IEDB database to form a set EPT and extracts the corresponding protein primary sequences from the UniProt database to form a pre-training set PT. Based on the Q-learning algorithm, the single action of the algorithm is changed into two actions for training. In each episode, the Q-agent takes any 8 consecutive amino acid residues in a protein primary sequence as a state and, as a first action, selects k residues from the 12 consecutive residues following that state to incorporate into the state; as a second action, it selects one of n complementary classifiers. It searches the protein primary sequences in PT according to a continuous-action search method, awards an instant reward to each searched amino acid sequence by a tendency reward rule, and calculates and updates the Q value until the change of the value function is less than 1%, at which point training ends. The trained policy is then used to search amino acid sequences in a protein primary sequence, and the selected classifier classifies them, thereby realizing B cell epitope prediction.
The method for predicting the epitope of the B cell comprises the following steps:
a. B cell epitope sequence data are retrieved from the IEDB database to form a set EPT, the corresponding protein primary sequences are extracted from the UniProt database to form a pre-training set PT, and a set containing n ≥ 2 complementary classifiers is selected for the second action;
b. any 8 consecutive amino acid residues in a protein primary sequence are taken as a state, and the first action selects k residues from the 12 consecutive residues following each state to incorporate into the state; the second action selects one of the n complementary classifiers. The Q values corresponding to all states and actions are initialized to 0, the learning rate α is set to an arbitrary number between 0 and 1, the discount factor γ is set to an arbitrary number between 0 and 1, the number of episodes is set, and the initial state s_0 is set to any 8 amino acid residues of the pre-training set;
c. in each episode, the Q-agent searches in the primary sequences of the proteins in the set PT according to the continuous-action search method: in step t, the Q-agent selects an action a_t^1 from the first action set and then an action a_t^2 from the second action set; after the two actions have been performed, a reward R_t is awarded according to the tendency reward rule and the next state s_{t+1} is observed; the Q value is then updated, together with the state and action tables. When the change of the value function is less than 1%, the search training process ends;
d. amino acid combinations are searched out of each protein primary sequence using the policy obtained by training and classified with the selected classifier; if the classifier reports that a searched amino acid sequence is an epitope, it is considered a B cell epitope, otherwise it is not.
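As an illustration of the state construction in steps a and b, the sliding-window mechanics can be sketched in Python. The function names and the toy protein sequence are illustrative assumptions, not taken from the patent:

```python
# Sketch of the state/first-action window mechanics described above:
# a state is 8 consecutive residues, and the first action appends
# k of the 12 residues that follow it (1 <= k <= 12).

def initial_state(seq: str, pos: int) -> str:
    """Return the 8-residue window starting at pos as the state s_0."""
    return seq[pos:pos + 8]

def apply_first_action(seq: str, pos: int, k: int) -> str:
    """Extend the state by k residues taken from the 12 that follow it."""
    if not 1 <= k <= 12:
        raise ValueError("the method allows 1 <= k <= 12")
    return seq[pos:pos + 8 + k]

protein = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"   # toy primary sequence
s0 = initial_state(protein, 0)                 # 8-residue state
candidate = apply_first_action(protein, 0, 4)  # 12-residue epitope candidate
```

With k between 1 and 12, a candidate always has 9 to 20 residues, consistent with the 8-20 residue range of common epitopes cited later in the description.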
In the above B cell epitope prediction method, the specific search process of the continuous-action search method is:
Any 8 amino acid residues of each protein primary sequence are taken as the initial state s_0, and the corresponding amino acid sequence is written X_1X_2…X_8, where X_j denotes the j-th amino acid, j = 1, 2, …, 8. Starting from the initial state s_0, the first action selects k residues, 1 ≤ k ≤ 12, from the 12 consecutive residues that follow and incorporates them into the state; the second action selects one of the n complementary classifiers. The first and second actions are selected according to the corresponding Q(s, a^1, a^2) values, where a^1 and a^2 range over all possible first and second actions respectively. The reward for the two actions is then calculated by the tendency reward rule, and the value function is calculated according to the following formula:

V_π(s) = E_π[R_t + γ V(s_{t+1})]

where V_π(s) is the value function in state s, π is the policy, E_π denotes expectation under π, R_t is the reward of step t, and V(s_{t+1}) is the value function of the next state s_{t+1};
the Q value is calculated as follows:

Q_π(s, a^1, a^2) = E_π[R_t + γ Q_π(s_{t+1}, a_{t+1}^1, a_{t+1}^2)]

where Q_π(s, a^1, a^2) is the value function for performing the two consecutive actions in state s, and Q_π(s_{t+1}, a_{t+1}^1, a_{t+1}^2) is the value function for performing the two consecutive actions in the next state s_{t+1};
the Q value is then updated as follows:

Q(s_t, a_t^1, a_t^2) ← Q(s_t, a_t^1, a_t^2) + α[R_t + γ max_{a^1, a^2} Q(s_{t+1}, a^1, a^2) − Q(s_t, a_t^1, a_t^2)]

The state is then changed and the above steps are repeated, updating the Q value.
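A minimal sketch of this two-action update, assuming a dictionary-backed Q table keyed by (state, first action, second action); the function name and default parameter values are illustrative:

```python
from collections import defaultdict

# Q table mapping (state, a1, a2) -> value; missing entries default to 0,
# matching the initialization of all Q values to 0 described in step b.
Q = defaultdict(float)

def update_q(s, a1, a2, reward, s_next, actions1, actions2,
             alpha=0.1, gamma=0.9):
    """One step of the two-action Q update:
    Q(s,a1,a2) += alpha * [R + gamma * max Q(s',b1,b2) - Q(s,a1,a2)]."""
    best_next = max(Q[(s_next, b1, b2)]
                    for b1 in actions1 for b2 in actions2)
    Q[(s, a1, a2)] += alpha * (reward + gamma * best_next - Q[(s, a1, a2)])
    return Q[(s, a1, a2)]
```

The max runs over both action sets jointly, because the pair (a^1, a^2) is treated as one composite decision in the update rule above.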
In the above B cell epitope prediction method, the tendency reward rule is calculated as follows:
The features of the amino acid sequence found by the first action are extracted as input to the classifier selected by the second action, which computes a classification score SC_t for the sequence. In the set EPT, the occurrence probability of each amino acid and of each amino acid pair (two consecutive amino acids) is calculated. For any amino acid as_i, the occurrence probability P(as_i) is calculated as:

P(as_i) = (num(as_i) − minnum(as_1, as_2, …, as_20)) / (maxnum(as_1, as_2, …, as_20) − minnum(as_1, as_2, …, as_20))

where as_i denotes any one of the 20 amino acids, num(as_i) is the number of times as_i occurs in the set EPT, and maxnum(·) and minnum(·) are the maximum and minimum occurrence counts of the 20 amino acids in EPT. Likewise, for any amino acid pair AA_i, the occurrence probability P(AA_i) is calculated as:

P(AA_i) = (num(AA_i) − minnum(AA_1, AA_2, …, AA_400)) / (maxnum(AA_1, AA_2, …, AA_400) − minnum(AA_1, AA_2, …, AA_400))

where AA_i denotes one of the 400 amino acid pairs, num(AA_i) is its occurrence count in EPT, and maxnum(·) and minnum(·) are the maximum and minimum occurrence counts of the 400 pairs in EPT;
the instant reward for the amino acid sequence sq_t generated in step t is calculated as:

R_t = SC_t + (1/len(sq_t)) Σ_j P(as_j) + (1/(len(sq_t) − 1)) Σ_j P(AA_j)

where len(sq_t) is the number of amino acids contained in the sequence sq_t and len(sq_t) − 1 is the number of consecutive amino acid pairs.
Advantageous effects
The method combines the Q-learning algorithm, the continuous-action search method and the tendency reward rule, and at the same time introduces complementary classifiers; it thereby greatly enhances B cell epitope prediction capability through automatic iteration and improves the accuracy of epitope classification.
Drawings
The present invention will be described in further detail with reference to the accompanying drawings.
FIG. 1 is a diagram of the Q-learning algorithm.
Detailed Description
The invention provides a B cell epitope prediction method that uses continuous-action search based on Q-learning: one action selects the sequence length and the other selects a complementary classifier, thereby realizing autonomous selection of the sequence length and selection of the optimal classifier for classification.
In the Q-learning algorithm, each state-action pair has a corresponding Q value, so the learning process of the algorithm is to iteratively compute the Q values of state-action pairs. The optimal action policy finally obtained by the learner is, in a given state s, the action corresponding to the maximum Q value. The Q value Q(s, a) of action a in state s is defined as the cumulative reward obtained after the learner executes action a in state s and then follows a given action policy. The basic equation for updating the Q value is:
Q(s_t, a_t) ← Q(s_t, a_t) + α[R_t + γ max_a Q(s_{t+1}, a) − Q(s_t, a_t)]

where a is an optional action in the next state, R_t is the immediate reward given by the environment in state s_t at time t, α is the learning rate, and Q(s_t, a_t) is the evaluation value of the state-action pair (s_t, a_t) at time t.
The Q-learning algorithm pseudocode is shown in Table 1:
TABLE 1 Q-learning algorithm pseudocode
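The content of Table 1 is not reproduced in this text; the following is a standard tabular Q-learning loop of the kind such pseudocode describes, with an epsilon-greedy action choice. The `env` interface (`reset()`, and `step(a)` returning the next state, reward and a done flag) and all parameter values are assumptions:

```python
import random
from collections import defaultdict

def q_learning(env, actions, episodes=100, alpha=0.5, gamma=0.9, eps=0.1):
    """Tabular Q-learning with an epsilon-greedy behavior policy."""
    Q = defaultdict(float)
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            if random.random() < eps:                       # explore
                a = random.choice(actions)
            else:                                           # exploit
                a = max(actions, key=lambda x: Q[(s, x)])
            s_next, r, done = env.step(a)
            best = max(Q[(s_next, x)] for x in actions)     # 0 at terminal states
            Q[(s, a)] += alpha * (r + gamma * best - Q[(s, a)])
            s = s_next
    return Q
```

The patent's method extends this loop by replacing the single action a with the pair (a^1, a^2) and the fixed episode count with the 1% value-function convergence criterion.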
Interpretation of professional terms
(1) Protein primary sequence: a sequence composed of the 20 amino acids, e.g. ADFCEGHIKLST.

(2) B cell epitope: a part of a protein primary sequence; it may consist of a partial subsequence.

(3) Episode: the process by which an agent executes a policy in an environment from start to finish.

(4) Agent: a software or hardware mechanism that takes actions through interaction with its surrounding environment. In the Q-learning algorithm it is often called the Q-agent; it is the subject that explores and learns in the environment.

(5) Action: one of the possible moves the agent can take. Although actions are largely self-explanatory, the agent must be able to choose from a set of discrete possible actions.

(6) Environment: everything that exists apart from the agent. The environment interacts with and responds to the agent: it takes the agent's current state and action as input, and returns the agent's reward and next state as output.

(7) State: the concrete, immediate situation in which the agent finds itself, including a particular place and moment and the instantaneous configuration relating the agent to other relevant things.

(8) Reward: the feedback by which we measure the success or failure of the agent's actions in a given state.

(9) Discount factor: a multiplier applied to the future rewards discovered by the agent, attenuating their cumulative effect on the agent's current action choice. Gradually reducing the value of future rewards so as to give more weight to near-term actions is central to reinforcement learning, and is crucial for paradigms built on delayed rewards.

(10) Policy: a function from an observed state to an action; the strategy the agent uses to determine its next action from the current state, mapping states to the actions that promise the highest reward.

(11) Value: the discounted long-term expected reward (not the short-term reward) of the current state under a particular policy. The short-term reward is the instant reward the agent obtains by taking a certain action in a certain state; the value is the total reward the agent expects to accumulate from a certain state into the future.

(12) Q value or action-value: differs from "value" in requiring an additional argument, the current action. It is the long-term reward produced by taking an action from the current state under a particular policy.

(13) Bellman equation: a set of equations that decompose the value function into an instant reward plus discounted future values.

(14) Value iteration: an algorithm that iteratively refines the value estimates to compute the optimal state-value function. It initializes the value function to arbitrary random values and then repeatedly updates the Q values and value function until they converge.

(15) Policy iteration: since the agent cares only about finding the optimal policy, which sometimes converges before the value function does, policy iteration does not repeatedly refine the value-function estimate; instead it redefines the policy at each step and computes values under the new policy until the policy converges.

(16) Q-learning: an example of a model-free learning algorithm. It does not assume the agent knows the state-transition and reward models, but instead holds that the agent will discover the correct actions through trial and error. The basic idea of Q-learning is therefore to approximate the Q function of state-action pairs from observed samples of Q values during agent-environment interaction.
Most machine-learning-based epitope prediction methods take labeled sequences as positive samples and unlabeled sequences as negative samples, use amino acid physicochemical features, statistical properties, structural information and the like as feature input, train a classifier with a common classification learning algorithm, and then classify sequences with that classifier. Such methods generally fix a window in advance, the window size determining how many amino acids the result contains. Because epitope sequences differ greatly, a single trained classifier cannot predict epitopes well, so an ensemble approach has advantages.
First, B cell epitope sequence data are retrieved from the IEDB database to form a set EPT, and the corresponding protein primary sequences are extracted from the UniProt database to form a pre-training set PT. In each episode, the Q-agent searches for residue combinations in PT according to the continuous-action search method: it takes any 8 consecutive amino acid residues in PT as a state and, as a first action, selects k residues from the 12 consecutive residues following each state to incorporate into the state; as a second action, it selects one of n complementary classifiers. The tendency reward rule awards an instant reward to each searched amino acid sequence, and the Q value is calculated and updated. Training ends when the change of the value function is less than 1%, and the trained classifier is then used to predict B cell epitopes.
The specific search steps are as follows:
firstly, B cell epitope sequence data are searched from an IEDB database to form a set EPT, a corresponding protein primary sequence is extracted from a uniport database to form a pre-training set PT, and a set containing n ≧ 2 complementary classifiers is selected as a second action.
Second, the Q values corresponding to all states and actions are initialized to 0, the learning rate α is set to an arbitrary number between 0 and 1, the discount factor γ is set to an arbitrary number between 0 and 1, the number of episodes is set, and the initial state s_0 is set to any 8 amino acid residues of the pre-training set.
Third, in each episode the Q-agent searches in the set PT according to the continuous-action search method. In step t, the Q-agent selects an action a_t^1 from the first action set and then an action a_t^2 from the second action set; the two actions are performed in succession. After both actions have been performed, a reward R_t is awarded according to the tendency reward rule and the next state s_{t+1} is observed. The Q value is then updated, together with the state and action tables. When the change of the value function is less than 1%, the search training process ends.
Fourth, amino acid combinations are searched out of each protein primary sequence using the policy obtained by training and classified with the selected classifier; if the classifier reports that a searched amino acid sequence is an epitope, it is considered an epitope, otherwise it is not.
The "continuous-action search method" of the third step works as follows. Any 8 amino acid residues of each protein primary sequence are taken as the initial state s_0 and the corresponding sequence is written X_1X_2…X_8; incorporating the i residues following s_0 constitutes an action, i = 1, 2, …, 12. The first and second actions are selected in state s_0 according to the corresponding Q(s, a^1, a^2) values, where a^1 and a^2 range over all possible first and second actions respectively. Performing the first action a_0^1, i.e. selecting k residues from the following 12 consecutive residues to incorporate into the state, yields an amino acid sequence fragment X_1X_2…X_8…X_{k+8} of length k+8; performing the second action a_0^2 selects the m-th classifier from the second action set, 1 ≤ m ≤ n. The reward for the two actions is calculated by the tendency reward rule, the value function is calculated according to formula (1), the Q value is calculated according to formula (2), the next state s_1 is observed, and the Q value is updated according to formula (3). Then, in state s_1, actions a_1^1 and a_1^2 are selected, the value function is calculated according to formula (1), the Q value according to formula (2), the next state s_2 is observed, and the Q value is updated according to formula (3). Training then continues in the same manner.
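After training has converged, the learned Q table drives the search greedily: in each state the agent picks the (k, classifier) pair with the highest Q value. A sketch, in which the classifier names, the Q-table layout and the default ranges are illustrative assumptions:

```python
def select_actions(Q, s, ks=range(1, 13), classifiers=("clf1", "clf2")):
    """Greedy choice of the two consecutive actions in state s:
    k residues to append (first action) and a classifier (second action)."""
    return max(((k, c) for k in ks for c in classifiers),
               key=lambda a: Q.get((s, a[0], a[1]), 0.0))
```

Ties resolve toward the first pair enumerated; during training one would instead wrap this choice in an epsilon-greedy rule so that unexplored (k, classifier) pairs are still sampled.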
The value function under each set of consecutive actions in the learning network is:

V_π(s) = E_π[R_t + γ V(s_{t+1})]    (1)

where V_π(s) is the value function in state s, π is the policy, E_π denotes expectation under π, R_t is the reward of step t, and V(s_{t+1}) is the value function of the next state s_{t+1}.
The Q value of each group of actions at step t is calculated as:

Q_π(s, a^1, a^2) = E_π[R_t + γ Q_π(s_{t+1}, a_{t+1}^1, a_{t+1}^2)]    (2)

where Q_π(s, a^1, a^2) is the value function for performing the two consecutive actions in state s, and Q_π(s_{t+1}, a_{t+1}^1, a_{t+1}^2) is the value function for performing the two consecutive actions in the next state s_{t+1}.
The Q value is updated as:

Q(s_t, a_t^1, a_t^2) ← Q(s_t, a_t^1, a_t^2) + α[R_t + γ max_{a^1, a^2} Q(s_{t+1}, a^1, a^2) − Q(s_t, a_t^1, a_t^2)]    (3)
the tendency reward rule in the third step is calculated by the classification score given to the sequence by the classifier and the occurrence probability of the sequence. Extracting the features of the amino acid sequence searched for in the first action as the input of the classifier selected in the second action, calculating the classification score of the amino acid sequence by the classifier, for example, in step t, the score calculated by the classifier selected in the second action for the amino acid sequence obtained in the first action is recorded as SCt. In the set EPT, the number of occurrences of each amino acid and the occurrence of pairs of amino acids comprising two consecutive amino acids are counted. For any one amino acid asiProbability of occurrenceAccording to the formulaCalculation of, wherein asiRepresents any one of the 20 amino acids, num (as)i) Represents asiThe number of times the EPT set occurs, maxnum (as)1,as2,…,as20) Indicates the maximum number of occurrences of 20 amino acids in the pool EPT, minnum (as)1,as2,…,as20) Represents the minimum number of occurrences of 20 amino acids in the set EPT. Any one amino acid pairAAiProbability of occurrence ofAccording to the formulaPerforming a calculation in which AAiRepresents one of 400 amino acid pairs, num (AA)i) Represents AAiThe number of occurrences of the set EPT, maxnum (AA)1,AA2,…,AA400) Represents the maximum number of occurrences of 400 amino acid pairs in the EPT pool, minnum (AA)1,AA2,…,AA400) Represents the minimum number of occurrences of 400 amino acids in the set EPT.
The instant reward for the amino acid sequence sq_t generated in step t is calculated as R_t = SC_t + (1/len(sq_t)) Σ_j P(as_j) + (1/(len(sq_t) − 1)) Σ_j P(AA_j), where len(sq_t) is the number of amino acids contained in sq_t and len(sq_t) − 1 is the number of consecutive amino acid pairs.
Compared with traditional window-based epitope prediction schemes, the continuous-action-search epitope prediction method realizes autonomous selection of the epitope sequence, eliminates the influence of human factors, and improves classification accuracy by introducing complementary classifiers.
The "continuous-action search method" adopted by the invention takes the 8-20 amino acids typical of common epitope sequences as its design basis, covering common epitope lengths while satisfying the computational requirements of the complementary classifiers.
The tendency reward rule adopted by the invention takes into account both the classifier's computed score for the searched sequence and the sequence's statistical characteristics, comprehensively weighing the classifier score, the amino acid occurrence probabilities and the amino acid pair occurrence probabilities to calculate the instant reward of the two consecutive actions in each state.
Claims (4)
1. A method for predicting B cell epitopes, characterized in that B cell epitope sequence data are first retrieved from the IEDB database to form a set EPT, and the corresponding protein primary sequences are extracted from the UniProt database to form a pre-training set PT; based on the Q-learning algorithm, the single action of the algorithm is changed into two actions for training; in each episode, the Q-agent takes any 8 consecutive amino acid residues in a protein primary sequence as a state and, as a first action, selects k residues from the 12 consecutive residues following that state to incorporate into the state; as a second action, it selects one of n complementary classifiers; it searches the protein primary sequences in PT according to a continuous-action search method, awards an instant reward to each searched amino acid sequence by a tendency reward rule, and calculates and updates the Q value until the change of the value function is less than 1%, at which point training ends; the trained policy is then used to search amino acid sequences in a protein primary sequence, and the selected classifier classifies them, thereby realizing B cell epitope prediction.
2. The B cell epitope prediction method according to claim 1, comprising the following steps:
a. B cell epitope sequence data are retrieved from the IEDB database to form a set EPT, the corresponding protein primary sequences are extracted from the UniProt database to form a pre-training set PT, and a set containing n ≥ 2 complementary classifiers is selected for the second action;
b. any 8 consecutive amino acid residues in a protein primary sequence are taken as a state, and the first action selects k residues from the 12 consecutive residues following each state to incorporate into the state; the second action selects one of the n complementary classifiers; the Q values corresponding to all states and actions are initialized to 0, the learning rate α is set to an arbitrary number between 0 and 1, the discount factor γ is set to an arbitrary number between 0 and 1, the number of episodes is set, and the initial state s_0 is set to any 8 amino acid residues of the pre-training set; c. in each episode, the Q-agent searches in the primary sequences of the proteins in the set PT according to the continuous-action search method: in step t, the Q-agent selects an action a_t^1 from the first action set and then an action a_t^2 from the second action set; after the two actions have been performed, a reward R_t is awarded according to the tendency reward rule and the next state s_{t+1} is observed; the Q value is then updated, together with the state and action tables; when the change of the value function is less than 1%, the search training process ends;
d. The amino acid combinations in each primary protein sequence are searched out using the policy obtained by training and classified with the selected classifier; if the classifier result indicates that a searched amino acid sequence is an epitope, that sequence is regarded as a B-cell epitope; otherwise it is not.
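As an illustration only (not part of the claims), the state and first-action construction of step b can be sketched as follows; the function names and the example sequence are hypothetical:

```python
# Sketch of claim 2, step b: a state is a window of 8 consecutive residues,
# and a first action appends k residues chosen from the 12 residues that
# follow the window. Names here are illustrative, not from the patent.
from itertools import combinations

def make_state(sequence: str, start: int) -> str:
    """A state is 8 consecutive amino acid residues."""
    return sequence[start:start + 8]

def first_actions(sequence: str, start: int, k: int):
    """Enumerate first actions: choose k of the 12 residues after the state."""
    tail = sequence[start + 8:start + 20]          # the next 12 residues
    for idx in combinations(range(len(tail)), k):  # 1 <= k <= 12
        yield "".join(tail[i] for i in idx)

seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"   # toy primary sequence
state = make_state(seq, 0)                  # "MKTAYIAK"
picks = list(first_actions(seq, 0, 2))      # C(12, 2) = 66 candidate actions
```

Each `(state, pick)` pair then yields a candidate amino acid sequence to be scored by the classifier chosen as the second action.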
3. The method of claim 2, wherein the specific search process of the continuous-action search method is:
Any 8 amino acid residues of each primary protein sequence are taken as the initial state s_0, and the corresponding amino acid sequence is denoted X_1X_2…X_8, where X_j denotes the j-th amino acid, j = 1, 2, …, 8. Starting from the initial state s_0, the first action is to select k residues, 1 ≤ k ≤ 12, from the next 12 consecutive residues and incorporate them into the state; the second action is to select one of the n complementary classifiers. The first and second actions are selected according to the corresponding Q values, where a_1 and a_2 denote any possible first action and any possible second action respectively; the two actions are then rewarded by the tendency reward rule, and the value function is calculated according to the following formula:

V_π(s) = E_π[R_t + γ·V_π(s_{t+1})]

where V_π(s) is the value function at state s, π is the policy, E_π is the expectation under policy π, R_t is the return of step t, and V_π(s_{t+1}) is the value function of the next state s_{t+1};
the Q value was calculated as follows:
wherein Qπ(s,a1,a2) Is a cost function that performs two consecutive actions in state s,is the next state st+1Perform two consecutive actionsA cost function of (a);
the Q value is also updated as follows:
The state is then changed and the above steps are repeated, updating the Q value.
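A minimal sketch of the two-action Q-value update described above, with a Q table keyed by (state, first action, second action); the learning rate, discount factor, and all names are illustrative assumptions:

```python
# Two-action Q-learning update (claim 3), sketched with assumed
# hyperparameters. Q is a table over (state, first_action, second_action).
from collections import defaultdict

ALPHA, GAMMA = 0.5, 0.9          # learning rate and discount, both in (0, 1)
Q = defaultdict(float)           # Q[(s, a1, a2)], initialized to 0

def best_q(state, a1_options, a2_options):
    """Max Q over both action sets in the given state."""
    return max(Q[(state, a1, a2)] for a1 in a1_options for a2 in a2_options)

def update(s, a1, a2, reward, s_next, a1_options, a2_options):
    """Temporal-difference update after executing the two consecutive actions."""
    target = reward + GAMMA * best_q(s_next, a1_options, a2_options)
    Q[(s, a1, a2)] += ALPHA * (target - Q[(s, a1, a2)])

# one illustrative step: Q starts at 0, so the new value is ALPHA * reward
update("MKTAYIAK", "QR", "clf_0", reward=1.0,
       s_next="KTAYIAKQ", a1_options=["QR", "QI"], a2_options=["clf_0"])
```

Training would repeat such updates over episodes until the value function changes by less than 1%, as the claim states.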
4. The method of claim 3, wherein the tendency reward rule is:
extraction of features from the amino acid sequence searched for in the first actionThe input of the classifier selected for the second action, the classifier calculates the classification score SC of the amino acid sequencetIn the collective EPT, the probability of occurrence of each amino acid and the probability of occurrence of an amino acid pair comprising two consecutive amino acids are calculated, and any one of the amino acids as isiProbability of occurrenceThe calculation was performed as follows:
wherein asiRepresents any one of the 20 amino acids, num (as)i) Represents asiThe number of times the EPT set occurs, maxnum (as)1,as2,…,as20) Indicates the maximum number of occurrences of 20 amino acids in the pool EPT, minnum (as)1,as2,…,as20) Represents the minimum number of occurrences of 20 amino acids in the set EPT;
The occurrence probability P(AA_i) of any amino acid pair AA_i is calculated analogously:

P(AA_i) = (num(AA_i) − minnum(AA_1, AA_2, …, AA_400)) / (maxnum(AA_1, AA_2, …, AA_400) − minnum(AA_1, AA_2, …, AA_400))

where AA_i denotes one of the 400 amino acid pairs, num(AA_i) is the number of times AA_i occurs in the set EPT, maxnum(AA_1, AA_2, …, AA_400) is the maximum number of occurrences among the 400 amino acid pairs in the set EPT, and minnum(AA_1, AA_2, …, AA_400) is the minimum number of occurrences among the 400 amino acid pairs in the set EPT;
The immediate reward for the amino acid sequence sq_t generated in step t is calculated by:

R_t = SC_t + (1 / len(sq_t))·Σ P(as_i) + (1 / (len(sq_t) − 1))·Σ P(AA_i)

where len(sq_t) denotes the number of amino acids contained in the sequence sq_t, len(sq_t) − 1 denotes the number of consecutive amino acid pairs, and the sums run over the amino acids and the consecutive amino acid pairs of sq_t respectively.
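The min-max probabilities and the reward of the tendency reward rule can be sketched as below; note that the exact way the classification score SC_t combines with the probability terms is an assumption here, not confirmed by the patent text:

```python
# Sketch of the tendency reward (claim 4): per-amino-acid probabilities are
# min-max normalized occurrence counts over the EPT set, and the reward
# averages them over the searched sequence. Adding SC_t is an assumption.
from collections import Counter

def minmax_prob(counts: Counter) -> dict:
    """Min-max normalize occurrence counts to [0, 1]."""
    lo, hi = min(counts.values()), max(counts.values())
    return {k: (v - lo) / (hi - lo) if hi > lo else 0.0
            for k, v in counts.items()}

def tendency_reward(seq: str, p_aa: dict, p_pair: dict, sc_t: float) -> float:
    """Immediate reward: classifier score plus averaged probabilities."""
    n = len(seq)
    aa_term = sum(p_aa.get(a, 0.0) for a in seq) / n
    pairs = [seq[i:i + 2] for i in range(n - 1)]       # n - 1 consecutive pairs
    pair_term = sum(p_pair.get(p, 0.0) for p in pairs) / (n - 1)
    return sc_t + aa_term + pair_term                  # assumed combination

counts = Counter("AAAKKC")      # toy EPT statistics: A=3, K=2, C=1
p_aa = minmax_prob(counts)      # {'A': 1.0, 'K': 0.5, 'C': 0.0}
```

In practice the counts would come from all epitope sequences in EPT, over the 20 amino acids and 400 consecutive pairs.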
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111537519.XA CN114242169B (en) | 2021-12-15 | 2021-12-15 | Antigen epitope prediction method for B cells |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114242169A true CN114242169A (en) | 2022-03-25 |
CN114242169B CN114242169B (en) | 2023-10-20 |
Family
ID=80756774
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111537519.XA Active CN114242169B (en) | 2021-12-15 | 2021-12-15 | Antigen epitope prediction method for B cells |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114242169B (en) |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100279881A1 (en) * | 2007-02-02 | 2010-11-04 | Sompuram Seshi R | Epitope-mediated antigen prediction |
WO2013177214A2 (en) * | 2012-05-21 | 2013-11-28 | Distributed Bio Inc | Epitope focusing by variable effective antigen surface concentration |
CN105868583A (en) * | 2016-04-06 | 2016-08-17 | 东北师范大学 | Method for predicting epitope through cost-sensitive integrating and clustering on basis of sequence |
CN107033226A (en) * | 2017-06-27 | 2017-08-11 | 中国农业科学院兰州兽医研究所 | A kind of PPR virus F protein epitope peptide and its determination, preparation method and application |
WO2017184590A1 (en) * | 2016-04-18 | 2017-10-26 | The Broad Institute Inc. | Improved hla epitope prediction |
CN107341363A (en) * | 2017-06-29 | 2017-11-10 | 河北省科学院应用数学研究所 | A kind of Forecasting Methodology of proteantigen epitope |
CN107909153A (en) * | 2017-11-24 | 2018-04-13 | 天津科技大学 | The modelling decision search learning method of confrontation network is generated based on condition |
JP6500144B1 (en) * | 2018-03-28 | 2019-04-10 | Kotaiバイオテクノロジーズ株式会社 | Efficient clustering of immune entities |
US20200143206A1 (en) * | 2018-11-05 | 2020-05-07 | Royal Bank Of Canada | System and method for deep reinforcement learning |
CN112106141A (en) * | 2018-03-16 | 2020-12-18 | 弘泰生物科技股份有限公司 | Efficient clustering of immune entities |
Non-Patent Citations (1)
Title |
---|
Ma Chengqian; Xie Wei; Sun Weijie: "A Survey of Reinforcement Learning Research", Command Control & Simulation (指挥控制与仿真), no. 06 *
Also Published As
Publication number | Publication date |
---|---|
CN114242169B (en) | 2023-10-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Li et al. | Self-paced ARIMA for robust time series prediction | |
CN114841257B (en) | A small sample target detection method based on self-supervised contrast constraints | |
CN111861909B (en) | Network fine granularity image classification method | |
CN111310799B (en) | Active learning method based on historical evaluation result | |
CN112085559A (en) | Interpretable product recommendation method and system based on temporal knowledge graph | |
CN113283426A (en) | Embedded target detection model generation method based on multi-target neural network search | |
CN109215740A (en) | Full-length genome RNA secondary structure prediction method based on Xgboost | |
CN108596327A (en) | A kind of seismic velocity spectrum artificial intelligence pick-up method based on deep learning | |
CN116153495A (en) | A method for predicting the prognosis and survival of patients with esophageal cancer following immunotherapy | |
US20250036969A1 (en) | Multilingual event causality identification method and system based on meta-learning with knowledge | |
CN115359292B (en) | Standing long jump stage classification method based on feature adaptive fusion | |
CN116226629B (en) | Multi-model feature selection method and system based on feature contribution | |
CN114169595B (en) | Emotion prediction method based on big data | |
CN114708903A (en) | Method for predicting distance between protein residues based on self-attention mechanism | |
CN109558898A (en) | A kind of more options learning method of the high confidence level based on deep neural network | |
CN107945534A (en) | A kind of special bus method for predicting based on GMDH neutral nets | |
CN111144462B (en) | Unknown individual identification method and device for radar signals | |
CN112651499A (en) | Structural model pruning method based on ant colony optimization algorithm and interlayer information | |
Shao et al. | Visual navigation with actor-critic deep reinforcement learning | |
CN116595371A (en) | Topic heat prediction model training method, topic heat prediction method and topic heat prediction device | |
CN115512272A (en) | Time sequence event detection method for multi-event instance video | |
CN109378034B (en) | A protein prediction method based on distance distribution estimation | |
CN110334080A (en) | A kind of construction of knowledge base method for realizing autonomous learning | |
CN114242169A (en) | Antigen epitope prediction method for B cells | |
CN113035363B (en) | Probability density weighted genetic metabolic disease screening data mixed sampling method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||