CN116504331A - A Frequency Score Prediction Method for Drug Side Effects Based on Multimodality and Multitask - Google Patents

A Frequency Score Prediction Method for Drug Side Effects Based on Multimodality and Multitask

Info

Publication number
CN116504331A
Authority
CN
China
Prior art keywords
drug
side effects
matrix
similarity
similarity matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310479801.XA
Other languages
Chinese (zh)
Other versions
CN116504331B (en
Inventor
李洋
汪国华
刘武勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northeast Forestry University
Original Assignee
Northeast Forestry University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northeast Forestry University
Priority to CN202310479801.XA
Publication of CN116504331A
Application granted
Publication of CN116504331B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/40 Searching chemical structures or physicochemical data
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/50 Molecular design, e.g. of drugs
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Chemical & Material Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Medicinal Chemistry (AREA)
  • Pharmacology & Pharmacy (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

A frequency score prediction method for drug side effects based on multi-modality and multi-task learning. The invention relates to predicting the frequency scores of drug side effects with deep learning. It aims to solve two problems of existing computational methods: low accuracy in determining whether a drug and a side effect are associated, and low accuracy in predicting the frequency score of a drug-side effect pair. The process is: 1. obtain the chemical structure semantic features of the drug molecule, the chemical sequence semantic features of the drug molecule, the biomedical text features of the drug, and the biomedical text features of the side effect, and from them obtain a first drug-side effect pair representation; 2. compute the similarity information of drugs and the similarity information of side effects, and from them obtain a second drug-side effect pair representation; 3. concatenate the learned drug-side effect pair representations and feed them into a multi-layer perceptron, which predicts whether the drug and the side effect are associated and, when they are, their frequency score. The invention belongs to the technical field of frequency prediction between drugs and side effects.

Description

Frequency score prediction method for drug side effects based on multiple modes and multiple tasks
Technical Field
The invention belongs to the technical field of frequency prediction between drugs and side effects, and in particular relates to a method for predicting the frequency scores of drug side effects with deep learning.
Background
A side effect is an unintended reaction to a drug in the human body. Most side effects are harmful; they can place a heavy burden on public health and can even endanger life. Side effects are also a main cause of failure in drug development, during which researchers conduct large numbers of animal tests and clinical trials to determine the side effects of the drugs being developed. It is therefore important to identify the possible side effects of a drug, and to resolve the related safety problems, at an early stage of development. Establishing accurate drug side effect profiles experimentally is very time-consuming and expensive, so using existing profile information to predict the as yet undiscovered side effects of new and existing drugs has become a key problem.
In recent years, with the development of computational methods, many researchers have used them to predict side effects. This has given researchers a deeper understanding of the mechanisms behind drug side effects, which is expected to guide the development of safer and more effective drugs. In addition, identifying drug-related side effects computationally improves the screening success rate in drug research and development and provides biological explanations for drug repositioning and for the study of drug pathology.
Disclosure of Invention
The invention aims to solve the problems that existing computational methods have low accuracy in determining whether a drug and a side effect are associated and low accuracy in predicting the frequency score of a drug-side effect pair, and to this end provides a frequency score prediction method for drug-side effect associations.
The multi-modal, multi-task frequency score prediction method for drug side effects comprises the following specific process:
Step one: obtain the chemical structure semantic features of the drug molecules, the chemical sequence semantic features of the drug molecules, the biomedical text features of the drugs, and the biomedical text features of the side effects;
obtain a first drug-side effect pair representation based on the chemical structure semantic features of the drug molecules, the chemical sequence semantic features of the drug molecules, the biomedical text features of the drugs, and the biomedical text features of the side effects;
Step two:
Step 2.1: compute the similarity information of drugs and the similarity information of side effects using Jaccard similarity and cosine similarity, and map both to the same dimension;
the similarity information of drugs consists of the drug-disease similarity matrix and the drug-drug similarity matrix;
the similarity information of side effects consists of the side effect-side effect similarity matrix, the word-vector representation of side effects, and the drug-side effect similarity matrix;
Step 2.2: obtain a second drug-side effect pair representation based on the drug-disease similarity matrix, the drug-drug similarity matrix, the side effect-side effect similarity matrix, the word-vector representation of side effects, and the drug-side effect similarity matrix;
Step three: concatenate the drug-side effect pair representations learned in step one and step two and feed them into a multi-layer perceptron, which predicts whether the drug and the side effect are associated and, when they are, the frequency score of the drug-side effect pair.
The beneficial effects of the invention are as follows:
Most existing supervised models treat drug side effect prediction as a binary task, which oversimplifies the complexity of drug-related side effects. Studying the frequency of drug side effects can provide deeper insight than a binary label. This work predicts not only whether a drug and a side effect are associated but also the frequency score between them, and thereby realizes multi-task learning.
A method is provided for predicting side effect frequency scores by learning features from the different modalities of a drug together with similarity features between drugs and side effects. By applying different deep learning methods to different modality data to learn the corresponding features, the model implicitly establishes complex relationships among the modalities. In addition, the extraction and fusion of drug and side effect features with biomedical pre-trained models is proposed. To improve the accuracy of the predicted scores, a similarity feature interaction module is also designed, which learns local, fine-grained features of drug-side effect pairs from the similarity information of drugs and side effects. The multi-modal feature learning method and the similarity feature learning method have great potential to be extended to other tasks, such as drug-target association prediction and drug-disease association prediction, and offer a new research angle for association prediction in bioinformatics.
Drawings
FIG. 1 is a schematic diagram of predicting the frequency of drug side effects by fusing several kinds of external knowledge.
Detailed Description
The first embodiment is as follows: the multi-modal, multi-task frequency score prediction method for drug side effects of this embodiment comprises the following specific process:
Step one: obtain the chemical structure semantic features of the drug molecules, the chemical sequence semantic features of the drug molecules, the biomedical text features of the drugs, and the biomedical text features of the side effects;
obtain a first drug-side effect pair representation based on the chemical structure semantic features of the drug molecules, the chemical sequence semantic features of the drug molecules, the biomedical text features of the drugs, and the biomedical text features of the side effects;
Step two, the similarity feature interaction learning module:
besides extracting information from the rich biochemical semantic information of drugs and side effects, the similarity information between drugs and side effects can also be learned, so as to capture the deeper relationship between them;
Step 2.1: compute the similarity information of drugs and the similarity information of side effects using Jaccard similarity and cosine similarity, and map both to the same dimension;
the similarity information of drugs consists of the drug-disease similarity matrix and the drug-drug similarity matrix;
the similarity information of side effects consists of the side effect-side effect similarity matrix, the word-vector representation of side effects, and the drug-side effect similarity matrix;
Step 2.2: obtain a second drug-side effect pair representation based on the drug-disease similarity matrix, the drug-drug similarity matrix, the side effect-side effect similarity matrix, the word-vector representation of side effects, and the drug-side effect similarity matrix;
Step three, fusing the two modules and predicting the frequency:
concatenate the drug-side effect pair representations learned in step one and step two and feed them into a multi-layer perceptron, which predicts whether the drug and the side effect are associated and, when they are, the frequency score of the drug-side effect pair.
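The second step names Jaccard similarity alongside cosine similarity but gives no formula for it. The following is a minimal sketch of the standard Jaccard index on binary association vectors, assuming each drug is described by a 0/1 row of associations; the function name `jaccard_similarity` and the toy rows are hypothetical and do not come from the patent.

```python
def jaccard_similarity(row_a, row_b):
    """Jaccard index of two binary vectors: |A intersect B| / |A union B|."""
    set_a = {i for i, value in enumerate(row_a) if value}
    set_b = {i for i, value in enumerate(row_b) if value}
    union = set_a | set_b
    if not union:
        # Two empty association sets: treat the similarity as 0 (an assumption).
        return 0.0
    return len(set_a & set_b) / len(union)


# Hypothetical association rows for two drugs over five entities.
drug_x = [1, 1, 0, 1, 0]
drug_y = [1, 0, 0, 1, 1]
score = jaccard_similarity(drug_x, drug_y)  # 2 shared / 4 in the union
```

The two rows share two entities out of four in their union, so `score` is 0.5.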
The second embodiment is as follows: this embodiment differs from the first embodiment in that: in step one, the chemical structure semantic features of the drug molecules, the chemical sequence semantic features of the drug molecules, the biomedical text features of the drugs, and the biomedical text features of the side effects are obtained;
a first drug-side effect pair representation is obtained based on the chemical structure semantic features of the drug molecules, the chemical sequence semantic features of the drug molecules, the biomedical text features of the drugs, and the biomedical text features of the side effects;
the multi-modal semantic representation learning module: the chemical structure semantics, the chemical sequence semantics, and the biomedical semantic information of a drug molecule can all represent the biological properties of the drug, so the corresponding representations are learned from these three modalities of the drug;
the specific process is as follows:
Step 1.1: process the drug molecules with a graph attention network (GAT) to obtain the chemical structure semantic features of the drug molecules (learning a molecular graph representation);
Step 1.2: process the drug molecules with a Transformer module to obtain the chemical sequence semantic features of the drug molecules;
Step 1.3: obtain the biomedical text features of the drugs and the biomedical text features of the side effects;
Step 1.4: reduce the chemical structure semantic features of the drug molecules extracted in step 1.1, the chemical sequence semantic features of the drug molecules extracted in step 1.2, and the biomedical text features of the drugs and of the side effects extracted in step 1.3 to the same dimension, each through a fully connected layer, to obtain the dimension-reduced chemical structure semantic features of the drug molecules, the dimension-reduced chemical sequence semantic features of the drug molecules, the dimension-reduced biomedical text features of the drugs, and the dimension-reduced biomedical text features of the side effects;
Step 1.5: to obtain a fine-grained fusion between the drug and the side effect:
compute representation 1 as the element-wise product of the dimension-reduced chemical structure semantic features of the drug molecule and the biomedical text features of the side effect;
compute representation 2 as the element-wise product of the dimension-reduced chemical sequence semantic features of the drug molecule and the biomedical text features of the side effect;
compute representation 3 as the element-wise product of the dimension-reduced biomedical text features of the drug and the biomedical text features of the side effect;
add representation 1, representation 2, and representation 3 and feed the sum into a fully connected layer; the output of the fully connected layer passes through an activation function and then a batch normalization layer, giving the drug-side effect pair representation learned by the first module.
Other steps and parameters are the same as in the first embodiment.
The third embodiment is as follows: this embodiment differs from the first or second embodiment in that: in step 1.1, the drug molecules are processed with a graph attention network (GAT) to obtain the chemical structure semantic features of the drug molecules (learning a molecular graph representation); the specific process is as follows:
collect the SMILES sequence of the drug molecule and convert it into an undirected molecular graph G with the RDKit tool;
the undirected molecular graph is G = (V, E);
where V is the set of atoms, V = {C, H, O, …, Sr}, and E is the set of chemical bonds between atoms;
construct the feature matrix of the drug molecule by encoding the chemical properties of each atom as a one-hot vector;
construct the adjacency matrix of the drug molecule from its two-dimensional structure: each atom of the drug is a node; if a bond exists between two atoms, the entries of the adjacency matrix at the rows and columns of the two atom nodes are set to 1, and if no bond exists between them, those entries are set to 0 (each drug has a unique structure in the two-dimensional plane, and each atom (C, H, O, etc.) in the structure learns its representation by aggregating its surrounding neighbors);
the feature matrix and the adjacency matrix of the drug molecule are input into the graph attention network GAT, the output features of the GAT are input into a max pooling layer, and the max pooling layer outputs the chemical structure semantic features of the drug molecule (learning a molecular graph representation).
Other steps and parameters are the same as in the first or second embodiment.
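The construction of the feature matrix and adjacency matrix in this embodiment can be sketched as follows. This is an illustration only: the patent derives the graph from a SMILES string with RDKit, whereas here the atoms and bonds are written out by hand so that the block needs no third-party library, and the one-hot encoding uses only the element symbol. The function `build_molecular_graph` and the three-symbol vocabulary are assumptions.

```python
def build_molecular_graph(atoms, bonds, vocabulary):
    """Return (feature_matrix, adjacency_matrix) for an undirected molecular graph.

    atoms: element symbol of each node, e.g. ["C", "C", "O"]
    bonds: pairs of node indices joined by a chemical bond
    vocabulary: ordered list of symbols used for the one-hot encoding
    """
    column = {symbol: k for k, symbol in enumerate(vocabulary)}
    # One row per atom, with a single 1 in the column of its element symbol.
    features = [[0] * len(vocabulary) for _ in atoms]
    for node, symbol in enumerate(atoms):
        features[node][column[symbol]] = 1
    # Symmetric 0/1 matrix: 1 where a bond exists, 0 otherwise.
    n = len(atoms)
    adjacency = [[0] * n for _ in range(n)]
    for u, v in bonds:
        adjacency[u][v] = 1
        adjacency[v][u] = 1
    return features, adjacency


# Heavy atoms of the SMILES string "CCO": bonds C0-C1 and C1-O2.
atoms = ["C", "C", "O"]
bonds = [(0, 1), (1, 2)]
features, adjacency = build_molecular_graph(atoms, bonds, vocabulary=["C", "H", "O"])
```

The two matrices are what a graph attention layer would take as input; the GAT and max pooling stages themselves are not shown.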
The fourth embodiment is as follows: this embodiment differs from one of the first to third embodiments in that: in step 1.2, the drug molecules are processed with a Transformer module to obtain the chemical sequence semantic features of the drug molecules; the specific process is as follows:
obtain the subsequences in an existing corpus (the corpus provided with the MolTrans paper of Huang et al., 2021, is used; that work mined millions of drug molecular sequences from several unlabeled data sources and extracted the high-frequency drug subsequences from all of them, yielding a subsequence corpus);
collect the SMILES sequence of the drug molecule;
decompose the SMILES sequence of the drug molecule into subsequences using the BPE algorithm and the corpus. (The BPE algorithm splits a molecular sequence according to the subsequences that occur with high frequency in the corpus, so that each resulting subsequence is a reasonable segment. Some drug molecules have as many as five or six hundred atoms, so the chemical sequence representing the drug is very long, and general deep learning methods understand the semantics of long sequences poorly. Moreover, a drug does not act in the human body as single elements but in the form of functional groups, so decomposing the sequence into individual functional groups is of practical significance. A vocabulary of frequent subsequences, which correspond to functional groups, is used to segment the sequence of each drug molecule in the dataset into its most frequent subsequences.)
$$d_i = \{s_1, s_2, \ldots, s_j\}$$
where $d_i$ is the SMILES sequence of the i-th drug molecule;
$s_j$ is the j-th subsequence of the SMILES sequence $d_i$;
then feed the subsequences into the Transformer module and extract the chemical sequence semantic features of the drug molecule (these features are numerical values) through multi-head attention, residual connections, regularization, and a final linear layer; the specific process is as follows:
Set
$$\mathrm{Attention}(Q,K,V)=\mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$
$$h_i=\mathrm{Attention}\left(QW_i^{Q},\,KW_i^{K},\,VW_i^{V}\right),\quad i=1,\ldots,m$$
$$a_1 = m\,a_2$$
$$\mathrm{MultiHead}(Q,K,V)=\mathrm{Concat}(h_1,\ldots,h_m)\,W$$
where Attention denotes the attention weighting; Q is the query matrix, K is the index (key) matrix, and V is the matrix that is weighted according to the attention weights; MultiHead is the matrix obtained by concatenating the m attention heads; Concat denotes concatenation of the results of the multi-head attention mechanism; W is a learnable parameter matrix; $h_m$ is the result learned by the m-th attention head; $a_1$ is the dimension size and $a_2$ is the configured feature dimension; $W_i^{Q}$, $W_i^{K}$, and $W_i^{V}$ are parameter matrices; m is the number of attention heads; softmax maps the inner product of Q and K to a probability distribution on [0, 1], which represents the attention weights; T denotes the transpose; and $d_k$ is the vector dimension.
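The attention computation described here (a softmax over the scaled inner product of Q and K, used to weight V) can be sketched for a single head as follows. This is a plain-Python illustration of standard scaled dot-product attention, not the patent's implementation; the function names and toy matrices are assumptions, and a multi-head layer would concatenate m such outputs and multiply by W.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """Single-head scaled dot-product attention on lists of row vectors."""
    d_k = len(K[0])
    outputs, all_weights = [], []
    for q in Q:
        # Inner product of the query with every key, scaled by sqrt(d_k).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        all_weights.append(weights)
        # Weighted sum of the value rows.
        outputs.append([sum(w * v[c] for w, v in zip(weights, V))
                        for c in range(len(V[0]))])
    return outputs, all_weights


Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
context, weights = attention(Q, K, V)
```

Each row of `weights` sums to 1, and each query attends most strongly to the key it matches.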
Other steps and parameters are the same as in one of the first to third embodiments.
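The decomposition of a SMILES sequence into frequent subsequences in this embodiment can be illustrated with a simplified stand-in. The sketch below uses greedy longest-match segmentation against a fixed vocabulary; it is not the BPE algorithm itself (which applies learned merge rules in order), and the vocabulary, the function `segment_smiles`, and the example string are hypothetical rather than taken from the MolTrans corpus.

```python
def segment_smiles(smiles, vocabulary):
    """Split a SMILES string left to right into the longest vocabulary entries.

    Characters not covered by any vocabulary entry are emitted one at a time.
    """
    max_len = max(len(token) for token in vocabulary)
    pieces = []
    pos = 0
    while pos < len(smiles):
        # Try the longest candidate first, falling back to a single character.
        for size in range(min(max_len, len(smiles) - pos), 0, -1):
            candidate = smiles[pos:pos + size]
            if size == 1 or candidate in vocabulary:
                pieces.append(candidate)
                pos += size
                break
    return pieces


# Hypothetical vocabulary of frequent subsequences.
vocabulary = {"CC(=O)", "O", "c1ccccc1"}
pieces = segment_smiles("CC(=O)Oc1ccccc1", vocabulary)
```

Here `pieces` is `["CC(=O)", "O", "c1ccccc1"]`, and joining the pieces reproduces the input string, so no characters are lost by the segmentation.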
The fifth embodiment is as follows: this embodiment differs from one of the first to fourth embodiments in that: in step 1.3, the biomedical text features of the drugs and the biomedical text features of the side effects are obtained; the specific process is as follows:
collect the biomedical text of each drug from WIKI or PubChem (for example, "Alprostadil is a medication used to treat erectile dysfunction");
collect the biomedical text of each side effect from WIKI or PubChem (for example, "Ascites is the abnormal build-up of fluid in the abdomen");
to avoid possible data leakage, the collected biomedical text must contain no information about associations between drugs and side effects. For example, a statement that etoposide frequently causes nausea and vomiting must not appear in the collected biomedical text data;
input the biomedical text of the drugs and the biomedical text of the side effects separately into the BioBERT pre-trained model for the biomedical domain to extract the biomedical text features of the drugs and of the side effects, expressed as:
$$T^{drug}=\mathrm{BERT}\left(t^{drug}_{i}\right)\in\mathbb{R}^{N\times f},\qquad T^{se}=\mathrm{BERT}\left(t^{se}_{j}\right)\in\mathbb{R}^{N\times f}$$
where N is the number of drugs or side effects and f is the output dimension of the BioBERT pre-trained model; $\mathbb{R}$ denotes the real numbers; $T^{drug}$ is the biomedical text feature of the drugs and $T^{se}$ is the biomedical text feature of the side effects; $t^{drug}_{i}$ is the medical text of the i-th drug and $t^{se}_{j}$ is the medical text of the j-th side effect; BERT is the BioBERT pre-trained model.
Other steps and parameters are the same as in one of the first to fourth embodiments.
The sixth embodiment is as follows: this embodiment differs from one of the first to fifth embodiments in that: the expression of step 1.5 is:
$$P_1=\sigma\big(W\cdot\mathrm{sum}\left(D_p\odot S_t\right)\big),\qquad D_p\in\{D_g,\,D_s,\,D_t\}$$
The element-wise product extracts a fine-grained fusion of the drug and the side effect, capturing the deep and comprehensive relationship between them.
Here σ is the activation function and W is a learnable parameter matrix; $D_g$ is the chemical structure semantic feature of the drug molecule, $D_s$ is the chemical sequence semantic feature of the drug molecule, $D_t$ is the biomedical text feature of the drug, and $S_t$ is the biomedical text feature of the side effect; ⊙ denotes the element-wise product, applied between each $D_p$ and $S_t$ (that is, between $D_g$ and $S_t$, between $D_s$ and $S_t$, and between $D_t$ and $S_t$); sum is the vector addition operation; and $P_1$ is the drug-side effect pair representation learned by the first module.
Other steps and parameters are the same as in one of the first to fifth embodiments.
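The fusion in this embodiment (three element-wise products with the side effect feature, summed and passed through a fully connected layer and an activation) can be sketched as follows. The sketch assumes ReLU as the activation and omits batch normalization; the function names, weights, and feature values are hypothetical toy data.

```python
def hadamard(a, b):
    """Element-wise product of two equal-length vectors."""
    return [x * y for x, y in zip(a, b)]


def fuse_drug_side_effect(drug_views, side_effect, weight, bias):
    """Sum the element-wise products of each drug view with the side effect
    feature, then apply one fully connected layer followed by ReLU."""
    summed = [0.0] * len(side_effect)
    for view in drug_views:
        for k, value in enumerate(hadamard(view, side_effect)):
            summed[k] += value
    output = []
    for row, b in zip(weight, bias):
        z = sum(w * s for w, s in zip(row, summed)) + b
        output.append(max(0.0, z))  # ReLU is an assumed choice of activation
    return output


# Toy 3-dimensional features after dimension reduction (hypothetical values).
structure = [1.0, 0.0, 2.0]   # chemical structure semantic feature
sequence = [0.5, 1.0, 0.0]    # chemical sequence semantic feature
text = [0.0, 1.0, 1.0]        # biomedical text feature of the drug
side_effect = [2.0, 1.0, -1.0]
weight = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
bias = [0.0, 0.0]
pair = fuse_drug_side_effect([structure, sequence, text], side_effect, weight, bias)
```

The three products sum to `[3.0, 2.0, -3.0]`; the first output unit passes 3.0 through unchanged and the second is clipped to 0.0 by the ReLU, so `pair` is `[3.0, 0.0]`.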
The seventh embodiment is as follows: this embodiment differs from one of the first to sixth embodiments in that: the drug-disease similarity matrix in step 2.1 is obtained as follows:
extract the associations between drugs and diseases from the Comparative Toxicogenomics Database (CTD) (the database contains 330397 associations between all drugs and 6808 diseases), build the drug-disease association matrix from these associations, and compute cosine similarity on the drug-disease association matrix to obtain the drug-disease similarity matrix;
for example, with 6808 diseases and 750 drugs, a 750 × 6808 matrix can be constructed; filling in the 330397 associations by row (drug) and column (disease) gives the drug-disease association matrix, and computing cosine similarity between its rows gives a 750 × 750 similarity matrix.
The drug-drug similarity matrix in step 2.1 is obtained as follows:
query the drug-drug similarity scores from the STITCH database;
the similarity score of each pair of drugs lies between 0 and 1000, and the scores are rescaled proportionally from 0-1000 to 0-1;
the drug-drug similarity scores of all pairs form the drug-drug similarity matrix.
For example, given the similarity scores among m drugs, an m × m matrix can be constructed and filled with the rescaled similarity scores.
Other steps and parameters are the same as in one of the first to sixth embodiments.
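The two computations of this embodiment, row-wise cosine similarity on an association matrix and proportional rescaling of 0-1000 scores to 0-1, can be sketched as follows. The toy matrices stand in for the 750 × 6808 association matrix and the queried drug-drug scores; the function names and all values are hypothetical.

```python
import math


def cosine_similarity_matrix(rows):
    """Pairwise cosine similarity between the rows of a matrix.

    Rows with zero norm get similarity 0 to every row (an assumption)."""
    norms = [math.sqrt(sum(v * v for v in row)) for row in rows]
    n = len(rows)
    similarity = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if norms[i] and norms[j]:
                dot = sum(a * b for a, b in zip(rows[i], rows[j]))
                similarity[i][j] = dot / (norms[i] * norms[j])
    return similarity


def rescale_scores(raw_matrix, max_score=1000):
    """Compress scores from the range 0..max_score proportionally to 0..1."""
    return [[value / max_score for value in row] for row in raw_matrix]


# 3 drugs x 4 diseases association matrix (1 = known association).
association = [[1, 1, 0, 0],
               [1, 0, 0, 0],
               [0, 0, 1, 1]]
drug_disease_sim = cosine_similarity_matrix(association)   # 3 x 3
drug_drug_sim = rescale_scores([[1000, 250], [250, 1000]])  # 2 x 2
```

An n × k association matrix always yields an n × n similarity matrix, which matches the 750 × 6808 to 750 × 750 example in the text.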
The eighth embodiment is as follows: this embodiment differs from one of the first to seventh embodiments in that: the side effect-side effect similarity matrix in step 2.1 is obtained as follows:
obtain the side effect information from the ADReCS database (each side effect is a node in a tree, and the side effect information is the position of that node);
the ADReCS database is organized as a four-level tree dataset in which each adverse drug reaction (ADR) term is assigned a unique ID; for example, in the ADReCS dataset the ID of polycythemia is 14.12.01.002.
Construct a matrix based on the obtained side effect information:
if two side effects have no common parent node, their similarity is 0;
if two side effects have a common parent node, their similarity is μ (μ = 0.5);
if the common parent node of two side effects lies one level higher, their similarity is μ² (μ² = 0.5 × 0.5 = 0.25);
repeat until the similarity between every pair of side effects has been computed;
fill the similarities between all pairs of side effects into the matrix to obtain the side effect-side effect similarity matrix;
for example, if information on n side effects is collected, an n × n matrix can be constructed;
the side effect-side effect similarity matrix is thus computed from the node positions of the side effects in the four-level tree data.
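The tree-based similarity rule above can be sketched as follows. The sketch reads the dotted ADReCS ID as the path from the root and raises μ to the number of levels between the terms and their deepest shared ancestor. Several points are assumptions not stated in the text: identical IDs get similarity 1, the μ-power rule is extended beyond two levels, and all IDs other than 14.12.01.002 are made up for illustration.

```python
def adrecs_similarity(id_a, id_b, mu=0.5):
    """Similarity of two ADReCS terms from their dotted hierarchical IDs."""
    if id_a == id_b:
        return 1.0  # assumed: a term is fully similar to itself
    levels_a = id_a.split(".")
    levels_b = id_b.split(".")
    shared = 0
    for a, b in zip(levels_a, levels_b):
        if a != b:
            break
        shared += 1
    if shared == 0:
        return 0.0  # no common parent node
    depth = max(len(levels_a), len(levels_b))
    # One level below the common node gives mu, two levels give mu**2, ...
    return mu ** (depth - shared)


def similarity_matrix(ids, mu=0.5):
    """n x n matrix of pairwise similarities."""
    return [[adrecs_similarity(a, b, mu) for b in ids] for a in ids]


# Only the first ID appears in the text; the others are hypothetical.
ids = ["14.12.01.002", "14.12.01.003", "14.12.02.001", "08.01.01.001"]
matrix = similarity_matrix(ids)
```

With these IDs, the first two terms share a direct parent (0.5), the first and third share a parent one level higher (0.25), and the first and fourth share nothing (0).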
The word-vector representation of side effects in step 2.1 is obtained as follows:
obtain a dataset consisting of q side effect terms;
input each side effect term in the dataset into a trained GloVe model, which outputs a p-dimensional feature;
inputting all q side effect terms into the trained GloVe model gives a q × p feature matrix;
compute cosine similarity on the q × p feature matrix to obtain the word-vector representation of side effects;
for example, when 1000 side effect terms are input into a trained GloVe model and each side effect yields a 300-dimensional feature, a 1000 × 300 feature matrix is obtained, and computing cosine similarity gives a 1000 × 1000 similarity matrix.
The drug-side effect similarity matrix in step 2.1 is obtained as follows:
extract the associations between drugs and side effects from the training set, and build the drug-side effect association matrix from them;
transpose the drug-side effect association matrix and then compute cosine similarity to obtain the drug-side effect similarity matrix;
for example, with c drugs and d side effects, a c × d matrix can be constructed; filling in the associations by row (drug) and column (side effect) gives the drug-side effect association matrix, and transposing it before computing cosine similarity gives a d × d similarity matrix;
in this way, two drug representations and three side effect representations are obtained.
Other steps and parameters are the same as in one of the first to seventh embodiments.
The ninth embodiment is as follows: this embodiment differs from one of the first to eighth embodiments in that: in step 2.2, a second drug-side effect pair representation is obtained based on the drug-disease similarity matrix, the drug-drug similarity matrix, the side effect-side effect similarity matrix, the word-vector representation of side effects, and the drug-side effect similarity matrix; the specific process is as follows:
Step 2.2.1:
each row of the drug-disease similarity matrix is a feature;
each row of the drug-drug similarity matrix is a feature;
each row of the side effect-side effect similarity matrix is a feature;
each row of the word-vector representation of side effects is a feature;
each row of the drug-side effect similarity matrix is a feature;
take the outer product of each feature of the drug-disease similarity matrix with the corresponding feature of the side effect-side effect similarity matrix, of the word-vector representation of side effects, and of the drug-side effect similarity matrix, giving 3 matrices;
take the outer product of each feature of the drug-drug similarity matrix with the corresponding feature of the side effect-side effect similarity matrix, of the word-vector representation of side effects, and of the drug-side effect similarity matrix, giving 3 matrices;
these pairwise outer products form a multi-channel matrix;
the 6 matrices are input into a two-dimensional convolutional neural network to learn a deep representation of the drug and the side effect;
the expression is:
$$P_{cnn}=\mathrm{CNN}\big(\mathrm{prot}\left(D^{n}_{i},\,S^{m}_{j}\right)\big)$$
where $D^{n}_{i}$ is the i-th row of the n-th drug similarity matrix (the drug-disease similarity matrix or the drug-drug similarity matrix), $S^{m}_{j}$ is the j-th row of the m-th side effect similarity matrix (the side effect-side effect similarity matrix, the word-vector representation of side effects, or the drug-side effect similarity matrix), prot is the vector outer product operation, $P_{cnn}$ is the resulting drug-side effect pair representation, and CNN is a two-dimensional convolutional neural network;
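The outer-product step that builds the multi-channel input can be sketched as follows: each of the 2 drug features is combined with each of the 3 side effect features, giving 6 interaction matrices. The function names and the 2-dimensional toy vectors are hypothetical, and the two-dimensional convolutional network that consumes the channels is not shown.

```python
def outer_product(u, v):
    """Outer product of two vectors as a len(u) x len(v) matrix."""
    return [[a * b for b in v] for a in u]


def interaction_channels(drug_features, side_effect_features):
    """One outer-product matrix per (drug feature, side effect feature) pair."""
    return [outer_product(d, s)
            for d in drug_features
            for s in side_effect_features]


# Rows taken from the 2 drug similarity matrices (toy values).
drug_features = [[1.0, 2.0], [0.0, 1.0]]
# Rows taken from the 3 side effect similarity matrices (toy values).
side_effect_features = [[1.0, 0.0], [0.5, 0.5], [2.0, 1.0]]
channels = interaction_channels(drug_features, side_effect_features)
```

With features mapped to a common dimension k, every channel is a k × k matrix, so the 6 channels stack into a 6 × k × k input.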
Step 2.2.2:
each row of the drug-disease similarity matrix is a feature;
each row of the drug-drug similarity matrix is a feature;
each row of the side effect-side effect similarity matrix is a feature;
each row of the word-vector representation of side effects is a feature;
each row of the drug-side effect similarity matrix is a feature;
take the element-wise product of each feature of the drug-disease similarity matrix with the corresponding feature of the side effect-side effect similarity matrix, of the word-vector representation of side effects, and of the drug-side effect similarity matrix, giving 3 vectors;
take the element-wise product of each feature of the drug-drug similarity matrix with the corresponding feature of the side effect-side effect similarity matrix, of the word-vector representation of side effects, and of the drug-side effect similarity matrix, giving 3 vectors;
add the 6 vectors and input the sum into a fully connected network to extract fine-grained fusion features;
the expression is:
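where the first operand is the i-th row of the n-th drug-side matrix (the drug-disease similarity matrix or the drug-drug similarity matrix); the second operand is the j-th row of the m-th side-effect-side matrix (the similarity matrix between side effects, the word-vector representation of side effects, or the similarity matrix between drugs and side effects); ⊙ is the element-wise product; sum is the vector addition operation; W is a learnable parameter matrix; σ is the activation function; and the result is the drug-side effect pair representation;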
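The element-wise fusion of Step 2.2.2 can be sketched in the same style. The names, the toy rows, the 2×3 weight matrix, and the sigmoid activation are all assumptions made for illustration; the text only says that σ is an activation function and W is learnable.

```python
import math
from typing import List

Vector = List[float]


def hadamard(u: Vector, v: Vector) -> Vector:
    """Element-wise product of two equal-length vectors."""
    return [a * b for a, b in zip(u, v)]


def fuse_by_sum(drug_rows: List[Vector], side_rows: List[Vector]) -> Vector:
    """Sum the element-wise products of all 2 x 3 = 6 row combinations."""
    total = [0.0] * len(drug_rows[0])
    for d in drug_rows:
        for s in side_rows:
            total = [t + p for t, p in zip(total, hadamard(d, s))]
    return total


def dense_sigmoid(x: Vector, weights: List[Vector], bias: Vector) -> Vector:
    """Fully connected layer; the sigmoid activation is an assumed choice."""
    return [
        1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(row, x)) + b)))
        for row, b in zip(weights, bias)
    ]


# Toy rows, assumed to be already mapped to a common dimension of 3.
drug_rows = [[1.0, 0.5, 0.0], [0.2, 0.4, 0.6]]
side_rows = [[1.0, 0.0, 0.5], [0.3, 0.3, 0.3], [0.0, 1.0, 0.0]]

fused = fuse_by_sum(drug_rows, side_rows)
weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]  # hypothetical 2 x 3 matrix W
bias = [0.0, 0.0]
p_fc = dense_sigmoid(fused, weights, bias)
print(len(fused), len(p_fc))  # 3 2
```

Unlike the outer product, the element-wise product keeps the vector shape, which is why this branch can feed a fully connected network directly instead of a convolutional one.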
Step 2.2.3:
The two drug-side effect pair representations are concatenated and fed into a fully connected neural network,
where || denotes the concatenation operation, W is a learnable parameter matrix, and P2 is the second drug-side effect pair;
for example, if one feature has dimension 1×200 and the other also has dimension 1×200, the concatenated feature has dimension 1×400.
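The dimension arithmetic in the example above (1×200 joined with 1×200 gives 1×400) can be checked with a short sketch. The random feature values, the output width of 64, and the plain linear projection are assumptions for illustration only.

```python
import random
from typing import List

Vector = List[float]


def concat(u: Vector, v: Vector) -> Vector:
    """Concatenation (the || operation): lengths add up."""
    return u + v


def linear(x: Vector, weights: List[Vector]) -> Vector:
    """Multiply x by a weight matrix W (one output value per row of W)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


rng = random.Random(0)
p_cnn = [rng.random() for _ in range(200)]  # stand-in for the CNN branch output
p_fc = [rng.random() for _ in range(200)]   # stand-in for the fully connected branch output

joined = concat(p_cnn, p_fc)

out_dim = 64  # hypothetical output width of the fully connected layer
weights = [[rng.uniform(-0.05, 0.05) for _ in range(len(joined))]
           for _ in range(out_dim)]
p2 = linear(joined, weights)
print(len(joined), len(p2))  # 400 64
```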
Other steps and parameters are the same as in any one of Embodiments 1 to 8.
Embodiment 10: This embodiment differs from any one of Embodiments 1 to 9 in that, in Step 3, the drug-side effect pairs learned in Step 1 and Step 2 are concatenated and fed into a multilayer perceptron, which predicts whether a drug and a side effect are associated and, when they are, the frequency score of the drug-side effect pair. The expression is:
y = MLP(P1 || P2)
where MLP is a multilayer perceptron, and y is the output: the association score and the frequency score of the drug-side effect pair.
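A minimal sketch of y = MLP(P1 || P2) with two outputs follows. The layer sizes, the ReLU hidden layer, the sigmoid on the association output, the unbounded frequency output, and the random parameters are all assumptions; the text does not specify the perceptron's architecture.

```python
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]


def linear(x: Vector, weights: List[Vector], bias: Vector) -> Vector:
    """Affine layer: one output per row of the weight matrix."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def mlp_predict(p1: Vector, p2: Vector, params: Dict[str, list]) -> Tuple[float, float]:
    """Return (association score in (0, 1), frequency score) for one pair."""
    x = p1 + p2  # P1 || P2
    hidden = relu(linear(x, params["w_hidden"], params["b_hidden"]))
    assoc_logit, freq = linear(hidden, params["w_out"], params["b_out"])
    return sigmoid(assoc_logit), freq


rng = random.Random(0)
in_dim, hidden_dim = 8, 4  # hypothetical sizes
params = {
    "w_hidden": [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)]
                 for _ in range(hidden_dim)],
    "b_hidden": [0.0] * hidden_dim,
    "w_out": [[rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)]
              for _ in range(2)],
    "b_out": [0.0, 0.0],
}
p1 = [rng.random() for _ in range(4)]  # stand-in for the first pair representation
p2 = [rng.random() for _ in range(4)]  # stand-in for the second pair representation

association, frequency = mlp_predict(p1, p2, params)
print(0.0 < association < 1.0)  # True
```

Sharing the hidden layer between the two outputs is one common way to realise a multitask head; separate heads per task would be an equally plausible reading of the text.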
Other steps and parameters are the same as in any one of Embodiments 1 to 9.
Experimental performance was evaluated with four metrics: AUROC (area under the ROC curve), AUPR (area under the precision-recall curve), RMSE (root mean square error), and MAE (mean absolute error). RMSE and MAE are statistical measures of the error between the true and predicted values.
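Three of these metrics can be sketched in a few lines. The labels, scores, and frequency values below are made-up toy data, not results from the patent, and AUROC is computed here by direct pairwise comparison, which is an assumed (but standard) formulation.

```python
import math
from typing import List


def rmse(y_true: List[float], y_pred: List[float]) -> float:
    """Root mean square error between true and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true: List[float], y_pred: List[float]) -> float:
    """Mean absolute error between true and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def auroc(labels: List[int], scores: List[float]) -> float:
    """Probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy association task: 1 = associated pair, 0 = unassociated pair.
labels = [1, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.1]
print(auroc(labels, scores))  # 0.75

# Toy frequency-score task for associated pairs.
freq_true = [3.0, 4.0, 2.0]
freq_pred = [2.5, 4.5, 2.0]
print(round(rmse(freq_true, freq_pred), 4), round(mae(freq_true, freq_pred), 4))
```

AUROC and AUPR assess the association task, while RMSE and MAE assess the frequency-score task, which matches the two outputs of the multilayer perceptron.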
Table 1: Test results of the validation method
The present invention may also have various other embodiments. Without departing from the spirit and essence of the invention, those skilled in the art can make corresponding changes and variations according to the invention, and all such changes and variations shall fall within the scope of protection of the appended claims.

Claims (10)

1. A frequency score prediction method for drug side effects based on multimodality and multitask, characterized in that the specific process of the method is as follows:
Step 1: obtain the chemical structure semantic features of drug molecules, the chemical sequence semantic features of drug molecules, the biomedical text features of drugs, and the biomedical text features of side effects;
obtain a first drug-side effect pair based on the chemical structure semantic features of drug molecules, the chemical sequence semantic features of drug molecules, the biomedical text features of drugs, and the biomedical text features of side effects;
Step 2:
Step 2.1: calculate the similarity information of drugs and the similarity information of side effects using Jaccard similarity and cosine similarity, and map the similarity information of drugs and the similarity information of side effects to the same dimension;
the similarity information of drugs is the drug-disease similarity matrix and the drug-drug similarity matrix;
the similarity information of side effects is the similarity matrix between side effects, the word-vector representation of side effects, and the similarity matrix between drugs and side effects;
Step 2.2: obtain a second drug-side effect pair based on the drug-disease similarity matrix, the drug-drug similarity matrix, the similarity matrix between side effects, the word-vector representation of side effects, and the similarity matrix between drugs and side effects;
Step 3: concatenate the drug-side effect pairs learned in Step 1 and Step 2 and feed them into a multilayer perceptron, which predicts whether a drug and a side effect are associated and, when they are, the frequency score of the drug and the side effect.

2. The frequency score prediction method for drug side effects based on multimodality and multitask according to claim 1, characterized in that, in Step 1, the chemical structure semantic features of drug molecules, the chemical sequence semantic features of drug molecules, the biomedical text features of drugs, and the biomedical text features of side effects are obtained;
a first drug-side effect pair is obtained based on the chemical structure semantic features of drug molecules, the chemical sequence semantic features of drug molecules, the biomedical text features of drugs, and the biomedical text features of side effects;
the specific process is as follows:
Step 1.1: select the graph attention neural network GAT to process the drug molecule and obtain the chemical structure semantic features of the drug molecule;
Step 1.2: select the Transformer module to process the drug molecule and obtain the chemical sequence semantic features of the drug molecule;
Step 1.3: obtain the biomedical text features of drugs and the biomedical text features of side effects;
Step 1.4: reduce the chemical structure semantic features of the drug molecule extracted in Step 1.1, the chemical sequence semantic features of the drug molecule extracted in Step 1.2, and the biomedical text features of drugs and of side effects extracted in Step 1.3 to the same dimension, each through a fully connected layer, to obtain the dimension-reduced chemical structure semantic features of the drug molecule, the dimension-reduced chemical sequence semantic features of the drug molecule, the dimension-reduced biomedical text features of drugs, and the dimension-reduced biomedical text features of side effects;
Step 1.5:
use an element-wise product to compute representation 1 from the dimension-reduced chemical structure semantic features of the drug molecule and the biomedical text features of side effects;
use an element-wise product to compute representation 2 from the dimension-reduced chemical sequence semantic features of the drug molecule and the biomedical text features of side effects;
use an element-wise product to compute representation 3 from the dimension-reduced biomedical text features of drugs and the biomedical text features of side effects;
sum representation 1, representation 2, and representation 3 and feed the result into a fully connected layer, whose output passes in turn through an activation function and a batch normalization layer to obtain the drug-side effect pair learned by the first module.

3. The frequency score prediction method for drug side effects based on multimodality and multitask according to claim 2, characterized in that, in Step 1.1, the graph attention neural network GAT is selected to process the drug molecule and obtain the chemical structure semantic features of the drug molecule; the specific process is as follows:
collect the SMILES sequences of drug molecules and convert each SMILES sequence into an undirected molecular graph G using the RDKit tool;
the undirected molecular graph is G = (V, E);
where V denotes the set of atoms and E denotes the set of chemical bonds between atoms;
construct the feature matrix of the drug molecule from the one-hot vectors of its atoms;
construct the adjacency matrix of the drug molecule from its two-dimensional structure, where each atom of the drug is represented as a node; if a bond exists between two atoms, the entries of the adjacency matrix at the row and column corresponding to the two atom nodes are set to 1, and if no bond exists between the two atoms, those entries are set to 0;
input the feature matrix and the adjacency matrix of the drug molecule into the graph attention neural network GAT, input the output features of the GAT into a max pooling layer, and take the output of the max pooling layer as the chemical structure semantic features of the drug molecule.

4. The frequency score prediction method for drug side effects based on multimodality and multitask according to claim 3, characterized in that, in Step 1.2, the Transformer module is selected to process the drug molecule and obtain the chemical sequence semantic features of the drug molecule; the specific process is as follows:
obtain the subsequences in an existing corpus;
collect the SMILES sequences of drug molecules;
decompose the SMILES sequence of each drug molecule into subsequences using the BPE algorithm and the corpus,
where di is the SMILES sequence of the i-th drug molecule;
sj is the j-th subsequence of the SMILES sequence di of the drug molecule;
then feed the subsequences into the Transformer module to extract the chemical sequence semantic features of the drug molecule; the specific process is as follows:
set
a1 = m·a2
MultiHead(Q, K, V) = Concat(h1, …, hm)W
where Attention denotes the attention weights; Q is the query matrix, K is the index matrix, and V is the matrix weighted according to the attention weights; MultiHead denotes the matrix obtained by concatenating the m attention heads; Concat denotes concatenation of the results of the multi-head attention mechanism; W is a learnable parameter matrix; hm is the result learned by the m-th attention head; a1 is the dimension size and a2 is the set feature dimension; the per-head projections are parameter matrices and m is the number of attention heads; softmax maps the inner product of Q and K to a probability distribution over [0, 1], which gives the attention weights; T denotes transposition; and the scaling term is the vector dimension.

5. The frequency score prediction method for drug side effects based on multimodality and multitask according to claim 4, characterized in that, in Step 1.3, the biomedical text features of drugs and the biomedical text features of side effects are obtained; the specific process is as follows:
collect the biomedical texts of drugs from WIKI or PubChem;
collect the biomedical texts of the side effect set from WIKI or PubChem;
input the biomedical text information of drugs and the biomedical text information of side effects separately into the BioBert pre-trained model to extract the biomedical text features of drugs and of side effects;
where N is the number of drugs or side effects, f is the output dimension of the BioBert pre-trained model, and R denotes the real numbers; the remaining symbols denote, respectively, the biomedical text features of drugs, the biomedical text features of side effects, the medical text of an individual drug, and the medical text of an individual side effect; BERT is the BioBert pre-trained model.

6. The frequency score prediction method for drug side effects based on multimodality and multitask according to claim 5, characterized in that, in the expression for Step 1.5,
σ is the activation function; W is a learnable parameter matrix; the three drug-side operands are the chemical structure semantic features of the drug molecule, the chemical sequence semantic features of the drug molecule, and the biomedical text features of the drug; ⊙ denotes the element-wise product taken between each of these three drug-side features and the biomedical text features of the side effect; P1 is the drug-side effect pair learned by the first module; and sum is the vector addition operation.

7. The frequency score prediction method for drug side effects based on multimodality and multitask according to claim 6, characterized in that, in Step 2.1, the drug-disease similarity matrix is obtained as follows:
extract drug-disease associations from the Comparative Toxicogenomics Database, obtain a drug-disease association matrix based on these associations, and compute cosine similarity on the drug-disease association matrix to obtain the drug-disease similarity matrix;
in Step 2.1, the drug-drug similarity matrix is obtained as follows:
query the drug-drug similarity scores in the STITCT database;
each drug-drug similarity score ranges from 0 to 1000, and the 0-1000 scores are then scaled by the same ratio to the range 0-1;
the drug-drug similarity scores of all pairs constitute the drug-drug similarity matrix.

8. The frequency score prediction method for drug side effects based on multimodality and multitask according to claim 7, characterized in that, in Step 2.1, the similarity matrix between side effects is obtained as follows:
obtain side effect information from the ADReCS database;
construct a matrix based on the obtained side effect information;
if two side effects have no common parent node, their similarity is 0;
if two side effects share a common parent node, their similarity is μ;
if the common parent node of two side effects is one level higher, their similarity is μ^2;
and so on, until the similarity between all pairs of side effects has been calculated;
fill the matrix with the similarities between all pairs of side effects to obtain the similarity matrix between side effects;
in Step 2.1, the word-vector representation of side effects is obtained as follows:
obtain a data set consisting of q side effect words;
input each side effect word in the data set into the trained GloVe model, which outputs a p-dimensional feature;
the q side effect words input into the trained GloVe model together yield a p×q feature matrix;
compute cosine similarity on the p×q feature matrix to obtain the word-vector representation of side effects;
in Step 2.1, the similarity matrix between drugs and side effects is obtained as follows:
extract drug-side effect associations from the training set and obtain a drug-side effect association matrix based on these associations;
transpose the drug-side effect association matrix and compute cosine similarity to obtain the similarity matrix between drugs and side effects.

9. The frequency score prediction method for drug side effects based on multimodality and multitask according to claim 8, characterized in that, in Step 2.2, the second drug-side effect pair is obtained based on the drug-disease similarity matrix, the drug-drug similarity matrix, the similarity matrix between side effects, the word-vector representation of side effects, and the similarity matrix between drugs and side effects; the specific process is as follows:
Step 2.2.1:
each row of the drug-disease similarity matrix is a feature;
each row of the drug-drug similarity matrix is a feature;
each row of the similarity matrix between side effects is a feature;
each row of the word-vector representation of side effects is a feature;
each row of the similarity matrix between drugs and side effects is a feature;
an outer product is computed between every feature of the drug-disease similarity matrix and each feature of the similarity matrix between side effects, the word-vector representation of side effects, and the similarity matrix between drugs and side effects, respectively, yielding 3 matrices;
an outer product is computed between every feature of the drug-drug similarity matrix and each feature of the similarity matrix between side effects, the word-vector representation of side effects, and the similarity matrix between drugs and side effects, respectively, yielding 3 matrices;
the 6 matrices are input into a two-dimensional convolutional neural network to learn deep representations of drugs and side effects;
in the expression, the first operand is the i-th row of the n-th drug-side matrix (the drug-disease similarity matrix or the drug-drug similarity matrix); the second operand is the j-th row of the m-th side-effect-side matrix (the similarity matrix between side effects, the word-vector representation of side effects, or the similarity matrix between drugs and side effects); Prot is the vector outer product operation; the result is the drug-side effect pair representation; and CNN is a two-dimensional convolutional neural network;
Step 2.2.2:
each row of the drug-disease similarity matrix is a feature;
each row of the drug-drug similarity matrix is a feature;
each row of the similarity matrix between side effects is a feature;
each row of the word-vector representation of side effects is a feature;
each row of the similarity matrix between drugs and side effects is a feature;
an element-wise product is computed between every feature of the drug-disease similarity matrix and each feature of the similarity matrix between side effects, the word-vector representation of side effects, and the similarity matrix between drugs and side effects, respectively, yielding 3 vectors;
an element-wise product is computed between every feature of the drug-drug similarity matrix and each feature of the similarity matrix between side effects, the word-vector representation of side effects, and the similarity matrix between drugs and side effects, respectively, yielding 3 vectors;
the 6 vectors are summed and input into a fully connected network to extract fine-grained fusion features;
in the expression, the first operand is the i-th row of the n-th drug-side matrix (the drug-disease similarity matrix or the drug-drug similarity matrix); the second operand is the j-th row of the m-th side-effect-side matrix (the similarity matrix between side effects, the word-vector representation of side effects, or the similarity matrix between drugs and side effects); ⊙ is the element-wise product; sum is the vector addition operation; W is a learnable parameter matrix; σ is the activation function; and the result is the drug-side effect pair representation;
Step 2.2.3:
the two drug-side effect pair representations are concatenated and fed into a fully connected neural network,
where || denotes the concatenation operation, W is a learnable parameter matrix, and P2 is the second drug-side effect pair.

10. The frequency score prediction method for drug side effects based on multimodality and multitask according to claim 9, characterized in that, in Step 3, the drug-side effect pairs learned in Step 1 and Step 2 are concatenated and fed into a multilayer perceptron, which predicts whether a drug and a side effect are associated and, when they are, the frequency score of the drug and the side effect; the expression is:
y = MLP(P1 || P2)
where MLP is a multilayer perceptron, and y is the output: the association score and the frequency score of the drug-side effect pair.
CN202310479801.XA 2023-04-28 2023-04-28 Frequency score prediction method for drug side effects based on multimodality and multitask Active CN116504331B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310479801.XA CN116504331B (en) 2023-04-28 2023-04-28 Frequency score prediction method for drug side effects based on multimodality and multitask

Publications (2)

Publication Number Publication Date
CN116504331A (en) 2023-07-28
CN116504331B (en) 2024-07-26

Family

ID=87322656

Country Status (1)

Country Link
CN (1) CN116504331B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant