WO2024212719A1

WO2024212719A1 - Prediction method and apparatus, readable storage medium and electronic device

Info

Publication number: WO2024212719A1
Application number: PCT/CN2024/079019
Authority: WO
Inventors: 陈红阳; 李瑞凤
Original assignee: 之江实验室
Priority date: 2023-04-14
Filing date: 2024-02-28
Publication date: 2024-10-17
Also published as: CN116453615A

Abstract

A prediction method and apparatus, a readable storage medium and an electronic device. The method comprises: on the basis of the molecular structure of a molecule to be predicted, by means of a graph neural network model, determining specified sub-graphs corresponding to sub-structures of said molecule, so as to determine specified properties of the specified sub-graphs on the basis of specified features corresponding to the specified sub-graphs and preset characterization features of each specified property, thus predicting molecular properties of said molecule. Therefore, said molecule has corresponding molecular properties since said molecule comprises sub-structures having specified properties. Obviously, the prediction method provides explainability for said molecule to have the corresponding molecular properties, thus ensuring the credibility of the prediction result.

Description

Prediction method, device, readable storage medium and electronic device

Technical Field

本公开涉及化学领域，尤其涉及一种预测方法、装置、可读存储介质及电子设备。The present disclosure relates to the field of chemistry, and in particular to a prediction method, device, readable storage medium and electronic device.

Background Art

随着计算机技术的发展和业务深入融合的需要，在医疗和化学领域，通过模型对分子性质进行预测，以基于预测得到的分子性质进行化合物筛选，已经成为深度学习在医疗和化学领域常见的应用场景之一。With the development of computer technology and the need for in-depth business integration, in the fields of medicine and chemistry, predicting molecular properties through models and screening compounds based on the predicted molecular properties has become one of the common application scenarios of deep learning in the fields of medicine and chemistry.

目前，在预测分子性质时，通常需获取到待预测分子的分子结构，再将该分子结构输入预先训练完成的预测模型中，得到该预测模型输出的分子性质，作为预测得到的待预测分子的分子性质。Currently, when predicting molecular properties, it is usually necessary to obtain the molecular structure of the molecule to be predicted, and then input the molecular structure into a pre-trained prediction model to obtain the molecular properties output by the prediction model as the predicted molecular properties of the molecule to be predicted.

但是，通常情况下，由于预测模型仅能确定待预测分子的分子性质，而无法给出该待预测分子具有其对应的分子性质的原因，这就导致预测得到的分子性质的可信度不高。However, usually, since the prediction model can only determine the molecular properties of the molecule to be predicted, but cannot provide the reason why the molecule to be predicted has its corresponding molecular properties, the credibility of the predicted molecular properties is not high.

发明内容Summary of the invention

本公开提供一种预测方法、装置、可读存储介质及电子设备。The present disclosure provides a prediction method, device, readable storage medium and electronic device.

本公开采用下述技术方案：The present disclosure adopts the following technical solutions:

本公开提供一种预测方法，包括：The present disclosure provides a prediction method, comprising:

根据获取到的待预测分子包含的各原子以及所述各原子之间的化学键，建立以所述各原子为节点、以所述化学键为边的分子图；According to the obtained atoms contained in the molecule to be predicted and the chemical bonds between the atoms, a molecular graph is established with the atoms as nodes and the chemical bonds as edges;

将所述分子图输入预先训练完成的图神经网络模型，得到所述图神经网络模型输出的所述分子图的若干指定子图，所述指定子图对应于所述待预测分子的分子结构中包含的子结构；Inputting the molecular graph into a pre-trained graph neural network model to obtain several specified subgraphs of the molecular graph output by the graph neural network model, wherein the specified subgraphs correspond to substructures contained in the molecular structure of the molecule to be predicted;

针对每个指定子图，确定该指定子图的指定特征，并根据所述指定特征和预存的各指定性质分别对应的表征特征，确定融合特征；For each designated sub-graph, a designated feature of the designated sub-graph is determined, and a fusion feature is determined according to the designated feature and the representation features corresponding to the pre-stored designated properties;

将所述融合特征输入分类模型，得到所述分类模型输出的目标分类结果，所述目标分类结果用于表征该指定子图对应的子结构具有的指定性质；Inputting the fusion feature into a classification model to obtain a target classification result output by the classification model, wherein the target classification result is used to characterize a specified property of a substructure corresponding to the specified subgraph;

根据各指定子图分别对应的目标分类结果，预测所述待预测分子的分子性质。The molecular properties of the molecules to be predicted are predicted according to the target classification results corresponding to the designated subgraphs.

可选地，将所述分子图输入预先训练完成的图神经网络模型，得到所述图神经网络模型输出的所述分子图的若干指定子图，具体包括：Optionally, the molecular graph is input into a pre-trained graph neural network model to obtain several specified subgraphs of the molecular graph output by the graph neural network model, specifically including:

将所述分子图输入预先训练的图神经网络模型中，得到所述图神经网络模型输出的所述待预测分子中的各化学键分别对应的置信度，所述置信度用于表征所述化学键对应的边属于指定子图的概率；Inputting the molecular graph into a pre-trained graph neural network model, obtaining the confidence corresponding to each chemical bond in the molecule to be predicted output by the graph neural network model, wherein the confidence is used to characterize the probability that the edge corresponding to the chemical bond belongs to a specified subgraph;

根据各置信度，确定属于指定子图的各指定边，并根据各指定边以及连接所述各指定边的节点，确定指定子图。According to each confidence level, each designated edge belonging to the designated subgraph is determined, and according to each designated edge and nodes connecting the designated edges, the designated subgraph is determined.

可选地，将所述分子图输入预先训练的图神经网络模型中，得到所述图神经网络模型输出的所述待预测分子中的各化学键分别对应的置信度，具体包括：Optionally, the molecular graph is input into a pre-trained graph neural network model to obtain the confidence corresponding to each chemical bond in the molecule to be predicted output by the graph neural network model, specifically including:

确定所述分子图中各节点分别对应的节点特征；Determine the node features corresponding to each node in the molecular graph;

针对所述待预测分子中包含的每个化学键，根据该化学键对应的边连接的节点的节点特征，确定该化学键的键特征；For each chemical bond contained in the molecule to be predicted, determining the bond feature of the chemical bond according to the node features of the nodes connected by the edge corresponding to the chemical bond;

将所述键特征输入预先训练的图神经网络模型中，得到所述图神经网络模型输出的该化学键的置信度。The bond feature is input into a pre-trained graph neural network model to obtain the confidence of the chemical bond output by the graph neural network model.

可选地，确定所述分子图中各节点分别对应的节点特征，具体包括： Optionally, determining the node features corresponding to each node in the molecular graph specifically includes:

对所述分子图中包含的各节点和各边分别进行特征提取，确定所述各节点分别对应的初始特征和所述各边分别对应的初始特征；Extracting features from each node and each edge in the molecular graph to determine initial features corresponding to each node and initial features corresponding to each edge;

针对所述分子图中的每个节点，确定该节点的邻居节点，并根据该节点的初始特征、所述邻居节点的初始特征，以及所述邻居节点和该节点之间的边的初始特征，确定该节点的节点特征。For each node in the molecular graph, the neighbor nodes of the node are determined, and the node features of the node are determined according to the initial features of the node, the initial features of the neighbor nodes, and the initial features of the edges between the neighbor nodes and the node.

可选地，根据所述指定特征和预存的各指定性质分别对应的表征特征，确定融合特征，具体包括：Optionally, determining the fusion feature according to the designated feature and the representation features corresponding to the pre-stored designated properties respectively includes:

针对每个指定性质，根据所述指定特征和该指定性质的表征特征之间的相似度，以及该指定性质的表征特征，确定该指定性质对应于该指定子图的增强特征；For each designated property, determining, according to the similarity between the designated feature and the characterizing feature of the designated property, and the characterizing feature of the designated property, an enhanced feature of the designated property corresponding to the designated subgraph;

将所述指定特征和各指定性质分别对应于该指定子图的增强特征进行融合，得到所述融合特征。The designated feature and the enhanced features of each designated property corresponding to the designated sub-graph are fused to obtain the fused feature.

可选地，根据各指定子图分别对应的目标分类结果，预测所述待预测分子的分子性质，具体包括：Optionally, predicting the molecular properties of the molecule to be predicted according to the target classification results corresponding to the designated subgraphs respectively includes:

根据所述分子图和各指定子图，确定所述待预测分子的分子结构中除各指定子图对应的各子结构外的其他子结构，作为特定子结构；According to the molecular graph and the designated subgraphs, determining other substructures in the molecular structure of the molecule to be predicted except for the substructures corresponding to the designated subgraphs as specific substructures;

确定所述特定子结构对应的特定子图，并确定所述特定子图的特定特征；Determining a specific subgraph corresponding to the specific substructure, and determining specific features of the specific subgraph;

将所述特定特征输入所述分类模型，得到所述分类模型输出的特定分类结果；Inputting the specific feature into the classification model to obtain a specific classification result output by the classification model;

根据所述特定分类结果和各指定子图分别对应的目标分类结果，预测所述待预测分子的分子性质。The molecular properties of the molecule to be predicted are predicted according to the specific classification results and the target classification results respectively corresponding to the designated subgraphs.

可选地，所述图神经网络模型和所述分类模型采用下述方式训练得到：Optionally, the graph neural network model and the classification model are trained in the following manner:

针对每个已标注指定性质的样本分子，根据该样本分子包含的各原子以及所述各原子之间的化学键，建立以各所述原子为节点，以所述化学键为边的样本分子图，作为训练样本，并将所述指定性质作为所述训练样本的标注；For each sample molecule that has been labeled with a specified property, a sample molecule graph is established with each atom as a node and the chemical bonds as an edge according to the atoms contained in the sample molecule and the chemical bonds between the atoms, as a training sample, and the specified property is used as a label for the training sample;

将各训练样本分别输入待训练的图神经网络模型中，得到所述待训练的图神经网络模型输出的各训练样本分别对应的样本指定子图；Input each training sample into the graph neural network model to be trained, and obtain a sample specified subgraph corresponding to each training sample output by the graph neural network model to be trained;

确定各样本指定子图分别对应的样本特征，并根据各样本特征以及所述各训练样本分别对应的标注，确定各指定性质的表征特征；Determine the sample features corresponding to the designated subgraphs of each sample, and determine the characterization features of each designated property according to the sample features and the labels corresponding to the training samples;

根据各样本特征以及所述各指定性质的表征特征，确定所述各训练样本分别对应的融合特征；Determine, according to the features of each sample and the characterization features of each specified property, the fusion features corresponding to each training sample;

将各融合特征输入待训练的分类模型，得到所述待训练的分类模型输出的样本分类结果；Inputting each fusion feature into the classification model to be trained to obtain a sample classification result output by the classification model to be trained;

根据各训练样本分别对应的样本指定子图的样本分类结果，确定所述各训练样本分别对应的样本性质；Determine the sample properties corresponding to each training sample according to the sample classification results of the sample designated subgraphs corresponding to each training sample;

根据所述各训练样本分别对应的样本性质及其标注，对所述图神经网络模型和所述分类模型进行训练。The graph neural network model and the classification model are trained according to the sample properties and annotations corresponding to each training sample.

本公开提供一种预测装置，包括：The present disclosure provides a prediction device, comprising:

第一确定模块，用于根据获取到的待预测分子包含的各原子以及所述各原子之间的化学键，建立以所述各原子为节点、以所述化学键为边的分子图；A first determination module is used to establish a molecular graph with each atom as a node and the chemical bonds as an edge according to each atom contained in the acquired molecule to be predicted and the chemical bonds between the atoms;

第二确定模块，用于将所述分子图输入预先训练完成的图神经网络模型，得到所述图神经网络模型输出的所述分子图的若干指定子图，所述指定子图对应于所述待预测分子的分子结构中包含的子结构；A second determination module is used to input the molecular graph into a pre-trained graph neural network model to obtain a plurality of specified subgraphs of the molecular graph output by the graph neural network model, wherein the specified subgraphs correspond to substructures contained in the molecular structure of the molecule to be predicted;

融合模块，用于针对每个指定子图，确定该指定子图的指定特征，并根据所述指定特征和预存的各指定性质分别对应的表征特征，确定融合特征；A fusion module, used for determining, for each specified sub-graph, a specified feature of the specified sub-graph, and determining a fusion feature according to the specified feature and the representation features corresponding to the pre-stored specified properties;

分类模块，用于将所述融合特征输入分类模型，得到所述分类模型输出的目标分类结果，所述目标分类结果用于表征该指定子图对应的子结构具有的指定性质；A classification module, used for inputting the fusion feature into a classification model to obtain a target classification result output by the classification model, wherein the target classification result is used to characterize a specified property of a substructure corresponding to the specified subgraph;

预测模块，用于根据各指定子图分别对应的目标分类结果，预测所述待预测分子的分子性质。A prediction module is used to predict the target classification results of the molecules to be predicted according to the target classification results corresponding to each specified subgraph. Molecular properties.

本公开提供了一种计算机可读存储介质，所述存储介质存储有计算机程序，所述计算机程序被处理器执行时实现上述预测方法。The present disclosure provides a computer-readable storage medium, wherein the storage medium stores a computer program, and when the computer program is executed by a processor, the above-mentioned prediction method is implemented.

本公开提供了一种电子设备，包括存储器、处理器及存储在存储器上并可在处理器上运行的计算机程序，所述处理器执行所述程序时实现上述预测方法。The present disclosure provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the above-mentioned prediction method is implemented when the processor executes the program.

BRIEF DESCRIPTION OF THE DRAWINGS

此处所说明的附图用来提供对本公开的进一步理解，构成本公开的一部分，本公开的示意性实施例及其说明用于解释本公开，并不构成对本公开的不当限定。在附图中：The drawings described herein are used to provide a further understanding of the present disclosure and constitute a part of the present disclosure. The illustrative embodiments of the present disclosure and their descriptions are used to explain the present disclosure and do not constitute an improper limitation on the present disclosure. In the drawings:

图1为本公开提供的预测方法的流程示意图；FIG1 is a schematic diagram of a flow chart of a prediction method provided by the present disclosure;

图2为本公开提供的分子图的示意图；FIG2 is a schematic diagram of a molecular graph provided by the present disclosure;

图3为本公开提供的预测方法的流程示意图；FIG3 is a schematic diagram of a flow chart of a prediction method provided by the present disclosure;

图4为本公开提供的预测装置的结构示意图；FIG4 is a schematic diagram of the structure of a prediction device provided by the present disclosure;

图5为本公开提供的对应于图1的电子设备示意图。FIG. 5 is a schematic diagram of an electronic device corresponding to FIG. 1 provided by the present disclosure.

DETAILED DESCRIPTION

为使本公开的目的、技术方案和优点更加清楚，下面将结合本公开具体实施例及相应的附图对本公开技术方案进行清楚、完整地描述。显然，所描述的实施例仅是本公开一部分实施例，而不是全部的实施例。基于本公开中的实施例，本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例，都属于本公开保护的范围。In order to make the purpose, technical solutions and advantages of the present disclosure clearer, the technical solutions of the present disclosure will be clearly and completely described below in combination with the specific embodiments of the present disclosure and the corresponding drawings. Obviously, the described embodiments are only part of the embodiments of the present disclosure, not all of the embodiments. Based on the embodiments in the present disclosure, all other embodiments obtained by ordinary technicians in this field without making creative work are within the scope of protection of the present disclosure.

在相关技术中，直接将待预测分子的分子结构输入预测模型中，由预测模型输出待预测分子的分子性质，但预测模型无法对分子性质进行解释。即，预测模型无法给出待预测分子具有其对应的分子性质的原因。进而导致目前基于模型得到的待预测分子的分子性质的可信度较低的情况。In the related art, the molecular structure of the molecule to be predicted is directly input into the prediction model, and the prediction model outputs the molecular properties of the molecule to be predicted, but the prediction model cannot explain the molecular properties. That is, the prediction model cannot give the reason why the molecule to be predicted has its corresponding molecular properties. This leads to the low credibility of the molecular properties of the molecule to be predicted based on the current model.

本申请提供一种预测方法，基于待预测分子的分子结构，通过图神经网络模型，确定该待预测分子的子结构对应的指定子图，进而基于指定子图具有的指定性质，预测该待预测分子的分子性质。可见，本申请中的该预测方法，可基于该待预测分子的分子结构中包含的子结构的性质，来预测待预测分子的分子性质，也就是说，该待预测分子具有其对应的分子性质，是因为该待预测分子包含具有指定性质的子结构。显然，该预测方法为该待预测分子具有其对应的分子性质提供了可解释性，保证了该预测结果的可信度。The present application provides a prediction method, which is based on the molecular structure of the molecule to be predicted, and through a graph neural network model, determines the specified subgraph corresponding to the substructure of the molecule to be predicted, and then predicts the molecular properties of the molecule to be predicted based on the specified properties of the specified subgraph. It can be seen that the prediction method in the present application can predict the molecular properties of the molecule to be predicted based on the properties of the substructure contained in the molecular structure of the molecule to be predicted, that is, the molecule to be predicted has its corresponding molecular properties because the molecule to be predicted contains a substructure with specified properties. Obviously, the prediction method provides explainability for the molecule to be predicted to have its corresponding molecular properties, and ensures the credibility of the prediction result.

在本公开实施例提供的预测方法中，涉及到的图神经网络模型和分类模型可以是经预先训练得到的。该预测方法的执行过程可由用于识别待预测分子的分子性质的电子设备，例如服务器来执行。执行该图神经网络模型和分类模型的训练过程的电子设备与执行预测方法的电子设备可相同也可不同，本公开对此不做限制。In the prediction method provided in the embodiment of the present disclosure, the graph neural network model and classification model involved may be pre-trained. The execution process of the prediction method may be performed by an electronic device for identifying the molecular properties of the molecule to be predicted, such as a server. The electronic device that executes the training process of the graph neural network model and the classification model may be the same as or different from the electronic device that executes the prediction method, and the present disclosure does not limit this.

以下结合附图，详细说明本公开各实施例提供的技术方案。The technical solutions provided by various embodiments of the present disclosure are described in detail below in conjunction with the accompanying drawings.

图1为本公开提供的预测方法的流程示意图，具体包括以下步骤S100至S108。FIG1 is a schematic flow chart of the prediction method provided by the present disclosure, which specifically includes the following steps S100 to S108 .

S100：根据获取到的待预测分子包含的各原子以及所述各原子之间的化学键，建立以原子为节点、以化学键为边的分子图。S100: Building a molecular graph with atoms as nodes and chemical bonds as edges based on the acquired atoms contained in the molecule to be predicted and the chemical bonds between the atoms.

本公开中的该预测方法，可通过图神经网络模型确定指定子图。而通常情况下，图神经网络模型用于对图结构进行处理，且图结构可准确表征待预测分子的分子结构。基于此，可根据待预测分子的分子结构，确定分子图。The prediction method disclosed in the present invention can determine the specified subgraph through the graph neural network model. In general, the graph neural network model is used to process the graph structure, and the graph structure can accurately characterize the molecular structure of the molecule to be predicted. Based on this, the molecular graph can be determined according to the molecular structure of the molecule to be predicted.

具体的，针对每个分子，该分子中包含有各原子，且各原子之间存在化学键。其中，该化学键包含离子键和共价键。 Specifically, for each molecule, the molecule contains atoms, and chemical bonds exist between the atoms, wherein the chemical bonds include ionic bonds and covalent bonds.

以下以执行主体是服务器为例进行说明，该服务器可确定待预测分子。其中，该待预测分子可被携带在该服务器接收到的预测请求中，也可被携带在为该待预测分子根据预设的预测条件生成的预测任务中。则该服务器可对接收到的预测请求或生成的预测任务进行解析，确定预测请求或预测任务中携带的待预测分子。The following description is made by taking the execution subject as an example of a server, and the server can determine the molecule to be predicted. The molecule to be predicted can be carried in the prediction request received by the server, or can be carried in the prediction task generated for the molecule to be predicted according to the preset prediction conditions. Then the server can parse the received prediction request or the generated prediction task to determine the molecule to be predicted carried in the prediction request or the prediction task.

接着，该服务器可根据该待预测分子的分子结构，确定该待预测分子中包含的各原子，以及各原子之间的化学键。Next, the server may determine each atom contained in the molecule to be predicted and the chemical bonds between the atoms according to the molecular structure of the molecule to be predicted.

最后，该服务器可根据确定出的各原子，确定该分子图中的各节点，再针对每个化学键，根据该化学键连接的各节点，确定该分子图中各节点之间的边。这样，就构建出了以原子为节点且以化学键为边的分子图。则构建出的分子图中，包含有各原子对应的节点，以及各节点间的边。其中，该分子图中节点间的边用于表征分子图中与该边相连的两个节点之间的化学键。以图2为例。Finally, the server can determine each node in the molecular graph based on the determined atoms, and then for each chemical bond, determine the edges between the nodes in the molecular graph based on the nodes connected by the chemical bond. In this way, a molecular graph with atoms as nodes and chemical bonds as edges is constructed. The constructed molecular graph contains nodes corresponding to each atom and edges between nodes. Among them, the edges between nodes in the molecular graph are used to characterize the chemical bonds between two nodes connected to the edge in the molecular graph. Take Figure 2 as an example.

图2为本公开提供的分子图的示意图。图中以甲醇分子为例。该服务器可确定该甲醇分子包含的各原子：C、H、O，并确定各原子之间分别对应的化学键。然后，针对每个原子，确定该原子对应的节点，并针对每个化学键，根据该化学键连接的各节点，确定该分子图中各节点之间的边。FIG2 is a schematic diagram of a molecular graph provided by the present disclosure. The figure takes a methanol molecule as an example. The server can determine the atoms contained in the methanol molecule: C, H, O, and determine the chemical bonds corresponding to each atom. Then, for each atom, the node corresponding to the atom is determined, and for each chemical bond, the edges between the nodes in the molecular graph are determined according to the nodes connected by the chemical bond.

S102：将所述分子图输入预先训练完成的图神经网络模型，得到所述图神经网络模型输出的所述分子图的若干指定子图，所述指定子图对应于所述待预测分子的分子结构中包含的子结构。S102: Input the molecular graph into a pre-trained graph neural network model to obtain several specified subgraphs of the molecular graph output by the graph neural network model, wherein the specified subgraphs correspond to substructures contained in the molecular structure of the molecule to be predicted.

在本申请提供的一个或多个实施例中，如前所述的，该服务器可通过图神经网络模型，确定对应于该待预测分子的分子结构中包含的子结构的指定子图，以便于后续基于该指定子图具有的指定性质，预测该待预测分子的指定性质。In one or more embodiments provided in the present application, as described above, the server can determine the specified subgraph corresponding to the substructure contained in the molecular structure of the molecule to be predicted through a graph neural network model, so as to facilitate the subsequent prediction of the specified properties of the molecule to be predicted based on the specified properties of the specified subgraph.

具体的，该服务器中设置有预先训练完成的图神经网络模型。其中，该图神经网络模型用于确定该待预测分子对应的分子图中包含的指定子图。该指定子图对应于该待预测分子的分子结构中包含的子结构。该指定子图为用于表征该待预测分子的分子性质的子结构对应的子图。Specifically, a pre-trained graph neural network model is provided in the server. The graph neural network model is used to determine a specified subgraph contained in a molecular graph corresponding to the molecule to be predicted. The specified subgraph corresponds to a substructure contained in the molecular structure of the molecule to be predicted. The specified subgraph is a subgraph corresponding to a substructure used to characterize the molecular properties of the molecule to be predicted.

以NaOH分子为例，其对应的指定子图对应的子结构可为OH，则通过OH具有的指定性质，可预测该NaOH分子具有的性质。于是，OH可作为用于表征NaOH分子对应的分子性质的子结构。OH对应的指定子图可为用于表征NaOH分子对应的分子性质的子结构对应的子图。Taking the NaOH molecule as an example, the substructure corresponding to the designated subgraph corresponding to the NaOH molecule may be OH. Then, the properties of the NaOH molecule may be predicted by the designated properties of OH. Thus, OH may be used as a substructure for characterizing the molecular properties corresponding to the NaOH molecule. The designated subgraph corresponding to OH may be a subgraph corresponding to the substructure for characterizing the molecular properties corresponding to the NaOH molecule.

于是，该服务器可将上述步骤S100中确定出的分子图输入预先训练完成的图神经网络模型中，得到该图神经网络模型输出的该分子图包含的若干指定子图。其中，针对每个指定子图，该指定子图对应于该待预测分子的分子结构中包含的子结构。Therefore, the server can input the molecular graph determined in the above step S100 into the pre-trained graph neural network model to obtain several specified subgraphs contained in the molecular graph output by the graph neural network model. Among them, for each specified subgraph, the specified subgraph corresponds to a substructure contained in the molecular structure of the molecule to be predicted.

S104：针对每个指定子图，确定该指定子图的指定特征，并根据所述指定特征和预存的各指定性质分别对应的表征特征，确定融合特征。S104: For each designated sub-graph, a designated feature of the designated sub-graph is determined, and a fusion feature is determined according to the designated feature and the representation features corresponding to the pre-stored designated properties.

在本申请提供的一个或多个实施例中，该服务器在确定出指定子图后，可基于指定子图对应的子结构具有的指定性质，预测该待预测分子对应的分子性质。在该过程中，若直接对该指定子图对应的指定特征进行分类确定指定子图对应的子结构具有的指定性质，可能会出现无法解释该子结构为何具有该指定性质的情况。因此，该服务器可采用将预存的各指定性质分别对应的表征特征和指定子图的指定特征进行融合，得到融合结果，以基于融合结果来确定指定子图对应的子结构具有的指定性质的方式，来确定该指定子图对应的子结构具有的指定性质。In one or more embodiments provided in the present application, after determining the designated subgraph, the server can predict the molecular properties corresponding to the molecule to be predicted based on the designated properties of the substructure corresponding to the designated subgraph. In this process, if the designated features corresponding to the designated subgraph are directly classified to determine the designated properties of the substructure corresponding to the designated subgraph, it may be impossible to explain why the substructure has the designated properties. Therefore, the server can fuse the characterization features corresponding to the pre-stored designated properties and the designated features of the designated subgraph to obtain a fusion result, and determine the designated properties of the substructure corresponding to the designated subgraph based on the fusion result.

具体的，该服务器中预先存储有各指定性质分别对应的表征向量(即表征特征)。其中，该指定性质可为有毒、无毒等性质，也可为易溶于水、微溶于水、不溶于水等性质。具体该指定性质对应的类型可根据需要进行设置，本公开对此不做限制。Specifically, the server pre-stores characterization vectors (i.e., characterization features) corresponding to each specified property. The specified property may be toxic, non-toxic, or soluble in water, slightly soluble in water, or insoluble in water. The type corresponding to the specified property may be set as required, and the present disclosure does not limit this.

同时，针对每个指定性质，该指定性质对应的表征向量可用于表征具有该指定性质，即，指定子图的指定特征与该指定性质的表征向量越相似，该指定子图对应的子结构具有该指定性质的概率越高。该指定子图中包含的各原子以及各原子之间的化学键，确定该指定子图对应的指定特征。At the same time, for each specified property, the representation vector corresponding to the specified property can be used to represent the specified property, that is, the more similar the specified feature of the specified subgraph is to the representation vector of the specified property, the more similar the substructure corresponding to the specified subgraph is. The higher the probability of having the specified property, the higher the probability of having the specified property. The atoms contained in the specified subgraph and the chemical bonds between the atoms determine the specified feature corresponding to the specified subgraph.

最后，该服务器可将预先存储的各指定性质分别对应的表征向量和该指定子图对应的指定特征进行拼接，并将拼接结果作为融合特征。Finally, the server may concatenate the pre-stored representation vectors corresponding to the designated properties and the designated features corresponding to the designated sub-graph, and use the concatenation result as the fusion feature.

S106：将所述融合特征输入分类模型，得到所述分类模型输出的目标分类结果，所述目标分类结果用于表征该指定子图对应的子结构具有的指定性质。S106: Input the fusion feature into a classification model to obtain a target classification result output by the classification model, wherein the target classification result is used to characterize a specified property of a substructure corresponding to the specified subgraph.

在本申请提供的一个或多个实施例中，在确定出融合特征后，该服务器可通过分类模型对该融合特征进行分类，以确定该指定子图对应的子结构具有的指定性质。In one or more embodiments provided in the present application, after determining the fusion feature, the server may classify the fusion feature through a classification model to determine the specified property of the substructure corresponding to the specified subgraph.

具体的，该服务器中预先设置有分类模型，该分类模型用于确定指定子图对应的子结构具有的指定性质。Specifically, a classification model is pre-set in the server, and the classification model is used to determine the specified properties of the substructure corresponding to the specified subgraph.

于是，该服务器可将该融合特征作为输入，输入预先训练完成的分类模型中，得到该分类模型输出的该指定子图对应的目标分类结果。其中，该目标分类结果为该指定子图对应的子结构具有的指定性质。以该指定性质为有毒和无毒为例，则该目标分类结果可为该指定子图对应的子结构有毒，或该指定子图对应的子结构无毒。Therefore, the server can use the fusion feature as input to the pre-trained classification model to obtain the target classification result corresponding to the specified subgraph output by the classification model. The target classification result is the specified property of the substructure corresponding to the specified subgraph. Taking the specified property of toxic and non-toxic as an example, the target classification result can be that the substructure corresponding to the specified subgraph is toxic, or the substructure corresponding to the specified subgraph is non-toxic.

当然，该目标分类结果还可为该指定子图对应的子结构具有各指定性质的概率。同样，以该指定性质为有毒和无毒为例，则该目标分类结果可为该指定子图对应的子结构有毒的概率为20％，无毒的概率为80％等。该目标分类结果的具体表现形式可根据需要进行设置，本公开对此不做限制。Of course, the target classification result can also be the probability that the substructure corresponding to the specified subgraph has each specified property. Similarly, taking the specified property of toxicity and non-toxicity as an example, the target classification result can be that the probability that the substructure corresponding to the specified subgraph is toxic is 20%, the probability that it is non-toxic is 80%, etc. The specific expression form of the target classification result can be set as needed, and the present disclosure does not limit this.

S108：根据各指定子图分别对应的目标分类结果，预测所述待预测分子的分子性质。S108: Predicting the molecular properties of the molecule to be predicted according to the target classification results corresponding to the designated subgraphs.

在本申请提供的一个或多个实施例中，该预测方法需基于指定子图具有的指定性质，预测该待预测分子的分子性质。其中，该分子性质可为该待预测分子具有的指定性质，也可为该待预测分子具有指定性质的概率。In one or more embodiments provided in the present application, the prediction method needs to predict the molecular property of the molecule to be predicted based on the specified property of the specified subgraph, wherein the molecular property can be the specified property of the molecule to be predicted, or the probability that the molecule to be predicted has the specified property.

具体的，针对每个指定子图，该服务器可确定该指定子图对应的目标分类结果。Specifically, for each designated subgraph, the server may determine a target classification result corresponding to the designated subgraph.

然后，针对每个指定性质，该服务器可确定具有该指定性质的指定子图的数量，作为指定数量，并判断该指定数量是否超过该待预测分子包含的指定子图的数量的预设比例(例如，二分之一)。Then, for each designated property, the server may determine the number of designated subgraphs having the designated property as the designated number, and determine whether the designated number exceeds a preset ratio (eg, one-half) of the number of designated subgraphs included in the molecule to be predicted.

若是，则该服务器可确定该待预测分子具有该指定性质。If so, the server may determine that the molecule to be predicted has the specified property.

若否，则该服务器可确定该待预测分子不具有该指定性质。If not, the server may determine that the molecule to be predicted does not have the specified property.

进一步的，若某分子中包含有毒的子结构，则该分子大概率也有毒。而若该分子包含三个指定子图，其中，仅有一个指定子图对应的子结构是有毒的，显然，确定出的该待预测分子的分子性质可不包含“有毒”这一指定性质。为了避免上述情况的发生，该服务器可针对每个指定性质，判断该待预测分子包含的各指定子图中，是否包含具有该指定性质的指定子图。Furthermore, if a molecule contains a toxic substructure, the molecule is likely to be toxic. If the molecule contains three specified subgraphs, of which only one of the substructures corresponding to the specified subgraph is toxic, then obviously, the molecular properties of the molecule to be predicted may not include the specified property of "toxic". In order to avoid the above situation, the server can determine, for each specified property, whether the specified subgraphs contained in the molecule to be predicted contain a specified subgraph with the specified property.

当然，该服务器也可针对每个指定性质，根据该待预测分子包含的各指定子图具有该指定性质的概率，确定该待预测分子具有该指定性质的概率。Of course, the server may also determine, for each designated property, the probability that the molecule to be predicted has the designated property according to the probability that each designated subgraph included in the molecule to be predicted has the designated property.

于是，该服务器可根据确定出的该待预测分子具有的各指定性质，或该待预测分子具有各指定性质的概率，确定该待预测分子具有的分子性质。其中，该服务器可直接将该待预测分子具有各指定性质的概率，作为该待预测分子具有的分子性质，也可为将概率超过预设的特定概率的指定性质作为该待预测分子具有的分子性质。具体如何基于各指定子图分别对应的目标分类结果确定该待预测分子的分子性质可根据需要进行设置，本公开对此不做限制。Therefore, the server can determine the molecular properties of the molecule to be predicted based on the determined specified properties of the molecule to be predicted, or the probability that the molecule to be predicted has each specified property. Among them, the server can directly use the probability of the molecule to be predicted having each specified property as the molecular property of the molecule to be predicted, or can use the specified property whose probability exceeds a preset specific probability as the molecular property of the molecule to be predicted. How to determine the molecular property of the molecule to be predicted based on the target classification results corresponding to each specified subgraph can be set as needed, and the present disclosure does not limit this.

基于图1所示的预测方法，基于待预测分子的分子结构，通过图神经网络模型，确定该待预测分子的子结构对应的指定子图，进而基于指定子图具有的指定性质，预测该待预测分子的分子性质。可见，该待预测分子具有其对应的分子性质，是因为该待预测分子包含具有指定性质的子结构。显然，该预测方法为该待预测分子具有其对应的分子性质提供了可解释性，保证了该预测结果的可信度。Based on the prediction method shown in FIG1 , based on the molecular structure of the molecule to be predicted, the graph neural network model is used to determine the designated subgraph corresponding to the substructure of the molecule to be predicted, and then the molecular properties of the molecule to be predicted are predicted based on the designated properties of the designated subgraph. It can be seen that the molecule to be predicted has its corresponding molecular properties because the molecule to be predicted The molecule contains substructures with specified properties. Obviously, the prediction method provides explainability for the predicted molecule to have its corresponding molecular properties, ensuring the credibility of the prediction result.

进一步的，对于分子来说，分子具有的化学键通常可影响分子对应的物理性质。而本申请提供的该预测方法，其目的是确定出可用于表征待预测分子的分子性质的子结构的指定子图，再通过指定子图具有的指定性质，预测待预测分子的分子性质。则基于相同思想，若确定出可用于表征待预测分子的分子性质的化学键，再基于确定出的化学键确定指定子图，则基于指定子图对应的子结构的目标分类结果，即可确定待预测分子的分子性质。Furthermore, for molecules, the chemical bonds possessed by the molecules can usually affect the corresponding physical properties of the molecules. The purpose of the prediction method provided by the present application is to determine a designated subgraph of a substructure that can be used to characterize the molecular properties of the molecule to be predicted, and then predict the molecular properties of the molecule to be predicted by the designated properties possessed by the designated subgraph. Based on the same idea, if the chemical bonds that can be used to characterize the molecular properties of the molecule to be predicted are determined, and then the designated subgraph is determined based on the determined chemical bonds, then the molecular properties of the molecule to be predicted can be determined based on the target classification results of the substructure corresponding to the designated subgraph.

具体的，该服务器可将步骤S100中确定出的分子图作为输入，输入预先训练的图神经网络模型中，得到该图神经网络模型输出的该待预测分子中的各化学键分别对应的置信度。其中，该图神经网络用于确定该待预测分子对应的分子图中包含的指定子图。针对每个化学键，该化学键对应的置信度用于表征该化学键对应的边属于指定子图的概率。Specifically, the server can use the molecular graph determined in step S100 as input into a pre-trained graph neural network model to obtain the confidence corresponding to each chemical bond in the molecule to be predicted output by the graph neural network model. The graph neural network is used to determine the specified subgraph contained in the molecular graph corresponding to the molecule to be predicted. For each chemical bond, the confidence corresponding to the chemical bond is used to characterize the probability that the edge corresponding to the chemical bond belongs to the specified subgraph.

于是，该服务器可根据确定出的各置信度，确定属于该指定子图的各指定边。Then, the server may determine the designated edges belonging to the designated subgraph according to the determined confidence levels.

最后，该服务器可根据确定出的各指定边以及连接各指定边的节点，确定指定子图。Finally, the server may determine the designated subgraph according to the determined designated edges and the nodes connecting the designated edges.

更进一步的，针对每个化学键，该化学键的性质可基于连接该化学键的原子的性质表征。于是，在确定各化学键分别对应的置信度时，该服务器还可针对每个化学键，确定该化学键对应的键特征，再基于该键特征，确定该化学键对应的置信度。Furthermore, for each chemical bond, the properties of the chemical bond can be characterized based on the properties of the atoms connecting the chemical bond. Therefore, when determining the confidence corresponding to each chemical bond, the server can also determine the bond feature corresponding to each chemical bond, and then determine the confidence corresponding to the chemical bond based on the bond feature.

具体的，该服务器可确定该分子图中各节点分别对应的节点特征。Specifically, the server may determine the node features corresponding to each node in the molecular graph.

于是，针对该待预测分子中包含的每个化学键，该服务器可确定该化学键对应的边连接的两个节点，并确定对该两个节点分别进行特征提取，确定该两个节点的节点特征。Therefore, for each chemical bond contained in the molecule to be predicted, the server may determine two nodes connected by the edge corresponding to the chemical bond, and perform feature extraction on the two nodes respectively to determine the node features of the two nodes.

接着，该服务器可将该两个节点的节点特征进行拼接，作为该化学键的键特征。Then, the server may concatenate the node features of the two nodes as the bond features of the chemical bond.

最后，该服务器可将该键特征输入预先训练的图神经网络模型中，得到该图神经网络模型输出的该化学键的置信度。Finally, the server can input the bond feature into a pre-trained graph neural network model to obtain the confidence of the chemical bond output by the graph neural network model.

当然，该服务器也可对该化学键进行特征提取，确定该化学键对应的初始特征，再将该化学键对应的初始特征和该两个节点的节点特征进行融合，将融合结果作为该化学键的键特征。Of course, the server may also perform feature extraction on the chemical bond, determine the initial feature corresponding to the chemical bond, and then fuse the initial feature corresponding to the chemical bond with the node features of the two nodes, and use the fusion result as the bond feature of the chemical bond.

另外，在本公开中，针对该分子图中的每个原子，该原子对应的性质不仅受该原子本身影响，还受与该原子相连的其他原子影响。因此，在确定各原子分别对应的节点的节点特征时，还可基于该节点的邻居节点的特征确定。In addition, in the present disclosure, for each atom in the molecular graph, the property corresponding to the atom is not only affected by the atom itself, but also by other atoms connected to the atom. Therefore, when determining the node characteristics of the nodes corresponding to each atom, it can also be determined based on the characteristics of the neighboring nodes of the node.

具体的，针对该分子图中包含的每个节点和每个边，该服务器可对该节点和该边进行特征提取，确定该节点的初始特征和该边的初始特征。Specifically, for each node and each edge included in the molecular graph, the server may extract features of the node and the edge to determine the initial features of the node and the initial features of the edge.

然后，针对该分子图中的每个节点，该服务器可确定该节点的邻居节点。Then, for each node in the molecular graph, the server may determine the neighbor nodes of the node.

最后，该服务器可根据该节点的初始特征，该邻居节点的初始特征，以及该邻居节点和该节点之间的边的初始特征，确定该节点的节点特征。Finally, the server may determine the node characteristics of the node based on the initial characteristics of the node, the initial characteristics of the neighboring node, and the initial characteristics of the edge between the neighboring node and the node.

当然，需要说明的是，该服务器还可将确定出的该节点的节点特征，重新作为该节点的初始特征，并根据重新确定出的该节点的初始特征，继续基于该节点的初始特征和该节点的邻居节点的初始特征，来确定该节点的节点特征。以此来实现将该分子图中各原子的性质沿各化学键进行传递这一目的，进一步保证了基于该节点的节点特征确定出的化学键的置信度的准确性。Of course, it should be noted that the server can also use the determined node feature of the node as the initial feature of the node, and continue to determine the node feature of the node based on the initial feature of the node and the initial features of the node's neighbor nodes according to the re-determined initial feature of the node. In this way, the purpose of transferring the properties of each atom in the molecular graph along each chemical bond is achieved, and the accuracy of the confidence of the chemical bond determined based on the node feature of the node is further guaranteed.

进一步的，在本申请中，确定融合特征的目的，是基于该指定特征和预存的各指定性质分别对应的表征特征之间的相似度，来确定该指定特征具有的指定性质。而对于每个指定性质，若基于该指定性质对应的表征特征和该指定特征之间的相似度，对该指定特征进行增强，则基于增强结果确定的目标分类结果的准确度会更高。基于此，该服务器可基于指定性质的表征特征对该指定特征进行增强。Furthermore, in the present application, the purpose of determining the fusion feature is to determine the specified property of the specified feature based on the similarity between the specified feature and the characterization features corresponding to each pre-stored specified property. For each specified property, if the specified feature is enhanced based on the similarity between the characterization feature corresponding to the specified property and the specified feature, the accuracy of the target classification result determined based on the enhanced result will be higher. Based on this, the server can enhance the specified feature based on the characterization feature of the specified property.

具体的，该服务器可针对每个指定性质，根据该指定性质的表征特征和指定子图的指定特征之间的相似度，以及该指定性质的表征特征，确定该指定性质对应于该指定子图的增强特征。Specifically, the server may determine, for each specified property, that the specified property corresponds to the specified subgraph according to the similarity between the characterizing feature of the specified property and the specified feature of the specified subgraph, and the characterizing feature of the specified property. Enhanced features of graphs.

则在确定出各指定性质分别对应于该指定子图的增强特征后，该服务器可将该指定特征和各指定性质分别对应于该指定子图的增强特征进行融合，得到融合特征。则后续可基于融合特征得到该指定子图的目标分类结果。After determining that each designated property corresponds to the enhanced feature of the designated sub-graph, the server may fuse the designated feature with the enhanced feature of each designated property corresponding to the designated sub-graph to obtain a fused feature. Subsequently, the target classification result of the designated sub-graph may be obtained based on the fused feature.

其中，针对每个指定性质，该指定性质的表征特征可采用下述方式确定：Wherein, for each specified property, the characterization feature of the specified property can be determined in the following manner:

针对每个指定性质，将该指定性质的各样本分子分别对应的样本分子图输入预先训练的图神经网络模型中，得到所述图神经网络模型输出的该指定性质的各样本分子分别对应的指定子图，再基于各指定子图分别对应的指定特征，确定该指定性质的表征特征。For each specified property, the sample molecule graphs corresponding to each sample molecule of the specified property are input into a pre-trained graph neural network model to obtain the specified subgraphs corresponding to each sample molecule of the specified property output by the graph neural network model, and then the characterization features of the specified property are determined based on the specified features corresponding to each specified subgraph.

当然，该服务器也可从该指定性质对应的各样本分子中，选择任一样本分子，将该样本分子的分子图输入预先训练的图神经网络模型中，得到所述图神经网络模型输出的该样本分子对应的指定子图，再将该指定子图的指定特征作为该指定性质的表征特征。Of course, the server can also select any sample molecule from the sample molecules corresponding to the specified property, input the molecular graph of the sample molecule into the pre-trained graph neural network model, obtain the specified subgraph corresponding to the sample molecule output by the graph neural network model, and then use the specified features of the specified subgraph as the characterization features of the specified property.

具体如何确定各指定性质分别对应的表征特征，可根据需要进行设置，本公开对此不做限制。How to determine the characterization features corresponding to each specified property can be set according to needs, and this disclosure does not impose any restrictions on this.

更进一步的，对于待预测分子来说，除确定出的各指定子图对应的子结构外，该待预测分子的分子结构中，还包含有其他子结构，而其他子结构的性质也可影响该待预测分子的性质。因此，该服务器还可将上述其他子结构作为特定子结构，并基于特定子结构确定该待预测分子的性质。Furthermore, for the molecule to be predicted, in addition to the substructures corresponding to the specified subgraphs, the molecular structure of the molecule to be predicted also includes other substructures, and the properties of the other substructures may also affect the properties of the molecule to be predicted. Therefore, the server may also use the other substructures as specific substructures and determine the properties of the molecule to be predicted based on the specific substructures.

具体的，该服务器可根据该分子图和各指定图，确定该待预测分子的分子结构中除各指定子图对应的各子结构外的其他子结构，作为特定子结构。Specifically, the server may determine, according to the molecular graph and the designated graphs, other substructures in the molecular structure of the molecule to be predicted except for the substructures corresponding to the designated subgraphs as specific substructures.

于是，该服务器可确定该特定子结构对应的特定子图，并确定该特定子图的特定特征。Then, the server may determine the specific subgraph corresponding to the specific substructure, and determine the specific features of the specific subgraph.

接着，该服务器可将该特定特征输入分类模型，得到该分类模型输出的特定分类结果。Next, the server may input the specific feature into the classification model to obtain a specific classification result output by the classification model.

最后，该服务器可根据该特定分类结果和各指定子图分别对应的目标分类结果，预测该待预测分子的分子性质。如图3所示。Finally, the server can predict the molecular properties of the molecule to be predicted based on the specific classification result and the target classification results corresponding to each designated subgraph, as shown in FIG3 .

图3为本公开提供的预测方法的流程示意图，其中，该服务器可将该待预测分子的分子图作为输入，输入图神经网络模型中，得到该图神经网络模型输出的指定子图和特定子图。则该服务器可将该指定子图的指定特征和预设的各指定性质的表征特征进行融合，得到融合特征，再将该融合特征和该特定子图分别输入分类模型中，得到该分类模型输出该融合特征对应的目标分类结果，以及该特定子图对应的特定分类结果。最后，该服务器可基于该特定分类结果和目标分类结果，确定预测结果，该预测结果为该待预测分子具有的分子性质。FIG3 is a flow chart of the prediction method provided by the present disclosure, wherein the server can use the molecular graph of the molecule to be predicted as input, input it into the graph neural network model, and obtain the specified subgraph and the specific subgraph output by the graph neural network model. The server can then fuse the specified features of the specified subgraph with the preset characterization features of each specified property to obtain a fused feature, and then input the fused feature and the specific subgraph into the classification model respectively, to obtain the target classification result corresponding to the fused feature output by the classification model, and the specific classification result corresponding to the specific subgraph. Finally, the server can determine the prediction result based on the specific classification result and the target classification result, and the prediction result is the molecular property of the molecule to be predicted.

另外，本公开中的图神经网络模型和该分类模型可采用下述方式训练得到：In addition, the graph neural network model and the classification model in the present disclosure can be trained in the following manner:

具体的，该服务器可获取若干已标注指定性质的样本分子。并针对每个已标注指定性质的样本分子，根据该样本分子包含的各原子以及所述各原子之间的化学键，建立以原子为节点且以化学键为边的样本分子图，作为训练样本。同时，该服务器可将该指定性质作为该训练样本的标注。Specifically, the server may obtain a number of sample molecules that have been labeled with specified properties. For each sample molecule that has been labeled with specified properties, according to the atoms contained in the sample molecule and the chemical bonds between the atoms, a sample molecule graph with atoms as nodes and chemical bonds as edges is established as a training sample. At the same time, the server may use the specified properties as the label of the training sample.

其次，该服务器可将各训练样本分别输入待训练的图神经网络模型中，得到该图神经网络模型输出的各训练样本分别对应的样本指定子图。Secondly, the server can input each training sample into the graph neural network model to be trained, and obtain the sample specified subgraph corresponding to each training sample output by the graph neural network model.

接着，该服务器可确定各样本指定子图分别对应的样本特征，并根据各样本特征以及各训练样本分别对应的标注，确定各指定性质的表征特征。Next, the server may determine the sample features corresponding to the designated subgraphs of each sample, and determine the characterization features of each designated property according to the sample features and the annotations corresponding to the training samples.

然后，该服务器可根据各样本特征以及各指定性质的表征特征，确定各训练样本分别对应的融合特征。Then, the server may determine the fusion features corresponding to each training sample according to the features of each sample and the characterization features of each specified property.

之后，该服务器可将各融合特征输入待训练的分类模型，得到该分类模型输出的样本分类结果。Afterwards, the server can input each fusion feature into the classification model to be trained to obtain the sample classification result output by the classification model.

于是，该服务器可根据各训练样本分别对应的样本指定子图的样本分类结果，确定所述各训练样本分别对应的样本性质。Therefore, the server can determine the sample classification results of the sample specified sub-graph corresponding to each training sample. Determine the sample properties corresponding to each training sample.

最后，该服务器可根据各训练样本分别对应的样本性质及其标注，确定损失，并以损失最小为目标，对该图神经网络模型和该分类模型进行训练。Finally, the server can determine the loss according to the sample properties and labels corresponding to each training sample, and train the graph neural network model and the classification model with the goal of minimizing the loss.

进一步的，本公开中的图神经网络还可采用下述方式训练得到：Furthermore, the graph neural network in the present disclosure can also be trained in the following manner:

具体的，该服务器可获取若干已标注指定性质的样本分子。并针对每个已标注指定性质的样本分子，根据该样本分子包含的各原子以及所述各原子之间的化学键，建立以原子为节点且以化学键为边的样本分子图，作为训练样本。Specifically, the server may obtain a number of sample molecules with specified properties labeled, and for each sample molecule with specified properties labeled, according to the atoms contained in the sample molecule and the chemical bonds between the atoms, establish a sample molecule graph with atoms as nodes and chemical bonds as edges as training samples.

其次，该服务器可针对每个训练样本，确定该训练样本包含的指定子图，作为该训练样本的目标样本子图。Secondly, the server may determine, for each training sample, a designated subgraph included in the training sample as a target sample subgraph of the training sample.

然后，该服务器可将各训练样本分别输入待训练的图神经网络模型中，得到该图神经网络模型输出的各训练样本分别对应的样本指定子图。Then, the server can input each training sample into the graph neural network model to be trained, and obtain the sample specified subgraph corresponding to each training sample output by the graph neural network model.

最后，该服务器可根据各训练样本分别对应的样本指定子图和目标样本子图，确定该图神经网络模型的损失，并以该损失最小为优化目标，调整该图神经网络模型的模型参数。Finally, the server can determine the loss of the graph neural network model according to the sample designated subgraph and the target sample subgraph corresponding to each training sample, and adjust the model parameters of the graph neural network model with the minimum loss as the optimization goal.

基于同样思路，本公开还提供一种预测装置，如图4所示。Based on the same idea, the present disclosure also provides a prediction device, as shown in FIG4 .

图4为本公开提供的预测装置，其中：FIG4 is a prediction device provided by the present disclosure, wherein:

第一确定模块200，用于根据获取到的待预测分子包含的各原子以及所述各原子之间的化学键，建立以原子为节点、以化学键为边的分子图。The first determination module 200 is used to establish a molecular graph with atoms as nodes and chemical bonds as edges according to the obtained atoms contained in the molecule to be predicted and the chemical bonds between the atoms.

第二确定模块202，用于将所述分子图输入预先训练完成的图神经网络模型，得到所述图神经网络模型输出的所述分子图的若干指定子图，所述指定子图对应于所述待预测分子的分子结构中包含的子结构。The second determination module 202 is used to input the molecular graph into a pre-trained graph neural network model to obtain several specified subgraphs of the molecular graph output by the graph neural network model, wherein the specified subgraphs correspond to substructures contained in the molecular structure of the molecule to be predicted.

融合模块204，用于针对每个指定子图，确定该指定子图的指定特征，并根据所述指定特征和预存的各指定性质分别对应的表征特征，确定融合特征。The fusion module 204 is used to determine, for each designated sub-graph, a designated feature of the designated sub-graph, and determine a fusion feature according to the designated feature and the pre-stored representation features corresponding to each designated property.

分类模块206，用于将所述融合特征输入分类模型，得到所述分类模型输出的目标分类结果，所述目标分类结果用于表征该指定子图对应的子结构具有的指定性质。The classification module 206 is used to input the fusion feature into the classification model to obtain the target classification result output by the classification model, and the target classification result is used to characterize the specified property of the substructure corresponding to the specified subgraph.

预测模块208，用于根据各指定子图分别对应的目标分类结果，预测所述待预测分子的分子性质。The prediction module 208 is used to predict the molecular properties of the molecule to be predicted according to the target classification results corresponding to each designated subgraph.

可选地，第一确定模块200，用于将所述分子图输入预先训练的图神经网络模型中，得到所述图神经网络模型输出的所述待预测分子中的各化学键分别对应的置信度，所述置信度用于表征所述化学键对应的边属于指定子图的概率，根据各置信度，确定属于指定子图的各指定边，并根据各指定边以及连接所述各指定边的节点，确定指定子图。Optionally, the first determination module 200 is used to input the molecular graph into a pre-trained graph neural network model to obtain the confidence corresponding to each chemical bond in the molecule to be predicted output by the graph neural network model, wherein the confidence is used to characterize the probability that the edge corresponding to the chemical bond belongs to a specified subgraph, and based on each confidence, determine each specified edge belonging to the specified subgraph, and determine the specified subgraph based on each specified edge and the nodes connecting the each specified edge.

可选地，第一确定模块200，用于确定所述分子图中各节点分别对应的节点特征，针对所述待预测分子中包含的每个化学键，根据该化学键对应的边连接的节点的节点特征，确定该化学键的键特征，将所述键特征输入预先训练的图神经网络模型中，得到所述图神经网络模型输出的该化学键的置信度。Optionally, the first determination module 200 is used to determine the node features corresponding to each node in the molecular graph, and for each chemical bond contained in the molecule to be predicted, determine the bond features of the chemical bond according to the node features of the nodes connected by the edges corresponding to the chemical bond, and input the bond features into a pre-trained graph neural network model to obtain the confidence of the chemical bond output by the graph neural network model.

可选地，第一确定模块200，用于对所述分子图中包含的各节点和各边分别进行特征提取，确定所述各节点分别对应的初始特征和所述各边分别对应的初始特征，针对所述分子图中的每个节点，确定该节点的邻居节点，并根据该节点的初始特征、所述邻居节点的初始特征，以及所述邻居节点和该节点之间的边的初始特征，确定该节点的节点特征。Optionally, the first determination module 200 is used to extract features from each node and each edge contained in the molecular graph, determine the initial features corresponding to each node and the initial features corresponding to each edge, determine the neighbor nodes of each node in the molecular graph, and determine the node features of the node based on the initial features of the node, the initial features of the neighbor nodes, and the initial features of the edges between the neighbor nodes and the node.

可选地，融合模块204，用于针对每个指定性质，根据所述指定特征和该指定性质的表征特征之间的相似度，以及该指定性质的表征特征，确定该指定性质对应于该指定子图的增强特征，将所述指定特征和各指定性质分别对应于该指定子图的增强特征进行融合，得到融合特征。Optionally, the fusion module 204 is used to determine, for each specified property, an enhanced feature of the specified property corresponding to the specified sub-graph based on the similarity between the specified feature and the characterization feature of the specified property, and the characterization feature of the specified property, and fuse the specified feature and the enhanced features of the specified sub-graph corresponding to each specified property to obtain a fused feature.

可选地，预测模块208，用于根据所述分子图和各指定子图，确定所述待预测分子的分子结构中除各指定子图对应的各子结构外的其他子结构，作为特定子结构，确定所述特定子结构对应的特定子图，并确定所述特定子图的特定特征，将所述特定特征输入所述分类模型，得到所述分类模型输出的特定分类结果，根据所述特定分类结果和各指定子图分别对应的目标分类结果，预测所述待预测分子的分子性质。Optionally, the prediction module 208 is used to determine, based on the molecular graph and each designated subgraph, other substructures in the molecular structure of the molecule to be predicted except for each substructure corresponding to each designated subgraph as specific substructures, and determine the A specific subgraph corresponding to the specific substructure is determined, and specific features of the specific subgraph are determined. The specific features are input into the classification model to obtain a specific classification result output by the classification model, and the molecular properties of the molecule to be predicted are predicted according to the specific classification result and the target classification results corresponding to each designated subgraph.

所述装置还包括：The device also includes:

训练模块210，用于采用下述方式训练得到所述图神经网络模型和所述分类模型：针对每个已标注指定性质的样本分子，根据该样本分子包含的各原子以及所述各原子之间的化学键，建立以原子为节点，以化学键为边的样本分子图，作为训练样本，并将所述指定性质作为所述训练样本的标注，将各训练样本分别输入待训练的图神经网络模型中，得到所述图神经网络模型输出的各训练样本分别对应的样本指定子图，确定各样本指定子图分别对应的样本特征，并根据各样本特征以及所述各训练样本分别对应的标注，确定各指定性质的表征特征，根据各样本特征以及所述各指定性质的表征特征，确定所述各训练样本分别对应的融合特征，将各融合特征输入待训练的分类模型，得到所述分类模型输出的样本分类结果，根据各训练样本分别对应的样本指定子图的样本分类结果，确定所述各训练样本分别对应的样本性质，根据所述各训练样本分别对应的样本性质及其标注，对所述图神经网络模型和所述分类模型进行训练。The training module 210 is used to train the graph neural network model and the classification model in the following manner: for each sample molecule that has been labeled with a specified property, a sample molecule graph with atoms as nodes and chemical bonds as edges is established as a training sample based on the atoms contained in the sample molecule and the chemical bonds between the atoms, and the specified property is used as the annotation of the training sample. Each training sample is input into the graph neural network model to be trained to obtain a sample specified subgraph corresponding to each training sample output by the graph neural network model, determine the sample features corresponding to each sample specified subgraph, and determine the characterization features of each specified property based on each sample feature and the annotation corresponding to each training sample, determine the fusion features corresponding to each training sample based on each sample feature and the characterization features of each specified property, input each fusion feature into the classification model to be trained to obtain a sample classification result output by the classification model, determine the sample properties corresponding to each training sample based on the sample classification result of the sample specified subgraph corresponding to each training sample, and train the graph neural network model and the classification model based on the sample properties corresponding to each training sample and their annotations.

本公开还提供了一种计算机可读存储介质，该存储介质存储有计算机程序，计算机程序可用于执行上述图1提供的预测方法。The present disclosure also provides a computer-readable storage medium, which stores a computer program. The computer program can be used to execute the prediction method provided in FIG. 1 above.

本公开还提供了图5所示的电子设备的示意结构图。如图5所述，在硬件层面，该电子设备包括处理器、内部总线、网络接口、内存以及非易失性存储器，当然还可能包括其他业务所需要的硬件。处理器从非易失性存储器中读取对应的计算机程序到内存中然后运行，以实现上述图1所述的预测方法。当然，除了软件实现方式之外，本公开并不排除其他实现方式，比如逻辑器件抑或软硬件结合的方式等等，也就是说以下处理流程的执行主体并不限定于各个逻辑单元，也可以是硬件或逻辑器件。The present disclosure also provides a schematic structural diagram of an electronic device as shown in FIG5. As shown in FIG5, at the hardware level, the electronic device includes a processor, an internal bus, a network interface, a memory, and a non-volatile memory, and of course may also include hardware required for other services. The processor reads the corresponding computer program from the non-volatile memory into the memory and then runs it to implement the prediction method described in FIG1 above. Of course, in addition to the software implementation, the present disclosure does not exclude other implementation methods, such as logic devices or a combination of software and hardware, etc., that is to say, the execution subject of the following processing flow is not limited to each logic unit, but can also be hardware or logic devices.

在20世纪90年代，对于一个技术的改进可以很明显地区分是硬件上的改进(例如，对二极管、晶体管、开关等电路结构的改进)还是软件上的改进(对于方法流程的改进)。然而，随着技术的发展，当今的很多方法流程的改进已经可以视为硬件电路结构的直接改进。设计人员几乎都通过将改进的方法流程编程到硬件电路中来得到相应的硬件电路结构。因此，不能说一个方法流程的改进就不能用硬件实体模块来实现。例如，可编程逻辑器件(Programmable Logic Device，PLD)(例如现场可编程门阵列(Field Programmable Gate Array，FPGA))就是这样一种集成电路，其逻辑功能由用户对器件编程来确定。由设计人员自行编程来把一个数字系统“集成”在一片PLD上，而不需要请芯片制造厂商来设计和制作专用的集成电路芯片。而且，如今，取代手工地制作集成电路芯片，这种编程也多半改用“逻辑编译器(logic compiler)”软件来实现，它与程序开发撰写时所用的软件编译器相类似，而要编译之前的原始代码也得用特定的编程语言来撰写，此称之为硬件描述语言(Hardware Description Language，HDL)，而HDL也并非仅有一种，而是有许多种，如ABEL(Advanced Boolean Expression Language)、AHDL(Altera Hardware Description Language)、Confluence、CUPL(Cornell University Programming Language)、HDCal、JHDL(Java Hardware Description Language)、Lava、Lola、MyHDL、PALASM、RHDL(Ruby Hardware Description Language)等，目前最普遍使用的是VHDL(Very-High-Speed Integrated Circuit Hardware Description Language)与Verilog。本领域技术人员也应该清楚，只需要将方法流程用上述几种硬件描述语言稍作逻辑编程并编程到集成电路中，就可以很容易得到实现该逻辑方法流程的硬件电路。In the 1990s, it was very clear whether the improvement of a technology was hardware improvement (for example, improvement of the circuit structure of diodes, transistors, switches, etc.) or software improvement (improvement of the method flow). However, with the development of technology, many improvements of the method flow today can be regarded as direct improvements of the hardware circuit structure. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into the hardware circuit. Therefore, it cannot be said that the improvement of a method flow cannot be implemented with hardware entity modules. For example, a programmable logic device (PLD) (such as a field programmable gate array (FPGA)) is such an integrated circuit whose logical function is determined by the user's programming of the device. Designers can "integrate" a digital system on a PLD by programming themselves, without having to ask chip manufacturers to design and make dedicated integrated circuit chips. Moreover, nowadays, instead of manually making integrated circuit chips, this kind of programming is mostly implemented by "logic compiler" software, which is similar to the software compiler used when developing and writing programs. The original code before compilation must also be written in a specific programming language, which is called Hardware Description Language (HDL). There is not only one kind of HDL, but many kinds, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, RHDL (Ruby Hardware Description Language), etc., and the most commonly used ones are VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog. Those skilled in the art should also know that it is only necessary to program the method flow slightly in the above-mentioned hardware description languages and program it into the integrated circuit, and then it is easy to obtain the hardware circuit that realizes the logic method flow.

控制器可以按任何适当的方式实现，例如，控制器可以采取例如微处理器或处理器以及存储可由该(微)处理器执行的计算机可读程序代码(例如软件或固件)的计算机可读介质、逻辑门、开关、专用集成电路(Application Specific Integrated Circuit，ASIC)、可编程逻辑控制器和嵌入微控制器的形式，控制器的例子包括但不限于以下微控制器： ARC 625D、Atmel AT91SAM、Microchip PIC18F26K20以及Silicone Labs C8051F320，存储器控制器还可以被实现为存储器的控制逻辑的一部分。本领域技术人员也知道，除了以纯计算机可读程序代码方式实现控制器以外，完全可以通过将方法步骤进行逻辑编程来使得控制器以逻辑门、开关、专用集成电路、可编程逻辑控制器和嵌入微控制器等的形式来实现相同功能。因此这种控制器可以被认为是一种硬件部件，而对其内包括的用于实现各种功能的装置也可以视为硬件部件内的结构。或者甚至，可以将用于实现各种功能的装置视为既可以是实现方法的软件模块又可以是硬件部件内的结构。The controller may be implemented in any suitable manner, for example, the controller may take the form of a microprocessor or processor and a computer-readable medium storing a computer-readable program code (such as software or firmware) executable by the (micro)processor, a logic gate, a switch, an application specific integrated circuit (ASIC), a programmable logic controller, and an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20 and Silicone Labs C8051F320, the memory controller can also be implemented as part of the control logic of the memory. Those skilled in the art also know that in addition to implementing the controller in a purely computer-readable program code, the controller can be made to implement the same function in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, and embedded microcontrollers by logically programming the method steps. Therefore, this controller can be considered as a hardware component, and the devices included therein for implementing various functions can also be regarded as structures within the hardware component. Or even, the devices for implementing various functions can be regarded as both software modules for implementing the method and structures within the hardware component.

上述实施例阐明的系统、装置、模块或单元，具体可以由计算机芯片或实体实现，或者由具有某种功能的产品来实现。一种典型的实现设备为计算机。具体的，计算机例如可以为个人计算机、膝上型计算机、蜂窝电话、相机电话、智能电话、个人数字助理、媒体播放器、导航设备、电子邮件设备、游戏控制台、平板计算机、可穿戴设备或者这些设备中的任何设备的组合。The systems, devices, modules or units described in the above embodiments may be implemented by computer chips or entities, or by products with certain functions. A typical implementation device is a computer. Specifically, the computer may be, for example, a personal computer, a laptop computer, a cellular phone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.

为了描述的方便，描述以上装置时以功能分为各种单元分别描述。当然，在实施本公开时可以把各单元的功能在同一个或多个软件和/或硬件中实现。For the convenience of description, the above device is described in terms of functions and is divided into various units and described separately. Of course, when implementing the present disclosure, the functions of each unit can be implemented in the same or multiple software and/or hardware.

本领域内的技术人员应明白，本公开的实施例可提供为方法、系统、或计算机程序产品。因此，本公开可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。而且，本公开可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。Those skilled in the art will appreciate that the embodiments of the present disclosure may be provided as methods, systems, or computer program products. Therefore, the present disclosure may take the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware. Moreover, the present disclosure may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

本公开是参照根据本公开实施例的方法、设备(系统)、和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程病灶检测设备的处理器以产生一个机器，使得通过计算机或其他可编程病灶检测设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。The present disclosure is described with reference to the flowcharts and/or block diagrams of the methods, devices (systems), and computer program products according to the embodiments of the present disclosure. It should be understood that each process and/or box in the flowchart and/or block diagram, as well as the combination of the processes and/or boxes in the flowchart and/or block diagram, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or other programmable lesion detection device to produce a machine, so that the instructions executed by the processor of the computer or other programmable lesion detection device produce a device for implementing the functions specified in one process or multiple processes in the flowchart and/or one box or multiple boxes in the block diagram.

这些计算机程序指令也可存储在能引导计算机或其他可编程病灶检测设备以特定方式工作的计算机可读存储器中，使得存储在该计算机可读存储器中的指令产生包括指令装置的制造品，该指令装置实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能。These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable lesion detection device to operate in a specific manner, so that the instructions stored in the computer-readable memory produce a manufactured product including an instruction device that implements the functions specified in one or more processes in the flowchart and/or one or more boxes in the block diagram.

这些计算机程序指令也可装载到计算机或其他可编程病灶检测设备上，使得在计算机或其他可编程设备上执行一系列操作步骤以产生计算机实现的处理，从而在计算机或其他可编程设备上执行的指令提供用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的步骤。These computer program instructions may also be loaded onto a computer or other programmable lesion detection device so that a series of operating steps are executed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more processes in a flowchart and/or one or more boxes in a block diagram.

在一个典型的配置中，计算设备包括一个或多个处理器(CPU)、输入/输出接口、网络接口和内存。In a typical configuration, a computing device includes one or more processors (CPU), input/output interfaces, network interfaces, and memory.

内存可能包括计算机可读介质中的非永久性存储器，随机存取存储器(RAM)和/或非易失性内存等形式，如只读存储器(ROM)或闪存(flash RAM)。内存是计算机可读介质的示例。Memory may include non-permanent storage in a computer-readable medium, in the form of random access memory (RAM) and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

计算机可读介质包括永久性和非永久性、可移动和非可移动媒体可以由任何方法或技术来实现信息存储。信息可以是计算机可读指令、数据结构、程序的模块或其他数据。计算机的存储介质的例子包括，但不限于相变内存(PRAM)、静态随机存取存储器(SRAM)、动态随机存取存储器(DRAM)、其他类型的随机存取存储器(RAM)、只读存储器(ROM)、电可擦除可编程只读存储器(EEPROM)、快闪记忆体或其他内存技术、只读光盘只读存储器(CD-ROM)、数字多功能光盘(DVD)或其他光学存储、磁盒式磁带，磁带磁磁盘存储或其他磁性存储设备或任何其他非传输介质，可用于存储可以被计算设备访问的信息。按照本文中的界定，计算机可读介质不包括暂存电脑可读媒体(transitory media)，如调制的数据信号和载波。Computer-readable media include permanent and non-permanent, removable and non-removable media that can implement information storage by any method or technology. The information can be computer-readable instructions, data structures, program modules or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disk read-only memory (CD-ROM), digital versatile disk (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices or any other non-transmission media that can be used Storing information that can be accessed by a computing device. As defined herein, computer-readable media does not include transitory media such as modulated data signals and carrier waves.

还需要说明的是，术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含，从而使得包括一系列要素的过程、方法、商品或者设备不仅包括那些要素，而且还包括没有明确列出的其他要素，或者是还包括为这种过程、方法、商品或者设备所固有的要素。在没有更多限制的情况下，由语句“包括一个……”限定的要素，并不排除在包括所述要素的过程、方法、商品或者设备中还存在另外的相同要素。It should also be noted that the terms "include", "comprises" or any other variations thereof are intended to cover non-exclusive inclusion, so that a process, method, commodity or device including a series of elements includes not only those elements, but also other elements not explicitly listed, or also includes elements inherent to such process, method, commodity or device. In the absence of more restrictions, the elements defined by the sentence "comprises a ..." do not exclude the existence of other identical elements in the process, method, commodity or device including the elements.

本领域技术人员应明白，本公开的实施例可提供为方法、系统或计算机程序产品。因此，本公开可采用完全硬件实施例、完全软件实施例或结合软件和硬件方面的实施例的形式。而且，本公开可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。It will be appreciated by those skilled in the art that the embodiments of the present disclosure may be provided as methods, systems or computer program products. Therefore, the present disclosure may take the form of a complete hardware embodiment, a complete software embodiment or an embodiment combining software and hardware. Moreover, the present disclosure may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

本公开可以在由计算机执行的计算机可执行指令的一般上下文中描述，例如程序模块。一般地，程序模块包括执行特定任务或实现特定抽象数据类型的例程、程序、对象、组件、数据结构等等。也可以在分布式计算环境中实践本公开，在这些分布式计算环境中，由通过通信网络而被连接的远程处理设备来执行任务。在分布式计算环境中，程序模块可以位于包括存储设备在内的本地和远程计算机存储介质中。The present disclosure may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform specific tasks or implement specific abstract data types. The present disclosure may also be practiced in distributed computing environments where tasks are performed by remote processing devices connected through a communications network. In a distributed computing environment, program modules may be located in local and remote computer storage media, including storage devices.

本公开中的各个实施例均采用递进的方式描述，各个实施例之间相同相似的部分互相参见即可，每个实施例重点说明的都是与其他实施例的不同之处。尤其，对于系统实施例而言，由于其基本相似于方法实施例，所以描述的比较简单，相关之处参见方法实施例的部分说明即可。Each embodiment in the present disclosure is described in a progressive manner, and the same or similar parts between the embodiments can be referred to each other, and each embodiment focuses on the differences from other embodiments. In particular, for the system embodiment, since it is basically similar to the method embodiment, the description is relatively simple, and the relevant parts can be referred to the partial description of the method embodiment.

以上所述仅为本公开的实施例而已，并不用于限制本公开。对于本领域技术人员来说，本公开可以有各种更改和变化。凡在本公开的精神和原理之内所作的任何修改、等同替换、改进等，均应包含在本公开的权利要求范围之内。 The above description is only an embodiment of the present disclosure and is not intended to limit the present disclosure. For those skilled in the art, the present disclosure may have various modifications and variations. Any modification, equivalent substitution, improvement, etc. made within the spirit and principle of the present disclosure shall be included in the scope of the claims of the present disclosure.

Claims

A prediction method, characterized in that the method comprises:

According to the obtained atoms contained in the molecule to be predicted and the chemical bonds between the atoms, a molecular graph is established with the atoms as nodes and the chemical bonds as edges;

Inputting the molecular graph into a pre-trained graph neural network model to obtain a plurality of specified subgraphs of the molecular graph output by the graph neural network model, wherein the specified subgraphs correspond to substructures contained in the molecular structure of the molecule to be predicted;

For each designated sub-graph, a designated feature of the designated sub-graph is determined, and a fusion feature is determined according to the designated feature and the representation features corresponding to the pre-stored designated properties;

Inputting the fusion feature into a classification model to obtain a target classification result output by the classification model, wherein the target classification result is used to characterize a specified property of a substructure corresponding to the specified subgraph;

The molecular properties of the molecules to be predicted are predicted according to the target classification results corresponding to the designated subgraphs.

The method according to claim 1, characterized in that the molecular graph is input into a pre-trained graph neural network model to obtain several specified subgraphs of the molecular graph output by the graph neural network model, specifically comprising:

Inputting the molecular graph into a pre-trained graph neural network model, obtaining the confidence corresponding to each chemical bond in the molecule to be predicted output by the graph neural network model, wherein the confidence is used to characterize the probability that the edge corresponding to each chemical bond belongs to a specified subgraph;

According to the confidences, the designated edges belonging to the designated subgraph are determined, and according to the designated edges and the nodes connected by the designated edges, the designated subgraph is determined.

The method of claim 2, characterized in that the molecular graph is input into a pre-trained graph neural network model to obtain the confidence corresponding to each chemical bond in the molecule to be predicted output by the graph neural network model, specifically comprising:

Determine the node features corresponding to each node in the molecular graph;

For each chemical bond contained in the molecule to be predicted, determining the bond feature of the chemical bond according to the node features of the nodes connected by the edge corresponding to the chemical bond;

The bond feature is input into the pre-trained graph neural network model to obtain the confidence of the chemical bond output by the graph neural network model.

The method according to claim 3, characterized in that determining the node features corresponding to each node in the molecular graph specifically comprises:

Extracting features from each node and each edge in the molecular graph to determine initial features corresponding to each node and initial features corresponding to each edge;

For each node in the molecular graph, the neighbor nodes of the node are determined, and the node features of the node are determined according to the initial features of the node, the initial features of the neighbor nodes, and the initial features of the edges between the neighbor nodes and the node.

The method according to claim 1, characterized in that determining the fusion feature according to the designated feature and the representation features corresponding to the pre-stored designated properties respectively comprises:

For each designated property, determining, according to the similarity between the designated feature and the characterizing feature of the designated property, and the characterizing feature of the designated property, an enhanced feature of the designated property corresponding to the designated subgraph;

The designated feature and the enhanced features of each designated property corresponding to the designated sub-graph are fused to obtain the fused feature.

The method according to claim 1, characterized in that predicting the molecular properties of the molecule to be predicted according to the target classification results corresponding to each designated subgraph respectively comprises:

According to the molecular graph and the designated subgraphs, determining other substructures in the molecular structure of the molecule to be predicted except for the substructures corresponding to the designated subgraphs as specific substructures;

Determining a specific subgraph corresponding to the specific substructure, and determining specific features of the specific subgraph;

Inputting the specific feature into the classification model to obtain a specific classification result output by the classification model;

The molecular properties of the molecule to be predicted are predicted according to the specific classification results and the target classification results respectively corresponding to the designated subgraphs.

The method according to claim 1, characterized in that the graph neural network model and the classification model are trained in the following manner:

For each sample molecule that has been labeled with a specified property, a sample molecule graph is established with each atom as a node and each chemical bond as an edge according to each atom contained in the sample molecule and each chemical bond between the atoms, as a training sample, and the specified property is used as a label for the training sample;

Input each training sample into the graph neural network model to be trained, and obtain a number of sample-specified subgraphs corresponding to each training sample output by the graph neural network model to be trained;

Determine the sample features corresponding to the designated subgraphs of each sample, and determine the characterization features of each designated property according to the sample features and the labels corresponding to the training samples;

Determine, according to the features of each sample and the characterization features of each specified property, the fusion features corresponding to each training sample;

Inputting each fusion feature into the classification model to be trained to obtain a sample classification result output by the classification model to be trained;

Determine the sample properties corresponding to each training sample according to the sample classification results of the sample designated subgraphs corresponding to each training sample;

The graph neural network model and the classification model are trained according to the sample properties and annotations corresponding to each training sample.

A prediction device, characterized in that the device comprises:

A first determination module is used to establish a molecular graph with each atom as a node and each chemical bond as an edge according to each atom contained in the acquired molecule to be predicted and each chemical bond between the atoms;

A second determination module is used to input the molecular graph into a pre-trained graph neural network model to obtain a plurality of specified subgraphs of the molecular graph output by the graph neural network model, wherein the specified subgraphs correspond to substructures contained in the molecular structure of the molecule to be predicted;

A fusion module, used for determining, for each specified sub-graph, a specified feature of the specified sub-graph, and determining a fusion feature according to the specified feature and the representation features corresponding to the pre-stored specified properties;

A classification module, used for inputting the fusion feature into a classification model to obtain a target classification result output by the classification model, wherein the target classification result is used to characterize a specified property of a substructure corresponding to the specified subgraph;

The prediction module is used to predict the molecular properties of the molecule to be predicted according to the target classification results corresponding to each designated subgraph.

A computer-readable storage medium stores a computer program, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1 to 7.

An electronic device comprises a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the method described in any one of claims 1 to 7 when executing the program.