
CN110600085B - Tree-LSTM-based organic matter physicochemical property prediction method - Google Patents

Tree-LSTM-based organic matter physicochemical property prediction method

Info

Publication number
CN110600085B
CN110600085B (application CN201910500140.8A)
Authority
CN
China
Prior art keywords
tree
lstm
physical
molecular
organic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910500140.8A
Other languages
Chinese (zh)
Other versions
CN110600085A (en)
Inventor
申威峰
粟杨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University
Original Assignee
Chongqing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University filed Critical Chongqing University
Priority to CN201910500140.8A priority Critical patent/CN110600085B/en
Publication of CN110600085A publication Critical patent/CN110600085A/en
Application granted granted Critical
Publication of CN110600085B publication Critical patent/CN110600085B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/30 Prediction of properties of chemical compounds, compositions or mixtures
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/70 Machine learning, data mining or chemometrics

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Chemical & Material Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A Tree-LSTM-based method for predicting the physicochemical properties of organic matter comprises two parts, generating a prediction model and predicting physicochemical properties. Generating the prediction model comprises: 1) normalizing and encoding the molecular structure of the organic compound and generating a tree-shaped data structure (the molecular characteristic descriptor); 2) training a Tree-LSTM model with the molecular characteristic descriptors and experimental physicochemical property data to obtain a Tree-LSTM-based physicochemical property prediction model. Predicting the physicochemical properties comprises: normalizing and encoding the molecular structure and feeding it into the prediction model to obtain the predicted physicochemical properties of the organic compound. The invention allows a computer to extract the relationship between molecular structure and physicochemical properties automatically, is better suited to learning the structural information of diverse organic molecules, and achieves better prediction results.

Description

Tree-LSTM-based organic matter physicochemical property prediction method
Technical Field
The invention relates to the field of chemistry (C07), and in particular to a method, based on artificial-intelligence techniques, for predicting quantitative structure-property relationships of chemical substances.
Background
Physicochemical properties are basic data closely tied to chemistry and chemical engineering. Critical properties, the normal boiling point, the heat of formation, the octanol-water partition coefficient and similar quantities are central to scientific research and to production practice, and scientifically sound predicted values of these properties can reduce the amount of experimental measurement required, saving considerable manpower and material resources. Experimental acquisition of physicochemical data is often difficult because the measurement conditions are harsh or because the substance to be measured decomposes easily, so at present the properties are estimated mainly by group contribution methods and by topological index methods based on multiple linear regression. However, both of these approaches require manual extraction of molecular structural features before prediction, which limits their range of application.
The Tree-LSTM recurrent neural network is an improvement on the LSTM (Long Short-Term Memory) recurrent neural network. It can learn dependencies more complex than those of a sequence structure, autonomously learns from the input data the contribution of the molecular tree topology to the predicted quantity, and in particular overcomes the inability of other neural networks to reproduce the atomic connectivity within a molecule, making it better suited to mining the implicit relationship between molecular structure and physicochemical properties. The existing group contribution method disassembles molecules into groups (molecular substructure fragments) and predicts the physicochemical properties of organic compounds by multiple linear fitting. Different group contribution schemes use different decomposition rules, and for some molecules no suitable decomposition can be found, so the prediction is biased or cannot be completed at all. The existing topological index methods are limited by the complexity of computing the indices and cannot intuitively represent the local structure of a molecule, so they do not offer broad predictive capability for physicochemical properties. To date, no method has appeared that predicts the physicochemical properties of organic compounds using a Tree-LSTM recurrent neural network alone.
Disclosure of Invention
The invention provides a Tree-LSTM-based method for predicting the physicochemical properties of organic matter, which addresses the technical problems of the prior art: a narrow prediction range, limited coverage of substances and low prediction accuracy.
In order to solve the problems, the invention adopts the following technical scheme:
The method comprises two parts: step A, generating a prediction model; and step B, predicting physicochemical properties;
the step A comprises the following steps:
a1, acquiring experimental data of physical and chemical properties of an organic matter and molecular structure information of the organic matter, and capturing a large amount of data from various databases by utilizing a web crawler technology;
a2, normalizing the molecular structure of a single organic matter (by a normalization algorithm of a graph), traversing each atom in the single organic matter molecule and generating corresponding atomic characteristic descriptors, sequencing all the atomic characteristic descriptors of the single organic matter molecule according to a dictionary sequence, and taking the smallest atomic characteristic descriptor as a molecular characteristic descriptor;
a3, following step A2, generating for all of the obtained organic molecular structures the molecular characteristic descriptor representing each normalized molecular structure graph, together with the corresponding linear code;
a4, splitting all organic molecules into various chemical bonds, arranging character strings representing the chemical bonds according to each molecule, and generating word vectors for the character strings by adopting a word embedding algorithm;
a5, building a neural network model based on the Tree-LSTM and loading the physicochemical data acquired in A1 and the molecular structure data processed in A2-A4, the Tree-LSTM automatically adapting to the topology of the normalized molecular structure graph; manually adjusting the various hyperparameters, training the model, and selecting the best parameters from the training process to obtain a Tree-LSTM-based organic physicochemical property prediction model;
the step B comprises the following steps:
b1, processing an organic molecular structure for which no experimental data of a given physicochemical property exist using steps A2-A4, loading the generated characteristic descriptor and code into the physicochemical property prediction model obtained in A5, and outputting the value of the unknown physicochemical property predicted for that molecular characteristic descriptor.
As a further refinement, the step A5 includes the following:
a51: building a Tree-LSTM model under a Linux system or a Windows system;
a52: setting the input dimension of the Tree-LSTM and the length of input data; a53: setting the data quantity proportion of a Tree-LSTM training set and a test set; a54: setting a Tree-LSTM model optimizer and a learning rate; a55: setting the width of hidden layer neurons; a56: setting the iteration times of the model; a57: and continuously adjusting parameters, checking the convergence degree of the model according to model loss, and preferentially selecting high convergence degree parameters to form a Tree-LSTM-based physicochemical property prediction model.
Drawings
FIG. 1 is a flow chart of the organic physical and chemical property prediction of the invention;
FIG. 2 is the computational graph of the Tree-LSTM recurrent neural network when predicting a property of acetaldoxime;
FIG. 3 is a graph showing the prediction performance of the Tree-LSTM physicochemical property prediction model for the critical temperature of organic compounds; × marks the predicted values and the straight line the experimental values.
FIG. 4 is an example of molecular characteristic descriptor generation for acetaldoxime.
FIG. 5 shows the coding rule of the molecular characteristic descriptor, illustrating the meaning of each position of the code.
Detailed Description
The invention will now be described in detail with reference to the drawings and specific examples, it being pointed out that the examples described below are only for a better understanding of the invention and do not limit the invention itself.
The invention provides a method for predicting the physicochemical properties of organic matter based on Tree-LSTM which, as shown in FIG. 1, comprises two parts: step A, generating a prediction model; and step B, predicting physicochemical properties;
step A, generating a prediction model:
a1, acquiring experimental data of physical and chemical properties of the organic matters and molecular structure information of the organic matters, and capturing a large amount of data from various databases by utilizing a web crawler technology.
A11, the physicochemical properties of organic compounds mainly comprise: critical properties, normal boiling point, transport properties, autoignition temperature, flash point, toxicity, the octanol-water partition coefficient, biochemical activity, and so on.
A12, the molecular structure information is mainly carried by SMILES strings, SMARTS expressions, MOL files and SDF files.
A2, normalizing the structure of a single organic molecule, traversing each atom in the single organic molecule and generating corresponding atomic characteristic descriptors, sequencing all the atomic characteristic descriptors of the single organic molecule according to dictionary sequence, taking the smallest atomic characteristic descriptor as a molecular characteristic descriptor and encoding the molecular characteristic descriptor.
A21, generating a normalized (canonical) graph from the two-dimensional topological graph of the organic molecule using a graph canonicalization algorithm from graph theory, so that molecular graphs can be compared for isomorphism; the Nauty or Faulon graph canonicalization algorithm may be adopted.
The A22 coding method is as follows:
the first method, directly uses the molecular feature descriptor output by the Faulon normalization algorithm as the code of the organic matter, and is illustrated in fig. 4.
The second method encodes the molecular signature descriptors in a linear encoded format, as exemplified in table 1.
A3, following step A2, generating the molecular characteristic descriptor and the corresponding code for each molecule from the obtained molecular structure information of all the organic compounds.
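As an illustration of steps A21-A22, the following is a minimal Python sketch that uses RDKit's canonical atom ranking and canonical SMILES as a stand-in for the Nauty/Faulon graph canonicalization algorithms named above; the helper function name is an illustrative assumption, not the invention's own implementation.

```python
# Minimal sketch of A21-A22: canonicalize a molecular graph so that isomorphic
# molecules map to the same code. RDKit's canonical ranking is used here as a
# stand-in for the Nauty/Faulon algorithms mentioned in the text.
from rdkit import Chem

def canonicalize(smiles: str):
    """Return a canonical SMILES string and a canonical atom ranking."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"cannot parse SMILES: {smiles}")
    canon_smiles = Chem.MolToSmiles(mol, canonical=True)   # normalized linear code
    atom_ranks = list(Chem.CanonicalRankAtoms(mol))        # canonical atom ordering
    return canon_smiles, atom_ranks

# Two different writings of ethanol yield the same canonical form,
# so their molecular graphs can be compared directly for isomorphism.
print(canonicalize("OCC"))
print(canonicalize("CCO"))
```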
A4, splitting all organic molecules into various chemical bonds, arranging character strings representing the chemical bonds according to each molecule, and generating word vectors for the character strings by adopting a word embedding algorithm.
A5, building a neural network model based on the Tree-LSTM, loading the physicochemical data acquired in A1 and the molecular structure data processed in A2-A4, continuously adjusting the parameters and selecting the best ones to obtain a Tree-LSTM-based organic physicochemical property prediction model.
The step B comprises the following steps:
b1, processing an organic molecular structure without experimental data of a certain physical and chemical property by adopting the steps A2-A4, loading a generated characteristic descriptor and a code into a physical and chemical property prediction model, and obtaining data of unknown physical and chemical properties;
step A4 further comprises the following:
a41: traversing each molecule in the database, traversing the connected chemical bonds and atoms by taking each atom in each molecule as a starting point, forming a character string like 'A-B', and recording to form the original data. Description: "A" represents the elemental sign of atom A, "B" represents the elemental sign of atom B, and "-" represents the type of chemical bond between atom A and atom B.
A42: splitting character strings in the original data, such as A-B, to form a sub-character string set in three combination modes: combining: "A" and "-B", combination two: "A-" and "B", combination three: "A", "-" and "B".
A43: and building a neural network based on a skip-gram algorithm under a Linux system or a Windows system to obtain an embedded vector of each character string in the character string set obtained by the A42.
As a further refinement, the step A5 includes the following:
a51: building a Tree-LSTM model under a Linux system or a Windows system;
a52: the feature descriptors or linear codes for each molecule are parsed into a tree-shaped data structure and a corresponding embedded vector obtained by A4 is matched for each node (for each atom in the molecule) in the tree-shaped structure.
A52: setting the input dimension of the Tree-LSTM and the length of input data; the input dimension in the present invention is 1 and the length is 50.
A53: setting the data quantity proportion of a Tree-LSTM training set and a test set; the ratio in the present invention is 4:1.
A54: setting a Tree-LSTM model optimizer and a learning rate; the method adopts an Adam algorithm optimizer, and the learning rate is 0.001:
a55: setting the width of each hidden layer neuron;
a56: setting the iteration times of the model;
a57: adjusting hidden ganglion points under the same iteration times, and adjusting iteration times under the same hidden ganglion points, checking the convergence degree of the model according to the model integral loss and the iteration loss, and preferentially selecting high convergence degree parameters to form a Tree-LSTM-based physicochemical property prediction model.
The Tree-LSTM neural network structure is shown in fig. 2.
The Tree-LSTM has two mathematical models: the child node addition model and the child node independent model (known in the literature as the Child-Sum Tree-LSTM and the N-ary Tree-LSTM, respectively).
The core of the Tree-LSTM is the cell state c, which is regulated by the forget gate f_j, the input gate i_j and the output gate o_j. For the current node j, the forget gate f_jk controls how much of child node k's cell state is carried into the current cell state c_j; the input gate i_j controls how much of the current node's candidate state enters c_j; the candidate cell state u_j determines how much new node information is added; and the output gate o_j controls how much of the current cell state c_j is emitted as the hidden-layer output h_j of the current node. For the child node addition model the calculation formulas include:
f_jk = σ(W^(f) x_j + U^(f) h_k + b^(f))   (2)
c_j = i_j · u_j + Σ_k f_jk · c_k   (6)
h_j = o_j · tanh(c_j)   (7)
where W^(f), W^(i), W^(o) are the weight matrices of the forget, input and output gates, b^(f), b^(i), b^(o) are the corresponding bias terms, and σ is the sigmoid function. The corresponding formulas of the child node independent model are:
c_j = i_j · u_j + Σ_l f_jl · c_jl   (14)
h_j = o_j · tanh(c_j)   (15)
the two models differ in whether or not to pair node h jl Adding, wherein the independent model of the child node adds a parameter to hjl of each child node, and the adding model of the child node is h of the child node jl Sum ofProviding training parameters.
The structure of the Tree-LSTM recurrent neural network is shown in FIG. 2. The inputs of a Tree-LSTM cell are: the cell states c_jl of the child nodes, the hidden-layer outputs h_jl of the child nodes, and the input value x_j of the current node; its outputs are the current cell state c_j and the current hidden-layer output h_j.
The candidate cell state u_j is determined jointly by the input x_j of the current node and the hidden-layer outputs h_jl of the child nodes (for the child node addition model, by the sum of the children's hidden-layer outputs); its calculation is given by formula (4) or formula (12).
Here W^(u) is the weight matrix of the candidate cell state, b^(u) is its bias term, and tanh is the hyperbolic tangent function. The current cell state c_j is determined jointly by the forget gates f_jl (acting on the child cell states c_jl), the input gate i_j and the candidate cell state u_j, as given by formula (6) or (14); the symbol · denotes element-wise multiplication. The hidden-layer output h_j of the current node is determined jointly by the output gate o_j and the current cell state c_j, as given by formula (7) or (15).
The output of the Tree-LSTM network is produced by a single-layer or multi-layer neural network; for example, with a single-layer network as the output layer the calculation is:
p_i = w · h_j + b   (16)
where the property p_i of the i-th compound is computed from the Tree-LSTM hidden output h_j of the root node of the tree represented by that compound's molecular characteristic descriptor, and w and b are trainable parameters.
In the invention, the mean square error (MSE) or the mean absolute error (MAE) is adopted as the loss function:
MSE = (1/N) Σ_{i=1}^{N} (x_exp,i − x_pred,i)²
MAE = (1/N) Σ_{i=1}^{N} |x_exp,i − x_pred,i|
where N is the number of samples, x_exp is the experimental (observed) value and x_pred is the predicted value.
Experimental example
The effect of the Tree-LSTM-based physicochemical property prediction method is illustrated below, taking the critical temperature of organic compounds as an example. This property serves as basic input to various thermodynamic and property-estimation models, so its prediction is of practical and representative significance.
Experimental critical-temperature data and the corresponding molecular structure information were collected for a total of 1759 organic substances; 1407 were used as the training set and 352 as the test set.
The construction of the molecular characteristic descriptor is illustrated with acetaldoxime, one of the samples, as shown in detail in FIG. 4. The molecular characteristic descriptor is a data structure that stores molecular structure information by taking one atom of the molecule as the starting point and expanding the molecule as a tree. In this example acetaldoxime starts from a carbon atom labelled zero. Starting from this root atom C0, the molecule is searched downward to a preset distance (or height), and the atoms encountered on each path, together with the types of the chemical bonds attached to them, are recorded so as to capture the features of the molecule. Taking each atom of the molecule in turn as the root yields one atomic characteristic descriptor per atom; different root atoms therefore give different atomic characteristic descriptors. These descriptors are sorted in dictionary order and the smallest one is taken as the molecular characteristic descriptor. FIG. 4 illustrates, using acetaldoxime: (A) the molecular structure; (B) the tree expansion of the molecular structure to different heights; (C) the atomic characteristic descriptors for height = 0 and height = 1. The children of an atom are shown in nested brackets; when no bond type is written, the bond between an atom and its child is a single bond, otherwise the bond is written as follows: '=' for a double bond, '#' for a triple bond, and ':' for an aromatic bond.
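A minimal Python sketch of this construction is shown below: the molecule is expanded as a tree from every atom to a fixed height, children are written in nested brackets with '=', '#' and ':' bond prefixes, and the lexicographically smallest string is kept. The exact string format of the descriptor in FIG. 4 is an assumption; RDKit is used only to read the molecular graph.

```python
# Sketch of molecular characteristic descriptor generation: depth-limited tree
# expansion from every atom; the smallest string in dictionary order is kept.
from rdkit import Chem

BOND_PREFIX = {1.0: "", 2.0: "=", 3.0: "#", 1.5: ":"}   # single bonds carry no prefix

def atom_descriptor(mol, idx, height, parent=None):
    atom = mol.GetAtomWithIdx(idx)
    s = atom.GetSymbol()
    if height == 0:
        return s
    parts = []
    for nbr in atom.GetNeighbors():
        if nbr.GetIdx() == parent:
            continue                                     # never walk back to the parent
        bond = mol.GetBondBetweenAtoms(idx, nbr.GetIdx())
        parts.append(BOND_PREFIX[bond.GetBondTypeAsDouble()]
                     + atom_descriptor(mol, nbr.GetIdx(), height - 1, idx))
    return s + "".join("(" + p + ")" for p in sorted(parts))

def molecular_descriptor(smiles: str, height: int = 2) -> str:
    mol = Chem.MolFromSmiles(smiles)
    descriptors = [atom_descriptor(mol, a.GetIdx(), height) for a in mol.GetAtoms()]
    return min(descriptors)                              # smallest in dictionary order

print(molecular_descriptor("CC=NO"))   # acetaldoxime, expanded to height 2
```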
To store molecular characteristic descriptors conveniently, the invention develops a linear code that represents the tree expansion of a molecule. Linear codes of molecular characteristic descriptors at various depths are shown in Table 1; within the string the atoms are separated by '|', and the meaning of the digits and letters within each atom field is shown in FIG. 5. The first atom is the root atom at depth 0, denoted by 'S'; it has no parent atom, so the parent-atom field is encoded as 'S', and since there is no chemical bond to a parent, that field is likewise encoded as 'S'.
The 1759 organic compounds were converted into molecular characteristic descriptors and linearly encoded. Before being fed into the neural network, each descriptor is parsed into a tree structure and the embedding vector obtained in step A43 is attached to every node (atom). For each molecule in the sample, each atom corresponds to a node of the Tree-LSTM network, and the atom's embedding vector is that node's input vector. With 300 initial iterations, the number of output-layer nodes was adjusted repeatedly, and 128 was finally taken as the optimum in this example. The structure of the Tree-LSTM network is determined by each organic molecule itself: it is a dynamic neural network that adapts to the topology of different molecules. In this example the learning rate was 0.008 for the first 300 training iterations and was then reduced to 0.00001 for 5000 further iterations. To prevent overfitting, training is stopped early once the loss no longer decreases. This yields the prediction results in Table 3; the closer the predicted values are to the experimental values, the better the prediction. Table 2 gives the statistical evaluation parameters of the Tree-LSTM network for training and prediction of the critical temperature of organic compounds. In FIG. 3 the symbol × marks predicted values and the straight line marks experimental values; for most data points the Tree-LSTM gives a good prediction.
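For illustration, the training schedule just described can be sketched as follows; the linear model and random data are stand-ins used only to keep the example self-contained, while in the actual method the model is the Tree-LSTM applied to the descriptor trees with their node embeddings.

```python
# Sketch of the training schedule: learning rate 0.008 for the first 300 iterations,
# then 1e-5 for up to 5000 more, MSE loss (MAE would be nn.L1Loss), and early stopping
# once the loss stops decreasing. The model and data below are stand-ins.
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(1407, 128)      # stand-in for the 1407 training molecules
y = torch.randn(1407, 1)        # stand-in for their critical temperatures
model = nn.Linear(128, 1)       # stand-in for the Tree-LSTM plus readout
loss_fn = nn.MSELoss()

def train_phase(lr, max_iters, best, patience=50):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    stale = 0
    for _ in range(max_iters):
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
        if loss.item() < best - 1e-6:
            best, stale = loss.item(), 0
        else:
            stale += 1
            if stale >= patience:   # loss no longer decreasing: stop early
                break
    return best

best = train_phase(lr=0.008, max_iters=300, best=float("inf"))
best = train_phase(lr=1e-5, max_iters=5000, best=best)
print(f"final training loss: {best:.4f}")
```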
Table 1 Examples of the linear coding of molecular characteristic descriptors
TABLE 2 statistical parameters for training and predicting critical temperatures of organics
TABLE 3 Prediction results for the critical temperature of part of the organic compounds
The present invention was compared with representative group contribution methods, the Joback and Constantinou-Gani (CG) methods, on the same list of substances; the results are shown in Table 4:
TABLE 4 Comparison of the predictive capability of the invention with classical group contribution methods
The substance list used for the comparison in Table 4 contains 460 substances in total. The Joback method can predict only 352 of them; when the method of the present invention is applied to those same 352 substances, it outperforms the Joback method. The CG method can predict fewer substances than the present invention and is slightly less accurate. When the present invention is applied to the whole substance list, 452 of the substances can be covered with acceptable accuracy. Superscript a denotes all predictable substances, and superscript b denotes substances with more than 3 carbon atoms.

Claims (2)

1. A method for predicting the physicochemical properties of organic matter based on Tree-LSTM, characterized in that the molecular graph of the organic matter is converted into a standard (canonical) graph so that a computer can recognize and learn it, capture the structural features of the molecule, and correlate those features with the physical or chemical properties of the organic matter, so that prediction of the properties of the substance is finally realized; the method comprises: step A, generating a prediction model; and step B, predicting physicochemical properties;
the step A comprises the following steps:
a1, acquiring experimental data of physical and chemical properties of an organic matter and molecular structure information of the organic matter, and capturing a large amount of data from various databases by utilizing a web crawler technology;
a2, carrying out single organic molecule structure standardization through a graph standardization algorithm, traversing each atom in a single organic molecule, generating corresponding atomic characteristic descriptors, sequencing all the atomic characteristic descriptors of the single organic molecule according to dictionary sequence, and taking the smallest atomic characteristic descriptor as a molecular characteristic descriptor;
a3, according to the step A2, generating a molecular characteristic descriptor and a corresponding code of each molecule according to the obtained molecular structure information of all the organic matters;
a4, splitting all organic molecules into various chemical bonds, arranging character strings representing the chemical bonds according to each molecule, and generating word vectors for the character strings by adopting a word embedding algorithm;
a5, building a neural network model based on the Tree-LSTM and loading the physicochemical data acquired in A1 and the molecular structure data processed in A2-A4, the Tree-LSTM automatically adapting to the topology of the normalized molecular structure graph; manually adjusting the various hyperparameters, training the model, and selecting the best parameters from the training process to obtain a Tree-LSTM-based organic physicochemical property prediction model;
the step B comprises the following steps:
b1, processing an organic molecular structure for which no experimental data of a given physicochemical property exist using steps A2-A4, loading the generated characteristic descriptor and code into the physicochemical property prediction model obtained in A5, and outputting the value of the unknown physicochemical property predicted for that molecular characteristic descriptor.
2. The method for predicting physicochemical properties of an organic matter based on Tree-LSTM according to claim 1, wherein said step A5 comprises the steps of:
a51: building a Tree-LSTM-based neural network under a Linux system or a Windows system; a52: setting the input dimension of the Tree-LSTM and the length of input data; a53: setting the data quantity proportion of a Tree-LSTM training set and a test set; a54: setting a Tree-LSTM model optimizer and a learning rate; a55: setting the width of hidden layer neurons; a56: setting the iteration times of the model; a57: and continuously adjusting parameters, checking the convergence degree of the model according to model loss, and preferentially selecting high convergence degree parameters to form a Tree-LSTM-based physicochemical property prediction model.
CN201910500140.8A 2019-06-01 2019-06-01 Tree-LSTM-based organic matter physicochemical property prediction method Active CN110600085B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910500140.8A CN110600085B (en) 2019-06-01 2019-06-01 Tree-LSTM-based organic matter physicochemical property prediction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910500140.8A CN110600085B (en) 2019-06-01 2019-06-01 Tree-LSTM-based organic matter physicochemical property prediction method

Publications (2)

Publication Number Publication Date
CN110600085A CN110600085A (en) 2019-12-20
CN110600085B true CN110600085B (en) 2024-04-09

Family

ID=68852617

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910500140.8A Active CN110600085B (en) 2019-06-01 2019-06-01 Tree-LSTM-based organic matter physicochemical property prediction method

Country Status (1)

Country Link
CN (1) CN110600085B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111524557B (en) * 2020-04-24 2024-04-05 腾讯科技(深圳)有限公司 Inverse synthesis prediction method, device, equipment and storage medium based on artificial intelligence
CN111710375B (en) * 2020-05-13 2023-07-04 中国科学院计算机网络信息中心 A method and system for predicting molecular properties
CN111899807B (en) * 2020-06-12 2024-05-28 中国石油天然气股份有限公司 Molecular structure generation method, system, equipment and storage medium
CN111899814B (en) * 2020-06-12 2024-05-28 中国石油天然气股份有限公司 Single molecule and mixture physical property calculation method, device and storage medium
CN115171807B (en) * 2022-09-07 2022-12-06 合肥机数量子科技有限公司 Molecular coding model training method, molecular coding method and molecular coding system

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108108836A (en) * 2017-12-15 2018-06-01 清华大学 A kind of ozone concentration distribution forecasting method and system based on space-time deep learning
CN109033738A (en) * 2018-07-09 2018-12-18 湖南大学 A kind of pharmaceutical activity prediction technique based on deep learning
CN109476721A (en) * 2016-04-04 2019-03-15 英蒂分子公司 CD8- specificity capturing agent, composition and use and preparation method

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150017694A1 (en) * 2008-11-06 2015-01-15 Kiverdi, Inc. Engineered CO2-Fixing Chemotrophic Microorganisms Producing Carbon-Based Products and Methods of Using the Same
US10430685B2 (en) * 2016-11-16 2019-10-01 Facebook, Inc. Deep multi-scale video prediction
US10699185B2 (en) * 2017-01-26 2020-06-30 The Climate Corporation Crop yield estimation using agronomic neural network
EP3474201A1 (en) * 2017-10-17 2019-04-24 Tata Consultancy Services Limited System and method for quality evaluation of collaborative text inputs

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109476721A (en) * 2016-04-04 2019-03-15 英蒂分子公司 CD8- specificity capturing agent, composition and use and preparation method
CN108108836A (en) * 2017-12-15 2018-06-01 清华大学 A kind of ozone concentration distribution forecasting method and system based on space-time deep learning
CN109033738A (en) * 2018-07-09 2018-12-18 湖南大学 A kind of pharmaceutical activity prediction technique based on deep learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on the Application of Deep Neural Networks in Chemistry; 秦琦枫 (Qin Qifeng) et al.; Jiangxi Chemical Industry; 2018-06-15 (No. 03); pp. 1-4 *

Also Published As

Publication number Publication date
CN110600085A (en) 2019-12-20

Similar Documents

Publication Publication Date Title
CN110600085B (en) Tree-LSTM-based organic matter physicochemical property prediction method
Guo et al. A just-in-time modeling approach for multimode soft sensor based on Gaussian mixture variational autoencoder
CN111461157A (en) Self-learning-based cross-modal Hash retrieval method
CN109063416A (en) Gene expression prediction technique based on LSTM Recognition with Recurrent Neural Network
Kuok et al. Broad Bayesian learning (BBL) for nonparametric probabilistic modeling with optimized architecture configuration
CN114066036B (en) Cost prediction method and device based on self-correction fusion model
CN117524353B (en) A molecular macromodel based on multi-dimensional molecular information, construction method and application
CN113570161B (en) Method for constructing stirred tank reactant concentration prediction model based on width transfer learning
CN118153438B (en) Wind pressure field prediction method and device for dependent low building
CN117877621A (en) A drug response prediction method based on multi-source heterogeneous networks
CN114373093A (en) Fine-grained image classification method based on direct-push type semi-supervised deep learning
Tanaka et al. Automated structure discovery and parameter tuning of neural network language model based on evolution strategy
Peng et al. Application of non-Gaussian feature enhancement extraction in gated recurrent neural network for fault detection in batch production processes
CN112199884A (en) Item molecule generation method, device, equipment and storage medium
Wang et al. A Novel Multi‐Input AlexNet Prediction Model for Oil and Gas Production
CN119128721A (en) Gas classification and recognition method based on graph neural network based on edge labeling framework
CN114841114B (en) High-energy-efficiency capacitance extraction method based on machine learning
CN117891226A (en) Abnormality detection method and system for spacecraft control system based on MIC and graph neural network fusion
CN104346448A (en) Incremental data mining method based on genetic programming algorithm
Huang et al. Nonlinear model order selection: a gmm clustering approach based on a genetic version of em algorithm
Feinauer et al. Mean dimension of generative models for protein sequences
CN107451537B (en) Face recognition method based on deep learning multi-layer non-negative matrix factorization
Akyildiz et al. Probabilistic sequential matrix factorization
Urhan et al. Soft-sensor design for a crude distillation unit using statistical learning methods
CN115311521B (en) Black-box video adversarial sample generation method and evaluation method based on reinforcement learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant