JP7387962B2

JP7387962B2 - Intelligent generation method of drug molecules based on reinforcement learning and docking

Info

Publication number: JP7387962B2
Application number: JP2022543606A
Authority: JP
Inventors: 魏志強; 王茜; 劉昊; 李陽陽; 王卓亜
Original assignee: Ocean University of China
Current assignee: Ocean University of China
Priority date: 2021-07-09
Filing date: 2021-07-21
Publication date: 2023-11-29
Anticipated expiration: 2041-07-21
Also published as: WO2023279436A1; CN113488116B; JP2023531846A; CN113488116A

Description

The present invention relates to the technical fields of medicinal chemistry and computing, and specifically to a method for the intelligent generation of drug molecules based on reinforcement learning and docking.

In medicinal chemistry, designing and producing safe and effective compounds is key. It is a time-consuming, expensive, complex, and difficult process that requires optimizing multiple parameters. Even promising compounds carry a high risk (>90%) of failing in clinical trials, which wastes resources. At present, the average cost of bringing one new drug to market is well over 1 billion US dollars, and it takes an average of 13 years from discovery to market. For pharmaceuticals, the path from discovery to commercial production takes even longer; high-energy molecules, for example, require 25 years. An important step in discovering molecules is generating candidates for computational study or for synthesis and characterization. This is a very difficult task because the chemical space of possible molecules is enormous: the number of potential drug-like compounds is 10^23 to 10^60, whereas the number of all compounds synthesized so far is on the order of 10^8. Heuristics such as Lipinski's "rule of five" narrow the space of possibilities, but major challenges remain.

Driven by the revolution in computer technology, AI-based drug discovery is becoming a trend. Traditionally, combinations of various computational models have been used for this purpose, including quantitative structure-activity relationships (QSAR), molecular replacement, molecular simulation, and molecular docking. However, these conventional methods are combinatorial in nature and often yield molecules that are unstable or cannot be synthesized. In recent years, many generative models based on deep learning have appeared for designing drug-like compounds, such as molecule generation with variational autoencoders and with generative adversarial networks. Current methods nevertheless leave room for improvement in the generation speed, validity, and molecular activity of candidate compounds.

The present invention provides a method for the intelligent generation of drug molecules based on reinforcement learning and docking, which generates new drug molecules with optimal properties using an actor-critic reinforcement learning model and docking simulation. The Actor network is modeled with a bidirectional transformer encoder mechanism and a DenseNet network.

To solve the above problems, the present invention adopts the following technical solution.
Specifically, the method for the intelligent generation of drug molecules based on reinforcement learning and docking comprises:
Step 1 of constructing a virtual fragment combination library for drug design, wherein
the drug molecule virtual fragment combination library is obtained by fragmenting a set of molecules with a conventional toolkit, and, when the molecules are split, the fragments are not classified and are all treated alike;
Step 2 of calculating fragment similarity and encoding the molecular fragments, wherein
the similarity between different molecular fragments is measured by a conventional combination of methods for calculating chemical similarity, and all fragments are encoded as binary strings by constructing a balanced binary tree based on similarity, so that similar fragments receive similar codes; and
Step 3 of generating and optimizing molecules based on an actor-critic reinforcement learning model, wherein
(1) Description of the framework based on the actor-critic reinforcement learning model
molecules are generated and optimized based on the actor-critic reinforcement learning model by selecting a single fragment of a molecule and one bit in the representation of that fragment and flipping the value of that bit, that is, changing 0 to 1 and 1 to 0, which makes it possible to track the degree of change applied to the molecule; the leading bits of the code are kept fixed, so that the model may change only the trailing bits and therefore searches only among molecules close to known compounds;
starting from the fragmented molecular state, that is, the current state, the Actor extracts and examines all fragments, introduces the positional information of the different fragments within the molecule, calculates the attention coefficient of each fragment of each molecule using the transformer encoder mechanism, and then determines the fragment to be replaced and its replacement from the output probabilities of the DenseNet network; the new state is scored according to how well it satisfies all constraints; the critic then examines the TD error, that is, the difference between the reward plus the value of the new state and the value of the current state, which is supplied to the actor; if YES (a positive TD error), the actor's action is reinforced, and if NO, the action is discouraged; the current state is then replaced with the new state, and the process is repeated a predetermined number of times;
(2) Optimization of the reward mechanism of the reinforcement learning model
molecules are designed to be optimized for two kinds of properties, namely the intrinsic attribute information of the molecule itself and the calculated activity information of the molecule; the reward mechanism of the reinforcement learning model predicts the reward by means of a perceptron model, which has a training stage and a prediction stage; in the training stage, the data set has two sources: positive samples, taken from molecules reported in the literature to be active, and an equal number of negative samples, drawn at random from the ZINC library; the positive and negative samples are shuffled and docked in turn, and the resulting calculated activity information, together with the intrinsic attribute information computed by a conventional toolkit, is used as input, so that over repeated training the model learns the latent correlation between these inputs and whether a molecule is actually active; in the prediction stage, the model takes as input the calculated activity information of a generated molecule, obtained by virtual molecular docking of the generated molecule against existing PDB files of disease-related targets using advanced and efficient drug docking software, and the intrinsic attribute information of the generated molecule, calculated with a general-purpose software package, and predicts whether the generated molecule is actually active, thereby further optimizing the activity of the generated molecules; the Actor of the reinforcement learning model receives a reward each time it generates a valid molecule, and a higher reward when it produces a molecule that meets the expectations of the prediction model.

Further, in Step 1, the molecules are split by breaking all single bonds extending from a ring atom; when a molecule is split, a fragment chain list is created that records and stores the original split points, which serve as ligation points in later molecular design; fragments with different numbers of ligation points may be exchanged as long as the total number of ligation points stays constant; the open-source toolkit RDKit is used to cleave the molecules; and fragments with more than 12 heavy atoms are discarded, as are fragments with four or more ligation points.
Further, in Step 2, fragment similarity is calculated as follows: when "drug-like" molecules are compared, similarity is measured with the maximum common substructure Tanimoto-MCS (TMCS); for small fragments, the Damerau-Levenshtein distance, an improvement on the Levenshtein distance, is introduced. The Damerau-Levenshtein distance between two strings, the TMCS distance between two molecules M1 and M2, and the resulting similarity between two molecules M1 and M2 with corresponding SMILES representations S1 and S2 are each defined by a formula [the three formulas are not reproduced in this text].

Further, in Step 2, the molecular fragments are encoded as follows: the strings are created by constructing a balanced binary tree based on fragment similarity; the tree then yields a binary string for each fragment and, by extension, a binary string representing a molecule; the order of the ligation points serves as an identifier for each fragment; when the tree is assembled, the similarity between all fragments is calculated, and fragment pairs are then formed by a bottom-up greedy method in which the two most similar fragments are paired first; the process is then repeated, joining the two pairs whose fragments are most similar into a new tree with four leaves, where the similarity between two subtrees is taken as the maximum similarity between any two fragments of those subtrees;
the joining process is repeated until all fragments are joined into a single tree;
once all fragments are stored in the binary tree, the tree is used to generate a code for every fragment;
the code of each fragment is determined from the path from the root to the leaf that stores the fragment: at each branch of the tree, a "1" is appended to the code if the path goes left and a "0" if it goes right, so that the rightmost character of the code corresponds to the branch closest to the fragment.

Compared with the prior art, the present invention has the following beneficial effects.
The present invention generates new molecules based on an actor-critic reinforcement learning model and a docking simulation method. The model learns how to modify and improve molecules so as to give them desired properties.
(1) Unlike conventional reinforcement learning methods, the present invention focuses on generating new compounds whose structures are close to existing compounds by transforming fragments of lead compounds, which narrows the chemical space to be searched.
(2) Based on the actor-critic reinforcement learning model, the Actor network is modeled with a bidirectional transformer encoder mechanism and a DenseNet network; positional information of the various fragments within the molecule is introduced, the attention coefficient of each fragment of each molecule is calculated with the transformer encoder mechanism, and the relative or absolute position of each fragment within the molecule is preserved, which enables parallel training.
(3) The reward mechanism of the reinforcement learning model is built as a single-layer perceptron whose input has two parts, molecular attribute information and activity information; the activity information is obtained by molecular docking of the generated molecules against disease-related targets with docking software, which further optimizes the activity of the generated molecules.
(4) As to the scale of candidates, the method is expected to generate more than 2 million candidate molecules for a target corresponding to a specific disease.
(5) The molecular docking part adds more than 1000 ultra-high-dimensional parameters and molecular activity is fused with related attribute information, so that optimized, high-quality AI molecules can be generated at a rate of more than 80%.
(6) The method relies on a large-scale supercomputing platform, which markedly increases the speed of molecule generation.

FIG. 1 shows the virtual molecular fragment library of Mpro-related compounds.
FIG. 2 shows a subsection of the binary tree containing all fragments of Mpro-related compounds.
FIG. 3 is a framework diagram of the actor-critic reinforcement learning model.
FIG. 4 shows the actor of the actor-critic reinforcement learning model in detail.
FIG. 5 shows the generation of active compound molecules against the Mpro target of the novel coronavirus.

The technical solution of the present invention is further described below by way of examples and with reference to the drawings; the scope of the invention is not limited to these examples.

Example 1
This example aims primarily to generate active compounds against the Mpro target of the novel coronavirus. Starting from a set of lead compounds, these molecules are improved and optimized by replacing some of their fragments, yielding new active compounds that target Mpro and have the desired properties. New drug molecules with optimal properties are generated based on an actor-critic reinforcement learning model and a docking simulation method. The technical solution of this example is described in detail below.

The method for the intelligent generation of drug molecules based on an actor-critic reinforcement learning model and docking comprises Steps 1 to 3 below.

Step 1. Construct a virtual fragment combination library for drug design.

The drug molecule virtual fragment combination library is obtained by fragmenting a set of molecules. As shown in FIG. 1, the virtual fragment library of this example is built from 10172 compounds related to the Mpro target from ChEMBL, a medicinal chemistry database, and 175 lead compounds targeting Mpro that were screened in the laboratory by molecular docking. The usual way to fragment molecules is to divide them into classes such as ring structures, side chains, and linkers. The present invention splits molecules in much the same way, except that the fragments are not classified, so all fragments are treated alike. To cleave a molecule, all single bonds extending from a ring atom are broken. When a molecule is split, a fragment chain list is created that records and stores the original split points, which serve as ligation points in later molecular design. Fragments with different numbers of ligation points may be exchanged as long as the total number of ligation points stays constant. The cleavage is carried out with RDKit, an open-source cheminformatics toolkit. Fragments with more than 12 heavy atoms are discarded, as are fragments with four or more ligation points. These constraints reduce complexity while still allowing a large number of interesting candidates to be generated.
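The two size constraints stated above can be sketched as a simple filter. This is only an illustration: the `Fragment` record and its field names are hypothetical, and the atom and ligation-point counts are assumed to have been computed beforehand (for instance with RDKit, which is not called here).

```python
from dataclasses import dataclass

MAX_HEAVY_ATOMS = 12      # from the text: more than 12 heavy atoms is discarded
MAX_LIGATION_POINTS = 3   # from the text: four or more ligation points is discarded


@dataclass(frozen=True)
class Fragment:
    """Hypothetical record for one fragment; counts are precomputed elsewhere."""
    smiles: str
    heavy_atoms: int
    ligation_points: int


def keep_fragment(frag: Fragment) -> bool:
    """Return True if the fragment passes both size constraints."""
    return (frag.heavy_atoms <= MAX_HEAVY_ATOMS
            and frag.ligation_points <= MAX_LIGATION_POINTS)


def filter_library(fragments):
    """Keep only the fragments that satisfy the constraints, preserving order."""
    return [f for f in fragments if keep_fragment(f)]
```

The thresholds are kept as named constants so that the complexity trade-off described in the text is easy to adjust.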

Step 2. Calculate fragment similarity and encode the molecular fragments.

Step 2.1 Calculating the similarity between fragments
In this example, all fragments are encoded as binary strings, with the aim that similar fragments receive similar codes. The similarity between fragments therefore has to be measured. There are many ways to calculate chemical similarity. A molecular fingerprint is a direct binary encoding in which similar molecules in principle receive similar codes. However, when molecular fragments are compared, fingerprints are inherently sparse representations and contribute little for the present purpose. A chemically intuitive way to measure the similarity between molecules is the maximum common substructure Tanimoto-MCS (TMCS) similarity.
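The TMCS formula itself does not survive in this text. Judging from the definitions of mcs and atoms that accompany it and from the usual Tanimoto form, it presumably reads as follows; this is a reconstruction, not a quotation from the patent.

```latex
\mathrm{TMCS}(M_1, M_2) =
  \frac{\mathrm{mcs}(M_1, M_2)}
       {\mathrm{atoms}(M_1) + \mathrm{atoms}(M_2) - \mathrm{mcs}(M_1, M_2)}
```

Under this form the value is 1 when the two molecules are identical and 0 when they share no common substructure.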

Here, mcs(M1, M2) is the number of atoms in the maximum common substructure of molecules M1 and M2, and atoms(M1) and atoms(M2) are the numbers of atoms in M1 and M2, respectively.

One advantage of Tanimoto-MCS similarity is that it compares the structures of fragments directly and therefore does not depend on any particular representation. It usually works well for comparing "drug-like" molecules. For small fragments, however, Tanimoto-MCS similarity has drawbacks. The present invention therefore also uses the Levenshtein distance, a common measure of the similarity between two text strings, defined as the minimum number of insertions, deletions, and substitutions needed to make two strings identical. To account for the effect of transpositions (swaps of adjacent characters) on the edit distance, this example adopts the Damerau-Levenshtein distance, an improvement on the Levenshtein distance. The Damerau-Levenshtein distance between two strings is defined as follows.
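The definition is not reproduced in this text. As a minimal sketch, the function below computes the commonly used restricted form of the Damerau-Levenshtein distance (also called optimal string alignment), which counts insertions, deletions, substitutions, and swaps of two adjacent characters. Whether the patent intends this restricted form or the unrestricted one is not stated, so the choice here is an assumption.

```python
def damerau_levenshtein(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    rows, cols = len(a) + 1, len(b) + 1
    # d[i][j] = distance between the first i chars of a and the first j chars of b
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i                      # delete all i characters
    for j in range(cols):
        d[0][j] = j                      # insert all j characters
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # transposition of two adjacent characters
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]
```

Applied to SMILES strings, a swap such as "CCO" to "COC" then counts as a single edit rather than two substitutions.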

As a compromise, the similarity between two molecules M1 and M2, with corresponding SMILES representations S1 and S2, is measured as follows.
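The combined formula is likewise missing from this text. One plausible reading, offered as an assumption rather than as the patent's definition, is to take the larger of the TMCS similarity and a string similarity obtained by normalizing the edit distance by the length of the longer SMILES string. The function name and parameters below are hypothetical; the TMCS value and the edit distance are assumed to be computed elsewhere.

```python
def combined_similarity(tmcs: float, edit_distance: int,
                        len_s1: int, len_s2: int) -> float:
    """Assumed combination: max of TMCS and a normalized edit-distance similarity."""
    longest = max(len_s1, len_s2)
    # Two empty strings are treated as identical.
    string_sim = 1.0 if longest == 0 else 1.0 - edit_distance / longest
    return max(tmcs, string_sim)
```

Taking the maximum lets the string-based measure rescue small fragments for which the substructure-based measure is uninformative, which matches the motivation given in the text.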

Step 2.2 Encoding the molecular fragments
All fragments are encoded as binary strings. The strings are created by constructing a balanced binary tree based on fragment similarity. The tree then yields a binary string for each fragment and, by extension, a binary string representing a molecule. The order of the ligation points serves as an identifier for each fragment. When the tree is assembled, the similarity between all fragments is calculated. Fragment pairs are then formed by a bottom-up greedy method in which the two most similar fragments are paired first. The process is then repeated, joining the two pairs whose fragments are most similar into a new tree with four leaves. The similarity between two subtrees is taken as the maximum similarity between any two fragments of those subtrees. The joining process is repeated until all fragments are joined into a single tree.
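The greedy bottom-up construction described above can be sketched as follows. Trees are represented as nested tuples, `toy_sim` is a stand-in similarity (character-set Jaccard) used only to make the example runnable, and the handling of an odd number of subtrees (the leftover one moves up a level unpaired) is an assumption that the text does not address.

```python
from itertools import combinations


def leaves(tree):
    """Return the fragments stored in a nested-tuple tree."""
    if isinstance(tree, tuple):
        return leaves(tree[0]) + leaves(tree[1])
    return [tree]


def subtree_similarity(a, b, sim):
    """Similarity of two subtrees: the maximum over any cross pair of fragments."""
    return max(sim(x, y) for x in leaves(a) for y in leaves(b))


def build_tree(fragments, sim):
    """Pair the most similar subtrees level by level until one tree remains."""
    level = list(fragments)
    if not level:
        raise ValueError("at least one fragment is required")
    while len(level) > 1:
        # Rank every candidate pair at this level, most similar first.
        pairs = sorted(
            combinations(range(len(level)), 2),
            key=lambda p: subtree_similarity(level[p[0]], level[p[1]], sim),
            reverse=True,
        )
        used, next_level = set(), []
        for i, j in pairs:
            if i not in used and j not in used:
                used.update((i, j))
                next_level.append((level[i], level[j]))
        # With an odd count, the unpaired subtree moves up unchanged.
        next_level.extend(level[k] for k in range(len(level)) if k not in used)
        level = next_level
    return level[0]


def toy_sim(a: str, b: str) -> float:
    """Stand-in similarity for illustration: Jaccard index of character sets."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)
```

Pairing whole levels at a time, rather than always merging the single best pair, is what keeps the resulting tree balanced.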

Once all fragments are stored in the binary tree, the tree is used to generate a code for every fragment. The code of each fragment is determined from the path from the root to the leaf that stores the fragment. As shown in FIG. 2, at each branch of the tree a "1" is appended to the code if the path goes left and a "0" if it goes right, so that the rightmost character of the code corresponds to the branch closest to the fragment.
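The path-to-code rule can be sketched with a short recursive walk. The tree is assumed to be a nested tuple `(left, right)` with fragments at the leaves; this representation is an illustrative choice, not something the patent specifies.

```python
def encode(tree, prefix=""):
    """Map each leaf fragment to its root-to-leaf path: left adds '1', right adds '0'."""
    if not isinstance(tree, tuple):
        return {tree: prefix}            # reached a leaf: the path so far is its code
    left, right = tree
    codes = encode(left, prefix + "1")
    codes.update(encode(right, prefix + "0"))
    return codes
```

For the tree `(("A", "B"), ("C", "D"))`, fragments A and B share the leading bit and differ only in the last one, which is what makes the trailing bits correspond to small changes.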

Step 3. Generate and optimize molecules based on an actor-critic reinforcement learning model.

Step 3.1 Description of the framework based on the actor-critic reinforcement learning model
In the present invention, molecules are generated and optimized based on an actor-critic reinforcement learning model. An optimization step selects a single fragment of a molecule and one bit in the representation of that fragment, and flips the value of that bit: 0 becomes 1, and 1 becomes 0. This makes it possible to track the degree of change applied to the molecule, because changing a bit at the end of the code swaps in a very similar fragment, whereas changing a bit near the start swaps in a fragment of a markedly different type. As shown in FIG. 3, the leading bits of the code are kept fixed, so the model may change only the trailing bits and therefore searches only among molecules close to known compounds.
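The bit-flip action with protected leading bits can be sketched as below. The function name and the `n_fixed` parameter (how many leading bits are frozen) are hypothetical; the text does not say how many bits are kept fixed.

```python
def flip_bit(code: str, position: int, n_fixed: int) -> str:
    """Flip one bit of a fragment code; the first n_fixed bits may not be changed."""
    if not n_fixed <= position < len(code):
        raise ValueError("position must lie in the mutable tail of the code")
    flipped = "1" if code[position] == "0" else "0"
    return code[:position] + flipped + code[position + 1:]
```

Because of the tree encoding, flipping the last bit moves to the sibling leaf (the most similar fragment), while the frozen prefix keeps the search inside one branch of the tree.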

The process starts from the fragmented molecular state, that is, the current state S. The Actor extracts and examines all fragments and uses the bidirectional transformer encoder mechanism and the DenseNet network to determine the fragment to be replaced and its replacement; in other words, the action Ai taken by the Actor yields a new state Si. The new state Si is given a score R according to how well it satisfies all constraints. The critic then examines the TD error, the difference between the reward plus the value of Si and the value of S, which is supplied to the actor. If YES (a positive TD error), the actor's action Ai is reinforced; if NO, the action is discouraged. The current state is then replaced with the new state, and the process is repeated a predetermined number of times. The loss function is loss = -log(prob) * td_error.
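The update signal can be sketched numerically. The loss expression is taken from the text; the TD error is written in its standard one-step form, and the discount factor `gamma` is an assumption, since the patent does not mention one.

```python
import math


def td_error(reward: float, value_current: float, value_next: float,
             gamma: float = 0.99) -> float:
    """One-step TD error: reward plus discounted value of Si, minus value of S."""
    return reward + gamma * value_next - value_current


def actor_loss(prob: float, td: float) -> float:
    """Actor loss from the text: loss = -log(prob) * td_error."""
    return -math.log(prob) * td
```

With a positive TD error, lowering this loss raises the probability of the chosen action; with a negative TD error, lowering it reduces that probability, which matches the reinforce/discourage behaviour described above.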

Step 3.2 Network structure of the Actor in the reinforcement learning model
The Actor network is modeled with a bidirectional transformer encoder mechanism and a DenseNet network. Positional information of the various fragments within the molecule is introduced, and the transformer encoder mechanism calculates the attention coefficients of the various fragments of each molecule. A single pass through this structure reads the encoded fragments of one molecule; the forward and backward outputs are concatenated, and the concatenated representation is passed through the DenseNet neural network, which computes which fragment to change and estimates the probability distribution over the changes.

The replacement probability of a fragment depends on the preceding and following fragments of the molecule. Each molecule is therefore arranged as a sequence of fragments, and the sequence is passed to the transformer encoder mechanism as a whole. Calculating the attention coefficients of the various fragments of each molecule gives the importance of each fragment. As shown in FIG. 4, the forward and backward transformer encoders then produce a vectorized representation of one molecule that captures the correlations among its fragments, and finally the concatenated result is classified by the DenseNet network, which computes which fragment to change and estimates the probability distribution over the changes.
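The patent does not give the formula for the attention coefficients. As an illustration only, the sketch below uses the standard scaled dot-product attention of transformer encoders, in plain Python and without learned projection matrices; the helper names and the way position vectors are added to fragment embeddings are assumptions.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def add_positions(embeddings, position_vectors):
    """Inject positional information by adding a position vector to each fragment."""
    return [[e + p for e, p in zip(emb, pos)]
            for emb, pos in zip(embeddings, position_vectors)]


def attention_weights(queries, keys):
    """Row i holds the attention coefficients of fragment i over all fragments."""
    scale = math.sqrt(len(keys[0]))
    return [
        softmax([sum(qc * kc for qc, kc in zip(q, k)) / scale for k in keys])
        for q in queries
    ]
```

Each row of the result sums to 1, so it can be read as the relative importance the encoder assigns to every fragment of the molecule when representing one of them.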

Step 3.3 Optimizing the reward mechanism of the reinforcement learning model
In drug discovery, the most critical challenge is designing molecules that optimize several properties at once, and these properties may not correlate well with one another. To confirm that the proposed method can handle this situation, two different kinds of properties were selected, which together can represent the feasibility of a molecule as a drug. The aim of the present invention is to generate drug molecules whose properties are closer to those of actual active molecules, i.e., to generate molecules in the desired "optimal position". As noted above, the selected properties are the intrinsic attribute information of the molecule itself (e.g., MW, clogP, and PSA) and the computed activity information of the molecule (i.e., the results of docking the molecule against the corresponding target of a specific disease).

In the present invention, the reward mechanism of the reinforcement learning model predicts the reward by constructing a single-layer perceptron model. This model has two stages: training and prediction. In the training stage, the dataset comes from two sources: positive samples, derived from molecules reported in the literature to be active, and the same number of negative samples, drawn at random from the ZINC library. The positive and negative samples are shuffled and docked in turn, and the resulting computed activity information, together with the intrinsic attribute information calculated by a conventional toolkit, is used as input. Over multiple rounds of training, the model learns the latent correlation between the computed activity information and attribute information and whether a molecule is truly active.

In the prediction stage, the computed activity information of a generated molecule is obtained by virtual molecular docking of the generated molecule against disease-related targets using advanced and efficient drug docking software. Using docking software such as Ledock, the model docks the up to 512 molecules generated in each epoch against existing PDB files for 380 targets, in different conformations, related to the novel coronavirus Mpro. The intrinsic attribute information of the generated molecules is calculated with the general-purpose software package RDKit. A total of 1143 high-dimensional parameters, comprising the computed activity information of the generated molecule and its intrinsic attribute information, are fed to the single-layer perceptron, which predicts whether the generated molecule is actually active and thereby further optimizes the activity of the generated molecules. The actor of the reinforcement learning framework receives a reward each time it generates a valid molecule, and a higher reward when it manages to obtain a molecule that meets the expectations of the prediction model.
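The reward step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: four synthetic features stand in for the 1143 docking and property values, the perceptron uses a sigmoid output trained by per-sample gradient steps, and the names train_perceptron, predict_active, and reward, the 0.5 threshold, and the reward values 0, 1, and 2 are all hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_active(model, features):
    """Predicted probability that a molecule is active."""
    weights, bias = model
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)


def train_perceptron(samples, labels, epochs=200, lr=0.1):
    """Fit one weight per feature plus a bias (a single layer)."""
    weights, bias = [0.0] * len(samples[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            # One gradient step on the logistic loss for this sample.
            err = y - predict_active((weights, bias), x)
            weights = [w + lr * err * xi for w, xi in zip(weights, x)]
            bias += lr * err
    return weights, bias


def reward(model, features, is_valid, threshold=0.5):
    """Assumed shape: 0 if invalid, 1 if valid, 2 if also predicted active."""
    if not is_valid:
        return 0.0
    return 2.0 if predict_active(model, features) >= threshold else 1.0


# Synthetic stand-ins for known actives (label 1) and ZINC decoys (label 0),
# shuffled before training as the text describes.
rng = random.Random(0)
positives = [[rng.gauss(1.0, 0.3) for _ in range(4)] for _ in range(50)]
negatives = [[rng.gauss(-1.0, 0.3) for _ in range(4)] for _ in range(50)]
data = [(x, 1) for x in positives] + [(x, 0) for x in negatives]
rng.shuffle(data)
model = train_perceptron([x for x, _ in data], [y for _, y in data])
```

In a real pipeline the feature vector would be assembled from the docking scores and the RDKit descriptors of each generated molecule, and the reward would be passed back to the actor after every generation step.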

The active compound molecules finally generated against the novel coronavirus Mpro target are shown in Figure 5.

The embodiments of the present invention described above are merely illustrative and do not limit the invention; the invention is therefore not limited to the specific forms described above. All other forms that a person skilled in the art may obtain on the basis of the present invention without departing from its principles fall within the patent scope of the invention.

Claims

A method for the intelligent generation of pharmaceutical molecules based on reinforcement learning and docking, comprising:
Step 1 of constructing a virtual fragment combination library for drug design, wherein
the virtual fragment combination library of drug molecules is built by fragmenting a set of molecules with a conventional toolkit, and when a molecule is split, the fragments are not classified and are all treated alike;
Step 2 of calculating fragment similarity and encoding the molecular fragments, wherein
the similarity between different molecular fragments is measured by conventional combinatorial methods for calculating chemical similarity, and all fragments are encoded as binary strings by constructing a balanced binary tree based on similarity, so that similar fragments receive similar codes,
in calculating the similarity between fragments, "drug-like" molecules are compared using the maximum-common-substructure Tanimoto similarity (Tanimoto-MCS), while small fragments are compared using the Levenshtein distance, the Damerau-Levenshtein distance between two strings being defined as follows,
the TMCS distance between two molecules M1 and M2 is defined as follows,
and the similarity between the two molecules M1 and M2, whose corresponding SMILES notations are S1 and S2, is measured as follows;
Step 3 of generating and optimizing molecules based on an actor-critic reinforcement learning model, wherein
(1) Description of the framework based on the actor-critic reinforcement learning model: molecules are generated and optimized based on the actor-critic reinforcement learning model by selecting a single fragment of the molecule and one bit in that fragment's code and inverting the value of that bit, i.e., changing 0 to 1 and 1 to 0, which makes it possible to track the degree of change applied to the molecule; the leading bits of the code are kept fixed, so that the model can change only the trailing bits and therefore searches only for molecules close to known compounds,
in the actor-critic reinforcement learning model, starting from the fragmented molecular state, i.e., the current state, the actor extracts and examines all fragments, introduces the positional information of the different fragments within the molecule, and uses the transformer encoder mechanism to calculate the attention coefficient of each fragment of each molecule; the fragment to be replaced and the fragment to replace it are then selected according to the probabilities output by the DenseNet network; the new state is scored according to how well it satisfies all constraints; the critic then examines the TD-error, i.e., the difference between the value of the new state with the reward added and the value of the current state, and provides it to the actor; if the TD-error is positive (YES), the actor's action is reinforced, and otherwise (NO) the action is blocked; the current state is then replaced with the new state and this process is repeated a predetermined number of times;
(2) Optimization of the reward mechanism of the reinforcement learning model:
the reward mechanism of the reinforcement learning model predicts the reward by constructing a perceptron model with two stages, training and prediction, with the aim of designing molecules whose intrinsic attribute information and computed activity information match expectations; in the training stage, the dataset consists of positive samples, derived from molecules known from previous literature reports to be active, and the same number of negative samples, derived by random sampling from the ZINC library; the positive and negative samples are shuffled and docked in turn, and the computed activity information thus obtained, together with the intrinsic attribute information of the molecules calculated by a conventional toolkit, is used as input, so that over multiple rounds of training the model learns the latent correlation between the computed activity information and attribute information and whether a molecule is truly active; in the prediction stage, the model takes as input the computed activity information of a generated molecule, obtained by virtual molecular docking of the generated molecule against existing PDB files of the disease-related targets using advanced and efficient drug docking software, and the intrinsic attribute information of the generated molecule, calculated using a general-purpose software package, and predicts whether the generated molecule is actually active; and the actor of the reinforcement learning model is given a reward each time it generates a valid molecule, and a higher reward if it obtains a molecule whose intrinsic attribute information and computed activity information match expectations.
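Step 2 of claim 1 names the Damerau-Levenshtein distance, but its defining formula does not appear in this text. As a hedged illustration, the sketch below implements the widely used optimal string alignment variant, which counts insertions, deletions, substitutions, and transpositions of adjacent characters. Whether the patent intends this variant or the unrestricted one is an assumption, and the function name damerau_levenshtein is illustrative.

```python
def damerau_levenshtein(a: str, b: str) -> int:
    """Edit distance with adjacent transpositions (optimal string alignment)."""
    rows, cols = len(a) + 1, len(b) + 1
    # dist[i][j] is the distance between the first i chars of a
    # and the first j chars of b.
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i  # delete all i characters
    for j in range(cols):
        dist[0][j] = j  # insert all j characters
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution or match
            )
            swapped = (
                i > 1 and j > 1
                and a[i - 1] == b[j - 2]
                and a[i - 2] == b[j - 1]
            )
            if swapped:
                # transposition of two adjacent characters
                dist[i][j] = min(dist[i][j], dist[i - 2][j - 2] + 1)
    return dist[-1][-1]


# Two small SMILES strings that differ by one swapped pair of characters.
example = damerau_levenshtein("CCO", "COC")
```

Applied to the SMILES strings of small fragments, a smaller distance indicates more similar fragments. How the distance is combined with the Tanimoto-MCS value into a single similarity score is not shown here, because that formula is also absent from this text.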

The method for intelligent generation of pharmaceutical molecules based on reinforcement learning and docking according to claim 1, wherein, in step 1:
when a molecule is split, all single bonds extending from a ring atom are broken, and a fragment chain list is created to record and store the original split points, which serve as connection points in later molecular design;
fragments with different numbers of connection points may be exchanged as long as the total number of connection points remains constant;
the molecules are split using the open-source toolkit RDKit; and
fragments with more than 12 heavy atoms are discarded, as are fragments with more than 4 connection points.
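The size limits in this claim can be sketched as a simple filter. This is an illustration only: the Fragment record, the keep_fragment helper, and the example fragments are hypothetical, and the heavy-atom and connection-point counts are written in by hand here, whereas in practice they would come from the toolkit that performs the splitting (RDKit in the claim).

```python
from dataclasses import dataclass

MAX_HEAVY_ATOMS = 12       # limit stated in the claim
MAX_CONNECTION_POINTS = 4  # limit stated in the claim


@dataclass(frozen=True)
class Fragment:
    smiles: str             # '*' marks a connection point
    heavy_atoms: int        # non-hydrogen atoms, excluding '*'
    connection_points: int  # number of recorded split points


def keep_fragment(fragment: Fragment) -> bool:
    """Keep a fragment only if it is within both limits."""
    return (
        fragment.heavy_atoms <= MAX_HEAVY_ATOMS
        and fragment.connection_points <= MAX_CONNECTION_POINTS
    )


# Hand-counted example fragments (hypothetical library candidates).
candidates = [
    Fragment("*c1ccccc1", heavy_atoms=6, connection_points=1),
    Fragment("*C(=O)N*", heavy_atoms=3, connection_points=2),
    Fragment("*c1ccc2cc3ccccc3cc2c1", heavy_atoms=14, connection_points=1),
    Fragment("*c1c(*)c(*)c(*)c(*)c1", heavy_atoms=6, connection_points=5),
]
library = [f for f in candidates if keep_fragment(f)]
```

Only the first two candidates survive: the third has more than 12 heavy atoms and the fourth has more than 4 connection points.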

The method for intelligent generation of pharmaceutical molecules based on reinforcement learning and docking according to claim 1, wherein, in step 2:
in encoding the molecular fragments, the strings are created by constructing a balanced binary tree based on fragment similarity; the tree is then used to generate a binary string for each fragment and, by extension, a binary string representing the molecule, with the order of the connection points serving as an identifier for each fragment;
when the tree is assembled, the similarity between all fragments is calculated and fragment pairs are then formed by a bottom-up greedy method: the two most similar fragments are paired first, and this process is repeated; the two most similar pairs are then joined to form a new tree with four leaves, the similarity between two subtrees being taken as the maximum similarity between any two fragments of those subtrees;
the joining process is repeated until all fragments are joined into a single tree; and
once all fragments are stored in the binary tree, the tree is used to generate the codes for all fragments: the code for each fragment is determined by the path from the root to the leaf that stores the fragment, where, at each branch, a 1 is appended to the code if the path goes left and a 0 if it goes right, so that the rightmost character of the code corresponds to the branch closest to the fragment.
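The tree construction and encoding in this claim can be sketched as follows. This is a minimal illustration under stated assumptions: difflib's string ratio stands in for the Tanimoto-MCS and Damerau-Levenshtein based similarity, the fragment count is a power of two so that every code has the same length, and the names build_tree, encode, and toy_similarity are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def toy_similarity(a, b):
    """Stand-in similarity on SMILES strings (assumption, not the patent's)."""
    return SequenceMatcher(None, a, b).ratio()


def leaves(tree):
    """Fragments stored under a node; internal nodes are (left, right) tuples."""
    if isinstance(tree, tuple):
        return leaves(tree[0]) + leaves(tree[1])
    return [tree]


def subtree_similarity(t1, t2, sim):
    """Maximum similarity between any two fragments of the two subtrees."""
    return max(sim(a, b) for a in leaves(t1) for b in leaves(t2))


def build_tree(fragments, sim):
    """Greedy bottom-up pairing: most similar nodes are joined first."""
    level = list(fragments)
    while len(level) > 1:
        ranked = sorted(
            combinations(range(len(level)), 2),
            key=lambda ij: subtree_similarity(level[ij[0]], level[ij[1]], sim),
            reverse=True,
        )
        used, next_level = set(), []
        for i, j in ranked:
            if i not in used and j not in used:
                used.update((i, j))
                next_level.append((level[i], level[j]))
        # With an odd count, the unpaired node is carried up unchanged.
        next_level.extend(level[k] for k in range(len(level)) if k not in used)
        level = next_level
    return level[0]


def encode(tree, prefix=""):
    """Root-to-leaf path as a code: left appends '1', right appends '0'."""
    if isinstance(tree, tuple):
        codes = encode(tree[0], prefix + "1")
        codes.update(encode(tree[1], prefix + "0"))
        return codes
    return {tree: prefix}


fragments = ["CCO", "CCN", "c1ccccc1", "c1ccncc1"]
codes = encode(build_tree(fragments, toy_similarity))
```

In this toy run the two aliphatic fragments share their leading bit, as do the two aromatic rings, so similar fragments differ only in the rightmost bit. That property is what lets a change to a trailing bit swap a fragment for a close neighbor.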